    1. eLife Assessment

      This important study provides a comprehensive multi-omics characterization of Leishmania donovani stage differentiation, offering insights into the molecular basis of parasite adaptation across host environments. The authors present convincing evidence that stage transitions are not driven by genomic variation but instead rely on coordinated post-transcriptional regulation, including mRNA turnover, translation, and protein degradation. Although experimental validation of these findings and conclusions remains to be completed, the integration of diverse, high-quality datasets establishes a robust resource that will be of broad utility to researchers investigating Leishmania biology and life-cycle progression.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The authors describe co-regulated gene modules underlying stage differentiation in Leishmania donovani through a system-level analysis of multiple molecular layers. Using amastigotes isolated from infected hamster spleens and corresponding culture-derived promastigotes, they analyzed genomic variation, transcript abundance, protein levels, phosphorylation states, and metabolite profiles. By combining these, the study identified potential regulatory mechanisms associated with parasite differentiation and generated hypotheses regarding how gene expression is coordinated across different levels.

      Strengths:

      A major strength of the study is the breadth of the dataset generated. The integration provides an unusually comprehensive view of molecular changes associated with Leishmania differentiation in vitro. Such multi-layer datasets involving bona fide vertebrate host stages remain relatively rare in parasitology and will likely become a valuable resource for the molecular parasitology community. In addition, the use of amastigotes isolated from infected hamsters rather than relying on axenic models provided a biologically relevant framework for the analyses.

      The revised manuscript improved several aspects of the original. The RNA-seq analysis is described with a clearer pipeline, and several claims regarding causal regulatory feedback associations have been appropriately toned down. Among the observations reported, the association between parasite differentiation and proteasome-mediated protein degradation is particularly remarkable. The combination of quantitative proteomics with pharmacological inhibition of the proteasome with lactacystin provides support for a role for protein turnover in developmental transitions and paves the way for future mechanistic studies.

      Weaknesses:

      Most regulatory interpretations remain largely inferential or indirect. The integration identifies correlations between different levels, but direct functional validation is limited or absent. Many of the descriptions should not be interpreted as validated. As highlighted by the authors in this revised version, the mechanistic studies will be part of future work and are beyond the scope of the current work. Of note, the attempt to confirm lactacystin-induced inhibition of proteasomal activity via anti-polyUb immunoblotting did not demonstrate the expected increase in overall poly-ubiquitylation.

      Comments on revised version:

      The authors have appropriately addressed my comments and questions from the initial review process. My remaining concern relates to the lack of evidence to confirm proteasomal inhibition by lactacystin in both promastigotes and amastigotes. The immunoblotting experiment newly presented does not reveal a clear increase in the levels of poly-ubiquitylated proteins in treated parasites. In fact, poly-Ub levels were lower at both the 4h and 18h timepoints of treatment. If alternative antibodies or additional immunoblots are not available, the manuscript would benefit from an expanded discussion of this observation and potential explanations. In particular, the interpretation that lactacystin stabilizes ama- and pro-specific degradation would be greatly strengthened by such validation.

    3. Reviewer #2 (Public review):

      Pescher and colleagues present a revised manuscript detailing the multi-omic characterisation of Leishmania donovani amastigote to promastigote differentiation and integration of this data. The molecular pathways that regulate Leishmania life-stage transitions are still poorly understood, with many approaches exploring single proteins/RNAs etc in a reductionist manner. This paper takes a systems-scale approach and does a good job of integrating the disparate -omics datasets to generate hypotheses about the intersections of regulatory proteins that are associated with life-cycle progression. The differentiation step studied is from amastigote to promastigote using hamster-derived amastigotes, which is a major strength. The use of hamsters permits the extraction of parasites that are host adapted and represent "normal", host-adapted Leishmania ploidy; the promastigote experiments are performed at a low passage number. This is therefore a strength of the work, as it reduces the interference from the biological plasticity of Leishmania when it is cultured outside the host for prolonged periods. The multi-omics datasets presented are robust in their acquisition and analysis and will form an excellent resource for researchers studying the molecular events (particularly proteasomal protein degradation, and phosphorylation) during life-stage progression.

      General comments on the revisions:

      My view is that the authors have made significant, satisfactory changes that address the comments and queries I made on the original manuscript (Review Commons).

      There are two areas where the authors had to make major changes/justifications where further comment is merited, these were:

      RNA-seq.

      The most significant issue was the originally underpowered RNA-seq, which had only two replicates. This has now been repeated with four replicates, and this has not led to changes in the interpretation of the data between the original study and this one. One comment that the authors make in their response to this was: "Given the robustness of the stage-specific transcriptome, and the legal constraints associated with the use of animals, we chose to limit the number of replicates to the necessary". Ensuring that animal experiments are properly powered, so that the data are as robust as possible for the minimum sample size, is an important part of experimental design for the ethical use of animal models. Essentially, the replication here could have been avoided if the original study had used one more animal. However, the new version of the RNA-seq brings appropriate confidence to the interpretation of the data.

      Phosphoproteomics.

      The authors provide a robust justification of their strategy for the phosphoproteomics and highlight the inclusion criteria for phosphosites: "Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate". The way missing values were dealt with is explained: "For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition." This fills in some of the gaps I was missing from the original manuscript, and I am satisfied that the data analysis is entirely appropriate for a discovery/systems-based approach such as this one. The authors also edit the manuscript to reflect that "occupancy" or "stoichiometry" might not be the best description of what they were presenting and switched to the terminology of "normalised phosphorylation level" - I think this is an appropriate response.

      Overall, in the absence of follow up experiments on specific individual examples, some of the claims in the original submission were toned down and reflect a more neutral description of the data now. Significantly, the data still underpin a key role for regulation of the ribosome between the amastigote and promastigote stages (and during the differentiation process). The recursive and reciprocal links between the phosphorylation and ubiquitination systems are interesting and present many opportunities for future investigation.

    4. Reviewer #3 (Public review):

      Summary:

      The authors proposed to use a 5-layer systems-level analysis (genomics, transcriptomics, proteomics / protein degradation, metabolomics, phosphoproteomics) to uncover how post-transcriptional mechanisms regulate stage differentiation in Leishmania donovani.

      This enabled the identification of several potential regulatory networks, including the regulation of stage-specific gene clusters by RNA stabilisation or decay, proteasomal degradation and protein phosphorylation.

      In the new version of this manuscript, the authors have addressed all questions raised by the reviewers.

      Strengths:

      Although some observations in this study have already been described in the literature, the integrated analysis applied here provides a novel view on how different levels of post-transcriptional networks regulate Leishmania differentiation. This "5-layer system" represents the first analysis of this depth in kinetoplastid parasites.

      The revised version, with an increased sample number for the RNA-seq, now brings the authors' assumptions in line with the data obtained.

      The use of a proteasomal inhibitor adds an interesting insight into how protein degradation is involved in parasite differentiation, confirming previous observations in the literature, and helps to explain the discrepancies between mRNA and protein expression in the different stages.

      Weaknesses:

      While this work provides an impressive and foundational dataset, it opens the door for future research to rigorously validate these initial findings and conclusions.

      Significance and Impact in the field.

      The different datasets generated in this study will be of great interest to the parasitology community, either to be used for hypothesis generation, to validate data from other sources, etc.

      The multi-layered analysis performed here identified a series of potential feedback loops and regulatory networks to be further explored in organisms that lack transcriptional control.

    5. Author response:

      General Statements

      We thank the reviewers for their insightful and constructive comments, which have substantially strengthened the manuscript. We have addressed all concerns and replaced the previous nonquantitative RNA-seq analysis with a new analysis that allowed for quantitative assessment. We were encouraged to find that the revised analysis not only confirmed our original observations but also reinforced and extended our conclusions.

      Point-by-point description of the revisions

      Reviewer #1:

      Significance

      At its current stage, this work represents a robust resource for molecular parasitology research programs, paving the way for mechanistic studies on multilayered gene expression control, and it would benefit from experimental evidence for some of the claims concerning the in silico regulatory networks. Terms like "regulons" and "recursive feedback loop" are employed without solid confirmation or extensive literature support. In my view, the most relevant contribution of this study is centered on the direct association between proteasome-dependent degradation and Leishmania differentiation.

      We thank the reviewer for acknowledging the impact of our work as a robust resource for further mechanistic studies. We agree that the new concepts emerging from our multilayered analysis should be experimentally assessed. However, given the scope of our analysis (i.e. a complete systems-level analysis of bona fide, hamster-isolated L. donovani amastigotes and derived promastigotes) and the amount of data presented in the current manuscript, such functional genetic analysis will merit an independent, in-depth investigation. The current version has been very much toned down and modified to emphasize the impact of our work as a powerful new resource for downstream functional analyses.

      Evidence, reproducibility and clarity

      The narrative becomes somewhat diffuse with the shift to putative multilevel regulatory networks, which would benefit from further experimental validation.

      We agree with the reviewer and toned down the general discussion while suggesting putative multilevel regulatory networks for follow-up, mechanistic analyses. We now emphasize those networks for which evidence in trypanosomatids and other organisms has been published. Experimental validation of some of these regulatory networks is outside the scope of our manuscript and will be pursued as part of independent investigations.

      Major issues

      Fig.1D suggests a significant portion of the SNPs are exclusive, with a frequency of zero in one of the two stages. Were only the heterozygous and minor alleles plotted in Fig.1D, since frequencies close to 1 are barely observed? Is the same true in Sup Fig. S2B? Why do chrs 4 and 33 show unusual patterns in S2B?

      We thank the reviewer for this observation. The SNPs exclusive to either one or the other stage are likely the result of the 10% cutoff we use for this kind of analysis (eliminating SNPs that lack sufficient support, i.e. less than 10 reads). Due to bottleneck events (such as in vitro culture or stage differentiation), many low-frequency SNPs are either ‘lost’ (filtered out) or ‘gained’ (passing the 10% cutoff) between the ama and pro samples. All SNPs above 10% were plotted. The absence of SNPs at 100% is one of the hallmarks of the Ld1S L. donovani strain we are using. Instead, these parasites show a majority of SNPs at a frequency of around 50%, which is likely a sign of a previous hybridization event. Chr 4 and chr 33 show a very low SNP density, most likely as they went through a transient monosomy at one moment of their evolutionary history, causing loss of heterozygosity. We now explain these facts in the figure legend.
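      The filtering effect described above can be illustrated with a minimal Python sketch. It assumes that a SNP is reported for a stage only when its allele frequency is strictly above 0.10 (the 10% cutoff mentioned in the response); the SNP IDs and frequencies are invented for illustration and are not data from the study.

```python
# Minimal sketch (assumption): a SNP is reported for a stage only if its
# allele frequency is strictly above the 10% cutoff described in the response.
MIN_FREQ = 0.10


def called_snps(freqs: dict[str, float], min_freq: float = MIN_FREQ) -> set[str]:
    """Return the IDs of SNPs whose allele frequency passes the cutoff."""
    return {snp for snp, freq in freqs.items() if freq > min_freq}


def stage_exclusive(ama: dict[str, float], pro: dict[str, float]) -> tuple[set[str], set[str]]:
    """Return SNPs reported only in amastigotes and only in promastigotes."""
    called_ama, called_pro = called_snps(ama), called_snps(pro)
    return called_ama - called_pro, called_pro - called_ama


# Invented frequencies: snp2 and snp3 hover around the cutoff in both stages,
# so each is reported in only one stage and would appear at frequency 0 in the other.
ama = {"snp1": 0.50, "snp2": 0.12, "snp3": 0.08}
pro = {"snp1": 0.48, "snp2": 0.07, "snp3": 0.11}
only_ama, only_pro = stage_exclusive(ama, pro)
print(sorted(only_ama), sorted(only_pro))  # ['snp2'] ['snp3']
```

      Neither snp2 nor snp3 changes much between stages in this toy example; the apparent stage exclusivity comes entirely from the fixed threshold, which is the mechanism the response invokes for bottleneck-driven shifts of low-frequency SNPs.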

      Chr26 revealed strikingly contrasting gene coverage between H-1 and the other two samples. While a peak is observed for H-1 in the middle of this chr, the other two show a decrease in coverage. Is there any correlation with the transcriptomic/proteomic findings?

      This analysis is based on normalized median read depth, taking somy variations into account. This is now more clearly specified in the figure legend. We do not see any significant expression changes that would correlate with the observed (minor) read depth changes. As indicated in the legend, we do not consider such small fluctuations (less than ±1.5-fold) as significant. The reversal of the signal for chr 26 sample H1 eludes us (but again, these fluctuations are minor and not observed at mRNA level).

      The term "regulon" is used somewhat loosely in many parts of the text. Evidence of co-transcriptomic patterns alone does not necessarily demonstrate control by a common regulator (e.g., RNA-binding protein), and therefore does not fulfill the strict definition of a regulon. It should be clear whether the authors are highlighting potential multiple inferred regulons within a list of genes or not. Maybe functional/ gene module/cluster would be more appropriate terms.

      We thank the reviewer for this important comment. We replaced ‘regulon’ throughout the manuscript by ‘co-regulated, functional gene clusters’ (or similar).

      It is unclear whether the findings in Fig.3E are based on previous analysis of stage-specific rRNA modifications or inferred from the pre-snoRNA transcriptomic data in the current work or something else. I struggle to find the significance of presenting this here.

      We thank the reviewer for this comment. Yes, these data show stage-specific rRNA modifications based on previous analyses that mapped stage-specific differences of pseudouridine (Ψ) (Rajan et al., Cell Reports 2023, DOI: 10.1016/j.celrep.2024.114203) and 2'O-modifications (Rajan et al., Nature Communications, in revision) by various RNA-seq analyses and cryoEM. This figure has been modified in the revised version to consider the identification of stage-regulated snoRNAs in our new and statistically robust RNA-seq analysis. These data are shown to further support the existence of stage-regulated ribosomes that may control mRNA translatability, as suggested by the enriched GO terms ‘ribosome biogenesis’, ‘rRNA processing’ and ‘RNA methylation’ shown in Figure 2. We better integrated these analyses by moving the panels from Figure 3 to Figure 2.

      The protein turnover analysis is missing the critical confirmation of the expected lactacystin activity on the proteasome in both ama and pro. A straightforward experiment would be an anti-polyUb western blotting using a low concentration SDS-PAGE or a proteasome activity assay on total extracts.

      We thank the reviewer for this comment and have now included an anti-polyUb Western blot analysis (see Fig S7).

      The viability tests upon lactacystin treatment need a positive control for the PI and the YoPro staining (i.e., permeabilized or heat-killed promastigotes).

      This control is now included in Fig S7 and we have added the corresponding description to the text.

      I found that the section on regulatory networks was somewhat speculative and less focused. Several of the associated conclusions are, in some parts, overstated, such as in "uncovered a similar recursive feedback loop" (line 566) or "unprecedented insight into the regulatory landscape" (line 643). It would be important to provide some form of direct evidence supporting a functional connection between phosphorylation/ubiquitination, ribosome biogenesis/proteins and gene expression regulation.

      We agree with the reviewer and have considerably toned down our statements. Functional analyses to investigate and validate some of the shown network interactions are planned for the near future and will be published separately.

      Minor issues

      (1) The ordinal transition words "First,"/"Second," are used too frequently in explanatory sections. I noted six instances. I suggest replacing or rephrasing some to improve flow.

      Rectified, thanks for pointing this out.

      (2) Ln 168: Unformatted citations were given for the Python packages used in the study.

      Rectified, thanks for pointing this out.

      (3) Fig.1D: "SNP frequency" is the preferred term in English.

      Corrected.

      (4) Fig.2A: not sure what "counts}1" mean.

      This figure has been replaced.

      (5) Ln 685: "Transcripts with FC < 2 and adjusted p-value > 0.01 are represented by black dots" > This sentence is inaccurate. The intended wording might be: "Transcripts with FC < 2 OR adjusted p-value > 0.01 are represented by black dots"
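      The difference between the two wordings can be seen in a short Python sketch of the point classification. It assumes that "FC < 2" means an absolute fold change below 2 in either direction, i.e. |log2 FC| < 1; the function names are hypothetical, and this is not the plotting code used for the figures.

```python
import math


def is_background_or(log2_fc: float, padj: float, fc_min: float = 2.0, padj_max: float = 0.01) -> bool:
    """Corrected legend: a point is black if the change is small OR not significant."""
    # Assumption: "FC < 2" is read as |log2 FC| < log2(2) = 1, covering both directions.
    small_change = abs(log2_fc) < math.log2(fc_min)
    not_significant = padj > padj_max
    return small_change or not_significant


def is_background_and(log2_fc: float, padj: float, fc_min: float = 2.0, padj_max: float = 0.01) -> bool:
    """Original legend wording: black only if the change is small AND not significant."""
    return abs(log2_fc) < math.log2(fc_min) and padj > padj_max


# A large but non-significant change: black under the corrected wording,
# but not black under the original "AND" wording.
print(is_background_or(3.0, 0.05), is_background_and(3.0, 0.05))  # True False
```

      Under the "AND" reading, transcripts with a large but non-significant fold change (or a significant but small one) would be drawn as if they were differentially expressed, which is why the "OR" wording describes the plot correctly.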

      We thank the reviewer and corrected accordingly.  

      (6) Ln 698: Same as ln 685 mentioned above.

      We thank the reviewer and corrected accordingly.

      (7) Fig.2B and elsewhere: The legend key for the GO term enrichment is a bit confusing. It seems like the color scales represent the adj. p-values, but the legend keys read "Cluster efficiency" and "Enrichment score", while those values are actually represented by each bar length. Does light blue correspond to a max value of 0.05 in one scale, and dark blue to a max value of 10⁻⁷ in the other scale?

      This was corrected in the figure and the legends were updated accordingly.

      (8) Sup Figure S3A and S4A: The hierarchical clustering dendrograms are barely visible in the heatmaps.

      Thanks for the comment. Figure S3 was removed and replaced by a hierarchical clustering and a PCA plot.

      (9) S3A Legend: The following sentence sounds a bit awkward: "Rows and columns have been re-ordered thanks to a hierarchical clustering". I suggest switching "thanks to a hierarchical clustering" to "based on hierarchical clustering".

      This figure was removed and the legend modified.

      (10) Fig.5D: The font size everywhere except the legend key is too small. In addition, on the left panel, gene product names are given as a column, while on the right, the names are shown below the GeneIDs. Consistency would make it clearer.

      Thank you, this is now rectified. To ensure readability, we reduced the number of shown protein kinase examples.

      Reviewer #2 Evidence, reproducibility and clarity:

      In the absence of riboprofiling the authors return to the RNA-seq to assess the levels of pre-Sno RNA (the role of the could be more explicitly stated).

      We thank the reviewer for this comment. We moved the snoRNA analysis from Fig 3 to Fig 2 (see also the similar comment of reviewer 1), which better integrates and justifies this analysis. Based on the new and statistically robust RNA-seq analysis, the volcano plot showing differential snoRNA expression and possible ribosome modification has been adjusted (Figures 2C and D).

      The authors provide a clear and comprehensive description of the data at each stage of the results, and this is woven together in the discussion, allowing hypotheses to be formed on the potential regulatory and signalling pathways that control the differentiation of amastigotes to promastigotes. Given the amount and breadth of data presented, the authors are able to present a high-level assessment of the processes that form feedback loops and/or intersectional signalling, but specific examples are not picked out for deeper validation or exploration.

      We thank the reviewer for acknowledging the amount and breadth of data presented. As indicated above (see responses to reviewer 1), mechanistic studies will be conducted in the near future to validate some of the regulatory interactions. These will be the subject of separate publications. As noted above (response to reviewer 1), we toned down the general discussion, suggest follow-up mechanistic analyses and emphasize those networks for which evidence in trypanosomatids and other organisms has been published.

      Major comments:

      (1) As I have understood it from the description in the text, and in Data Table 4, the RNA-seq element of the work has only been conducted using two replicates. If this is the case, it would substantially undermine the RNA-seq and the inferences drawn from it. The minimum number of replicates required for inferential analysis is 3 biological replicates, and potentially up to 6 or 12. It may be necessary for the authors to repeat this for the RNA-seq to carry enough weight to support their arguments. (PMID: 27022035)

      We agree with the reviewer and conducted a new RNA-seq analysis with 4 independent biological replicates of spleen-purified amastigotes and derived promastigotes. Given the robustness of the stage-specific transcriptome, and the legal constraints associated with the use of animals, we chose to limit the number of replicates to the necessary. We thank the reviewer for this important comment; the new data not only confirm the previous results (providing a high level of robustness to our data) but also allowed us to increase the number of identified stage-regulated snoRNAs, thus further supporting a possible role of ribosome modification in Leishmania stage development.

      (2) There are several examples that are given as reciprocal or recursive signalling pathways, but these are not followed up with independent, orthogonal techniques. I think the paper currently forms a great resource to pursue these interesting signalling interactions and is certainly more than just a catalogue of modifications, but to take it to the next level ideally a novel signalling interaction would be demonstrated using an orthogonal approach. Perhaps the regulation of the ribosomes could have been explored further (the same teams recently published related work on this). Or perhaps more interestingly, a novel target(s) from the ubiquitinated protein kinases could have been explored further; for example, making precision mutants that lack the ubiquitination or phosphorylation sites - does this abrogate differentiation?

      We agree with the reviewer that the paper currently forms a great resource. In-depth molecular analyses investigating key signaling pathways and regulatory interactions are outside the scope of the current multilevel systems analysis but will be pursued in independent investigations.

      (3) I found the use of lactacystin a bit curious as there are more potent and specific inhibitors of Leishmania proteasomes e.g. LXE-408. This could be clarified in the write-up (See below).

      We thank the reviewer for this comment. We opted for the highly specific and irreversible proteasome inhibitor lactacystin, which has been previously applied to study the Leishmania proteasome (PMID: 15234661), rather than the trypanosomatid-specific drug candidate LXE408, as the strong cytotoxic effect of the latter makes it difficult to distinguish between direct effects on protein turnover and secondary effects resulting from cell death, limiting its utility for dissecting proteasome function in living parasites. We have added this information in the Results section.

      (4) If it is the case that only 2 replicates of the RNA-Seq have been performed it really is not the accepted level of replication for the field. Most studies use a minimum of 3 bioreplicates and even a minimum of 6 is recommended by independent assessment of DESeq2.

      See response to comment 1 above.

      (5) As far as I could see, the cell viability assay does not include a positive control that shows it is capable of detecting cytotoxic effects of inhibitors. Please add a treatment showing that it can distinguish cytostatic from cytotoxic compounds.

      This control has now been added to Fig S7.

      (6) It is realistic for the authors to validate the cell viability assay. If the RNA-seq needs to be repeated then this would be a substantial involvement.

      Redoing the RNA-seq analysis was entirely feasible and very much improved the robustness of our results.

      (7) All the methods are written to a good level of detail. The sample prep, acquisition and data analysis of the protein mass spectrometry contained a high level of detail in a supplemental section. The authors should be more explicit about the amount of replication at each stage, as in parts of the manuscript this was quite unclear.

      We thank the reviewer for this comment and explicitly state the number of replicates in Methods, Results and Figure legends for all analyses. The number of replicates for each analysis is further shown in the overview Figure S1.

      (8) Unless I have misunderstood the manuscript, I believe the RNA-seq dataset is underpowered according to the number of replicates the authors report in the text.

      See response to comment 1 above.

      (9) Looking at Figure 1 and S1 and Data Table 4 to show the sample workflow I was surprised to see that the RNA-seq only used 2 replicates. The authors do show concordance between the individual biological replicates, but I would consider that only having 2 is problematic here, especially given the importance placed on the mRNA levels and linkage in this study. This would constitute a major weakness of the study, given that it is the basis for a crucial comparison between the RNA and protein levels.

      We agree and have repeated the RNAseq analysis using four independent biological replicates - see response to comment 1.

      (10) It also wasn't clear to me how many replicates were performed at each condition for the lactacystin treatment experiment - can the authors please state this clearly in the text, it looks like 4 replicates from Figure S1 and Data Table 8.

      Indeed, we did 4 replicates. This is now clarified in Methods, Results and Figure legends and shown in Figure S1.

      (11) Four replicates are used for the phosphoproteomics data set, which is probably ok, but other researchers have used a minimum of 5 in phosphoproteomics experiments to deal with the high level of variability that can often be observed with low abundance proteins & modifications. The method for the phosphoproteomics analysis suggests that detection of a phosphosite in 1 sample (also with a localisation probability of >0.75) was required for then using missing value imputation of other samples. This seems like a low threshold for inclusion of that phosphosite for further relative quantitative analysis. For example, Geoghegan et al (2022) (PMID: 36437406) used a much more stringent threshold of greater than or equal to 2 missing values from 5 replicates as an exclusion criterion for detected phosphopeptides. Please correct me if I misunderstood the data processing, but as it stands the imputation of so many missing values (potentially 3 of 4 per sample category) could be reducing the quality of this analysis.

      We thank the reviewer for this remark and for highlighting best practices in phosphoproteomics data analysis. Unlike other studies that use cultured parasites and thus have access to unlimited amounts, our study employs bona fide amastigotes isolated from infected hamster spleens. In France, the use of animals is tightly controlled and only the minimal number of animals to obtain statistically significant results is tolerated (and necessary to obtain permission to conduct animal experiments).

      Regarding the number of biological replicates, we would like to emphasize that the use of four biological replicates is fully acceptable and is used in quantitative proteomics and phosphoproteomics, particularly when combined with high-quality LC–MS/MS data and stringent peptide-level filtering. While some studies indeed employ five or more replicates, this is not a strict requirement, and many high-impact phosphoproteomics studies have successfully relied on four replicates when experimental quality and depth are high. In the present study, we adopted a discovery-oriented approach, aimed at detecting as many confidently identified phosphopeptides as possible. The consistency between replicates, combined with the depth of coverage and signal quality, indicates that four replicates are adequate for both the global proteome and the phosphoproteome in this context. Importantly, the quality of the MS data in this study is supported by (i) a high number of confidently identified peptides and phosphopeptides (identification FDR<1%), (ii) robust phosphosite localisation probabilities (localisation probability >0.75), and (iii) reproducible quantitative profiles across replicates. Notably, most of the identified phosphopeptides are quantified in at least two replicates within a given condition (between 73.2% and 83.4% of all the identified phosphopeptides among replicates of the same condition).
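      The replicate-consistency figure quoted above (the share of identified phosphopeptides quantified in at least two replicates of a condition) can be computed as in the following Python sketch. It assumes that the denominator is the set of phosphopeptides identified in at least one replicate of that condition; the detection matrix is invented and the function name is hypothetical.

```python
def fraction_quantified_in_at_least(detected: list[list[bool]], k: int = 2) -> float:
    """Share of identified phosphopeptides quantified in >= k replicates of one condition.

    detected[i][j] is True if phosphopeptide i was quantified in replicate j.
    Assumption: a phosphopeptide counts as identified if it is seen in at least one replicate.
    """
    identified = [row for row in detected if any(row)]
    if not identified:
        return 0.0
    return sum(1 for row in identified if sum(row) >= k) / len(identified)


# Invented detection pattern for four replicates of a single condition.
detected = [
    [True, True, False, False],    # quantified in 2 replicates
    [True, False, False, False],   # quantified in 1 replicate
    [True, True, True, True],      # quantified in all 4 replicates
    [False, False, False, False],  # not identified in this condition
]
print(round(fraction_quantified_in_at_least(detected), 3))  # 0.667
```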

      Regarding missing value imputation, we appreciate that our initial description may have been unclear and we have revised the Methods to avoid misunderstanding. Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate. This criterion was chosen to retain biologically relevant, low-abundance phosphosites, which are more difficult to identify and are often stochastically sampled in phosphoproteomics datasets. For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition. Notably, they were replaced by values in the neighborhood of the observed intensities, rather than by globally low, noise-like values.
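      The two rules described above (site inclusion, then imputation only within conditions that have at least one observed value) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the data structure and field names are hypothetical, "at least one replicate" is read as any replicate in any condition, and the MLE imputation step used in the study is not implemented; the sketch only identifies which missing values would be eligible for it.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds stated in the response; the field names below are hypothetical.
MAX_ID_FDR = 0.01
MIN_LOC_PROB = 0.75


@dataclass
class Observation:
    intensity: Optional[float]  # None marks a missing value
    id_fdr: float = 1.0         # identification FDR of the phosphopeptide
    loc_prob: float = 0.0       # phosphosite localisation probability


def passes_inclusion(site: dict[str, list[Observation]]) -> bool:
    """Keep a site if at least one replicate is confidently identified and localised."""
    return any(
        obs.intensity is not None and obs.id_fdr < MAX_ID_FDR and obs.loc_prob > MIN_LOC_PROB
        for replicates in site.values()
        for obs in replicates
    )


def cells_to_impute(site: dict[str, list[Observation]]) -> list[tuple[str, int]]:
    """List (condition, replicate index) pairs eligible for imputation.

    Missing values are eligible only in conditions with at least one observed value;
    in a condition without observations they stay missing. The imputation itself
    (MLE in the study) is not performed here.
    """
    eligible = []
    for condition, replicates in site.items():
        if any(obs.intensity is not None for obs in replicates):
            eligible.extend((condition, i) for i, obs in enumerate(replicates) if obs.intensity is None)
    return eligible


# Invented example: one confident observation in amastigotes, none in promastigotes.
site = {
    "ama": [Observation(1.2e6, id_fdr=0.002, loc_prob=0.91)] + [Observation(None) for _ in range(3)],
    "pro": [Observation(None) for _ in range(4)],
}
print(passes_inclusion(site), cells_to_impute(site))
# True [('ama', 1), ('ama', 2), ('ama', 3)]
```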

      We agree that more stringent exclusion rules, such as those used by Geoghegan et al. (2022), are appropriate in some contexts. However, there is no universally accepted standard for missingness thresholds in phosphoproteomics, and different strategies reflect trade-offs between sensitivity and stringency. In our discovery-oriented approach, we deliberately prioritized biological coverage while maintaining data quality. Our main conclusions are supported by coherent biological patterns, rather than by isolated phosphosite measurements.

      (12) For the metabolomics analysis it looks like 2 amastigote samples were compared against 4 promastigote samples. Why not triplicates of each?

      We thank the reviewer for noticing this point. This was an error in the figure file (Sup figure S1). Four biological replicates of splenic amastigotes were prepared (H130-1, H130-2, H133-1 and H133-2). Amastigotes from 2 biological replicates (H131-1 and H131-2) were seeded for differentiation into promastigotes in 4 flasks (2 per biological replicate) that were collected at passage 2. We have updated the figure file accordingly.

      Minor comments:

      Are prior studies referenced appropriately?

      Yes

      Are the text and figures clear and accurate?

      The write up is clear, with the data presented coherently for each method. The analyses that link everything together are well discussed. The figures are mostly clear (see below) and are well described in the legends. There is good use of graphics to explain the experimental designs and sample names - although it is unclear if technical replicates are defined in these figures.

      We thank the reviewer for these positive comments. We have now included the information on replicates in the overview figure (Figure S1).

      As I have understood it, the authors have calculated the "phosphostoichiometry" using the ratio of change in the phosphopeptide to the ratio of the change in total protein level changes. This is detailed in the supplemental method (see below). Whilst this has normalised the data, it has not resulted in an occupancy or stoichiometry measurement, which are measured between 0-1 (0% to 100%). The normalisation has probably been sufficient and useful for this analysis, but this section needs to be re-worded to be more precise about what the authors are doing and presenting. These concepts are nicely reviewed by Muneer, Chen & Chen 2025 (PMID: 39696887) who reference seminal papers on determination of phosphopeptide occupancy - and may be a good place to start. An alternative phrase should be used to describe the ratio of ratios calculated here, not phosphostoichiometry.

      We thank the reviewer for this insightful comment and fully agree with the conceptual distinction raised. The reviewer is correct that the approach used in this study does not measure absolute phosphosite occupancy or stoichiometry, which would indeed require dedicated experimental strategies and would yield values bounded between 0 and 1 (0–100%). Instead, we calculated a normalized phosphorylation change, defined as the ratio of the change in phosphopeptide abundance relative to the change in the corresponding total protein abundance (a ratio-of-ratios approach; see doi:10.1007/978-1-0716-1967-4_12), and we tested whether this normalized phosphorylation change differed significantly from zero. This normalization approach is comparable to those previously published in the “Experimental Design and Statistical Analysis of the Proteome and the Phosphoproteome” section of the following paper (DOI: 10.1016/j.mcpro.2022.100428).
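On the log2 scale, the ratio-of-ratios quantity is the phosphopeptide log2 fold change minus the log2 fold change of its parent protein. The Python sketch below assumes mean linear-scale intensities per condition, which is a simplification; the test of this quantity against zero (done with limma in the study) is not reproduced here.

```python
import math


def normalized_phospho_change(phospho_a, phospho_b, protein_a, protein_b):
    """Log2 ratio of ratios between conditions A and B: the change in
    phosphopeptide abundance minus the change in its parent protein.
    Inputs are mean linear-scale intensities per condition (assumption).
    A value of 0 means the phosphopeptide simply tracks its protein;
    the result is not an occupancy and is not bounded between 0 and 1."""
    phospho_lfc = math.log2(phospho_b / phospho_a)
    protein_lfc = math.log2(protein_b / protein_a)
    return phospho_lfc - protein_lfc


# Phosphopeptide up 8-fold, parent protein up 2-fold: net +2 on log2 scale
print(normalized_phospho_change(1e5, 8e5, 1e7, 2e7))  # 2.0
```

Because the protein change is subtracted, a phosphopeptide that rises only because its protein rises yields a value near zero.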

      Our intention was to account for protein-level regulation and thereby better isolate changes in phosphorylation dynamics. While this normalization is informative and appropriate for the biological questions addressed here, we agree that the term “phosphostoichiometry” is imprecise and not correct in this context.

      In response, we (i) replaced the term “phosphostoichiometry” throughout the manuscript with a more accurate description, such as “normalized phosphorylation level”, or “relative phosphorylation change normalized to protein abundance”, and (ii) revised the corresponding Methods and Results text to clearly state that absolute occupancy was not measured.

      This rewording improves conceptual accuracy without altering the validity or interpretation of the results.

      From the authors methods describing the ratio comparison approach: "Another statistical test was performed in a second step: a contrasted t-test was performed to compare the variation in abundance of each modified peptide to the one of its parent unmodified protein using the limma R package {Ritchie, 2015; Smyth, 2005}. This second test allows determining whether the fold-change of a phosphorylated peptide between two conditions is significantly different from the one of its parent and unmodified protein (paragraph 3.9 in Giai Gianetto et al 2023). An adaptive Benjamini-Hochberg procedure was applied on the resulting pvalues thanks to the adjust.p function of R package cp4p {Giai Gianetto, 2016} using the Pounds et al {Pounds, 2006} method to control the False Discovery Rate level."

      The references have been formatted.
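For readers unfamiliar with the adaptive procedure, the Python sketch below shows Benjamini-Hochberg adjusted p-values scaled by an estimate of the proportion of true null hypotheses (pi0). It assumes the Pounds and Cheng estimator takes the form pi0 = min(1, 2 * mean(p)), as commonly stated for two-sided tests; it is an illustration only, not the cp4p implementation.

```python
def pounds_pi0(pvalues):
    """Estimate the proportion of true nulls as min(1, 2 * mean(p)),
    the two-sided form assumed here for the Pounds and Cheng estimator."""
    return min(1.0, 2.0 * sum(pvalues) / len(pvalues))


def adaptive_bh(pvalues, pi0=None):
    """Benjamini-Hochberg adjusted p-values multiplied by pi0 (adaptive BH).
    Returns adjusted values in the original order of the input."""
    m = len(pvalues)
    if pi0 is None:
        pi0 = pounds_pi0(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pi0 * pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted
```

With pi0 fixed at 1 this reduces to the standard Benjamini-Hochberg adjustment; an estimate of pi0 below 1 makes the procedure less conservative.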

      Several aspects of the figures that contain STRING networks are quite useful, particularly the way colour is used around the circle of each node to denote different molecular functions/biological processes. However, some have descended into "hairball" plots that convey little useful information that would be equally conveyed in a table, for example. Added to this, the points on the figure are identified by gene IDs which, while clear and incontrovertible, are lacking human readability. I suggest that protein name could be included here too.

      We thank the reviewer for this comment but, for readability, we opted to keep the figure as is. We now refer to Tables 8, 9, and 12, which allow the reader to link gene IDs to protein names and annotations (where available).

      It is also not clear what STRING data is being plotted here, what are the edges indicating - physical interactions proven in Leishmania, or inferred interactions mapped on from other organisms? Perhaps as supplemental data provide the Cytoscape network files so readers can explore the networks themselves?

      We thank the reviewer for this comment. While the STRING plugin in Cytoscape enables integrated network-based analyses, it represents protein–protein associations as a single edge per protein pair derived from the combined confidence score. Consequently, the specific contribution of individual evidence channels (e.g. experimental evidence, curated databases, coexpression, or text mining) cannot be disentangled within this framework. However, this representation was considered appropriate for the present study, which focused on global network topology and functional enrichment rather than on the interpretation of individual interaction types. Information on network stringency (the confidence score cutoff) has been added to the Methods section and the Figure legends.
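As an illustration of what a confidence cutoff does to such a network, the Python sketch below keeps only protein pairs whose combined STRING score reaches a threshold. The 0.7 cutoff, the 0 to 1 score scale and the placeholder gene IDs are assumptions for the example, not the settings used in the study.

```python
def filter_edges(edges, cutoff=0.7):
    """Keep protein pairs whose combined confidence score meets the cutoff.
    edges: list of (protein_a, protein_b, combined_score) with scores on
    a 0 to 1 scale. The 0.7 default is a placeholder value."""
    return [(a, b, s) for a, b, s in edges if s >= cutoff]


# Hypothetical edge list with placeholder gene IDs
edges = [
    ("geneA", "geneB", 0.92),
    ("geneA", "geneC", 0.41),
    ("geneB", "geneD", 0.70),
]
print(filter_edges(edges))  # [('geneA', 'geneB', 0.92), ('geneB', 'geneD', 0.7)]
```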

      We decided not to submit the Cytoscape files as they were generated with earlier versions of Cytoscape and the STRING plugin. Based on the differential abundance data shown in the tables, these networks can readily be recreated with current versions for any follow-up study.

      The title of columns in table S10 panel A are written in French, which will be ok for many people particularly those familiar with proteomics software outputs, but everything else is in English so perhaps those titles could be made consistent.

      We apologize and have translated these column titles into English.

      I would suggest that the authors provide a table that has all the gene IDs of the Ld1S2D strain and the orthologs for at least one other species that is in TriTrypDB. This would make it easy to interrogate the data and make it a more useful resource for the community who work on different strains and species of Leishmania. Although this data is available it is a supplemental material file in a previous paper (Bussotti et al PNAS 2021) and not easy to find.

      We thank the reviewer for this very useful suggestion and have added this table (Table S13).

      Figure 5b - from the legend it is not clear where the confidence values were derived in this analysis, although this is explained in the supplemental method. Perhaps the legend can be a bit clearer.

      We have added the following statement to the legend: ‘Confidence values were derived as described in Supplementary Methods’.

      Can the authors discuss why lactacystin was used? While this is a commonly used proteasome inhibitor in mammalian cells there is concern that it can inhibit other proteases. At the concentrations (10 µM) the authors used there are off-target effects in Leishmania, certainly the inhibition of a carboxypeptidase (PMID: 35910377) and potentially cathepsins as is observed in other systems (PMID: 9175783). There is a specific inhibitor of the Leishmania proteasome LXE-408 (PMID: 32667203), which comes closer to fulfilling the SGC criteria (PMID: 26196764) for a chemical probe - why not use this. Does lactacystin inhibit a different aspect of proteasome activity compared to LXE-408?

      We have added the following justification to the Results section (see also the response above to comment 3 of reviewer 2): We chose the highly specific and irreversible proteasome inhibitor lactacystin over the trypanosomatid-specific, reversible drug candidate LXE408, as the latter’s potent cytotoxicity can confound direct effects on protein turnover with secondary consequences of cell death, limiting its utility for dissecting proteasome function in living parasites.

      The application of lactacystin is changing the abundance of a multitude of proteins but no precision follow up is done to identify if those proteins are necessary and/or sufficient from driving/blocking differentiation. This could be tested using precision edited lines that are unable to be ubiquitinated? There is a lack of direct evidence that the proteins protected from degradation by lactacystin are ubiquitinated? Perhaps some of these could be tagged and IP'd then probed for ubiquitin signal. Di-Gly proteomics to reveal ubiquitinated proteins? These suggestions should be considered as OPTIONAL experiments in the relevant section above.

      We very much appreciate these interesting suggestions, which we will consider in ongoing follow-up studies.

      In the data availability RNA-seq section the text for the GEO link is : (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE227637) but the embedded link takes me to (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE165615) which is data for another, different study. Also, the link to the GEO site for the DNA seq isn't working and manual searches with the archive number (BioProject PRJNA1231373 ) does not appear to find anything. The IDs for the mass spec data PRIDE/ProteomeXchange don't seem to bring up available datasets: PXD035697 and PXD035698

      The links have now been corrected and validated. For data that are still under embargo, reviewer access is provided via the following links:

      DNAseq data: https://dataview.ncbi.nlm.nih.gov/object/PRJNA1231373?reviewer=6qt24dd7f475838rbqfn228d 0

      RNAseq data: https://www.ebi.ac.uk/biostudies/ArrayExpress/studies/E-MTAB-16528?key=65367b55-d77f4c06-b4bd-bc10f2dc0b14

      Proteomic data:  http://www.ebi.ac.uk/pride

      Phosphoproteomic data: http://www.ebi.ac.uk/pride

      Significance

      Strengths:

      (1) The molecular pathways that regulate Leishmania life-stage transitions are still poorly understood, with many approaches exploring single proteins/RNAs etc in a reductionist manner. This paper takes a systems-scale approach and does a good job of integrating the disparate -omics datasets to generate hypotheses of the intersections of regulatory proteins that are associated with life-cycle progression.

      We thank the reviewer for this positive assessment of our work.

      (2) The differentiation step studied is from amastigote to promastigote. I am not aware that this has been studied before using phosphoproteomics. The use of the hamster-derived amastigotes is a major strength. While a difficult/less common model, the use of hamsters permits the extraction of parasites that are host adapted and represent "normal", host-adapted Leishmania ploidy, and the promastigote experiments are performed at a low passage number. This is a strength of the work, as it reduces the interference of the biological plasticity of Leishmania when it is cultured outside the host.

      We thank the reviewer for acknowledging the relevance of our hamster system, for which we face many challenges (financial, ethical and administrative, as protocols need to be approved by the French government).

      Limitations:

      Potential lack of appropriate replication (see above).

      See response to comment 1.

      Lack of follow up/validation of a novel signalling interaction identified from the systems-wide approach. There is a lack of assessment of whether a single signalling cascade is driving the differentiation or these are all parallel, requisite pathways. The authors state the differentiation is not driven by a single master regulator, but I am not sure there is adequate evidence to rule this in or out.

      See response to comment 2 above.

      The study applies well-established techniques without any particular technical step change. The application of large-scale multi-omics techniques and integrated comparisons of the different experimental workflows allow a synthesis of data that is a step forward from that existing in the previous Leishmania literature. It allows the generation of new hypotheses about specific regulatory pathways and crosstalk that potentially drive, or are at least active, during amastigote>promastigote differentiation.

      We thank the reviewer for these positive comments.

      This manuscript will have primary interest to those researchers studying the molecular and cell biology of Leishmania and other kinetoplastid parasites. The approaches used are quite standard (so not so interesting in terms of methods development etc.) and given the specific quirks of Leishmania biology it may not be that relevant to those working more broadly in parasites from different clades/phyla, or those working on opisthokont systems- yeast, humans etc. Other Leishmania focused groups will surely cherry-pick interesting hits from this dataset to advance their studies, so this dataset will form a valuable reference point for hypothesis generation.

      We thank the reviewer for this assessment and agree that our data sets will be very valuable for us and other teams to generate hypotheses for follow-up studies.

      Relevant expertise: Trypanosoma & Leishmania molecular & cell biology, RNA-seq, proteomics, transcriptional/epigenetic regulation, protein kinases - some experience of UPS system.

      I have not provided comment on the metabolomics as it is outside my core expertise. However, I can see it was performed at one of the leading parasitology metabolomics labs.

      We thank the reviewer for sharing expertise, investing time and intelligence in the assessment of our manuscript, and the highly constructive criticisms provided.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The study presents a comprehensive multi-omics investigation of Leishmania differentiation, combining genomic, transcriptomic, proteomic, phospho-proteomic and metabolomic data. The authors aim to uncover mechanisms of post-transcriptional and post-translational regulation that drive the stage-specific biology of L. donovani. The authors provide a detailed characterization of transcriptomic, proteomic, and phospho-proteomic changes between life stages, and dissect the relative contributions of mRNA abundance and protein degradation to stage-specific protein expression. Notably, the study is accompanied by comprehensive supplementary materials for each molecular layer and provides public access to both raw and processed data, enhancing transparency and reproducibility. While the data are rich and compelling, several mechanistic interpretations (e.g., "feedback loops," "recursive networks," "signaling cascades") are overstated. Similarly, the classification of gene sets as "regulons" is not adequately supported, as no common regulatory factor has been identified and only a single condition change (amastigote to promastigote) was assessed.

      We thank the reviewer for these comments and have corrected the manuscript to eliminate all unjustified mechanistic interpretations.

      Major Comments:

      (1) Across several sections (incl abstract, L559-565, L589-599, L600-L603, L610-612, L613-614, L625, L643-645, L650-652), the manuscript describes "recursive or self-controlling networks", "signaling cascades", "self-regulating", and "recursive feedback loops" - involving protein kinases, phosphatases, and translational regulators. While the data convincingly demonstrate stage-specific changes in phosphorylation and abundance changes in key molecules, the language used implies causal, direct and directional regulatory relationships that have not been experimentally validated.

      We agree with the reviewer and have corrected the text, replacing all expressions that may allude to causal or directional relationships with more neutral expressions such as ‘coexpression’.

      (2) Co-expression and shared function alone do not define a regulon (L363, and several other places in the manuscript). A regulon also requires the gene set to be regulated by the same factor, for which there is no evidence here. Regulons can be derived from transcriptomic experiments, but then they need to show the same transcriptional behavior across many biological conditions, while here just 1 condition change is evaluated. Therefore, this analysis is conventional GO enrichment analysis and should not be overinterpreted into regulons.

      We agree with the reviewer and have replaced ‘regulon’ with ‘co-regulated gene clusters’ (or similar).

      (3) LFQ intensity of 0 (e.g., L389): An LFQ intensity of 0 does not necessarily indicate that a protein is absent, but rather that it was not detected. This can occur for several reasons: (1) true biological absence in one condition, (2) low abundance below the detection threshold, or (3) stochastic missingness due to random dropout in mass spectrometry. While the authors state that adjusted p-values for the 1534 proteins exclusively detected in either amastigotes or promastigotes are below 0.01, I could not find corresponding p-values for these proteins in Table 8 ('Global_Proteomic'). An appropriate statistical method designed to handle this type of missingness should be used. In this context, I also find the following statement unclear: "identified over 4000 proteins at each stage in at least 3 out of 4 biological replicates, representing 3521 differentially expressed proteins (adjusted p-value < 0.01), 1534 of which were exclusively detected in either ama or pro." If a protein is exclusively detected in one stage, then by definition it should not be detected in that number of replicates at both stages. This apparent contradiction should be clarified.

      We fully agree with the reviewer that an LFQ intensity of 0 may result from various causes, and we realize that our wording may have been ambiguous. For clarity, we have modified the original text to: ‘Label-free quantitative proteomic analysis of 4 replicates of amastigotes and derived promastigotes identified over 4000 proteins, including 1987 differentially expressed proteins (adjusted p-value < 0.01), and 1534 that were exclusively detected in either ama or pro (Figure 3A left panel, Table 6).’ We also modified the legend of Figure 3B. Missing values may be either missing not at random (MNAR) or missing completely at random (MCAR). Rather than introducing potentially misleading imputed values, we chose to treat these missing values as genuine stage-specific differences (presence/absence): quantitative statistics are restricted to proteins with measurable LFQ in both stages, while proteins with consistent presence in one stage and non-detection in the other are reported as stage-restricted detections. We believe this strategy is transparent and minimizes modeling assumptions, while still highlighting robust stage-specific signals. Our approach is supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings. We therefore believe that our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions.
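The presence/absence strategy described above can be sketched as a simple classification rule in Python. The threshold for "consistent presence" (LFQ > 0 in at least 3 of 4 replicates) and the category labels are assumptions made for illustration; the exact criterion used in the study may differ.

```python
def classify_protein(lfq_ama, lfq_pro, min_present=3):
    """Assign a protein to the quantitative test set or to a stage-restricted
    list. An LFQ of 0 is read as 'not detected', not as 'absent'.
    min_present=3 (of 4 replicates) is an assumed threshold."""
    n_ama = sum(v > 0 for v in lfq_ama)
    n_pro = sum(v > 0 for v in lfq_pro)
    if n_ama >= min_present and n_pro >= min_present:
        return "quantified"      # enters the statistical test
    if n_ama >= min_present and n_pro == 0:
        return "ama-restricted"  # reported as a stage-restricted detection
    if n_pro >= min_present and n_ama == 0:
        return "pro-restricted"
    return "excluded"            # too sparse to interpret either way
```

No value is imputed in this scheme: proteins are either tested on observed intensities or listed as stage-restricted detections.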

      (4) L412 - Figure 3B: The figure shows proteins with infinite fold changes, which result from division by zero due to LFQ intensity values of zero in one of the compared conditions. As previously noted, interpreting LFQ zero values as true absence of expression is problematic, since these zeros can arise from several technical reasons - such as proteins being just below the detection threshold or due to stochastic dropout during MS analysis. Therefore, the calculated fold changes for these proteins are likely highly overestimated. This concern is visually supported by the large gap on the y-axis (even in log scale) between these "infinite" fold changes and the rest of the data. Moreover, given Leishmania's model of constitutive gene expression, it seems biologically implausible that all these proteins would be completely absent in one stage. This issue applies not only to Figure 3B, but also to the analyses presented in Figures 4D and 4E.

      We thank the reviewer for this comment. To clarify this section, we modified the text as follows: ‘Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p < 0.01), or showed significant RNA changes (p < 0.01) with the corresponding protein being detected in only one of the two stages. These latter proteins are identified by signals that were arbitrarily placed at the upper (detected in ama) or the lower (detected in pro) parts of the graph. Whether these proteins just escape detection due to low expression or are truly not expressed remains to be established.’ We also deleted the ‘infinity’ symbol from the Figure.
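The inclusion rule and the arbitrary plotting position for one-stage proteins can be expressed as below. This Python sketch assumes fold changes are expressed as log2(ama/pro) and uses hypothetical placeholder positions (+10 and -10) for proteins detected in only one stage.

```python
def select_for_plot(rna_p, protein_p, detected_in, alpha=0.01):
    """Return True if a gene enters the RNA/protein comparison.
    rna_p, protein_p: p-values (protein_p is None when the protein was
    detected in only one stage); detected_in: 'both', 'ama' or 'pro'."""
    if rna_p >= alpha:
        return False
    if detected_in == "both":
        return protein_p is not None and protein_p < alpha
    return detected_in in ("ama", "pro")


def plot_y(protein_log2fc, detected_in, top=10.0, bottom=-10.0):
    """Protein log2 fold change (assumed ama/pro) for plotting. Proteins seen
    in one stage only get a fixed, arbitrary position instead of an infinite
    ratio; top and bottom are hypothetical placeholder values."""
    if detected_in == "ama":
        return top
    if detected_in == "pro":
        return bottom
    return protein_log2fc
```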

      Minor Comments:

      Methods

      L132: Typo: "A according" should be "according."

      The ‘A’ refers to RNase A. We added a comma for clarification (…RNase A, according to…)

      L158: How exactly were somy levels calculated? Please specify the method used, as I could not find a clear description in the referenced manuscript.

      We thank the reviewer for this comment. In addition to the already detailed description in the Methods and the reference there to the paper describing the pipeline, we have now added a link to the description of the karyotype module of the giptools package (https://gip.readthedocs.io/en/latest/giptools/karyotype.html). There the following explanation can be found: “The karyotype module aims at comparing the chromosome sequencing coverage distributions of multiple samples. This module is useful when trying to detect chromosome ploidy differences in different isolates. For each sample the module loads the GIP files with the bin sequencing coverage (.covPerBin.gz files) and normalizes the mean coverage values by the median coverage of all bins. The bin scores are then converted to somy scores which are then used for producing plots and statistics.” The description then goes into further detail.
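The normalization described in the quoted documentation can be sketched in Python as below. The conversion factor of 2 (treating the genome-wide median bin as disomic) and the use of the median bin score per chromosome are assumptions for illustration; the giptools karyotype module may differ in its details.

```python
from statistics import median


def somy_scores(bin_coverage):
    """bin_coverage: dict mapping chromosome -> list of mean coverage per bin.
    Each bin is divided by the genome-wide median bin coverage and scaled by
    2 (assuming a mostly disomic genome); a chromosome's somy is taken as
    the median of its bin scores."""
    all_bins = [c for bins in bin_coverage.values() for c in bins]
    genome_median = median(all_bins)
    return {
        chrom: 2 * median([c / genome_median for c in bins])
        for chrom, bins in bin_coverage.items()
    }


# Hypothetical bin coverages for three chromosomes
cov = {"chr1": [30, 31, 29], "chr2": [30, 30, 30], "chr3": [60, 61, 59]}
print(somy_scores(cov))  # {'chr1': 2.0, 'chr2': 2.0, 'chr3': 4.0}
```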

      L158: Chromosome 36 is not consistently disomic, as stated. It has been observed in other somy states (e.g., Negreira et al. 2023, EMBO Reports, Figure 1), even if such occurrences are rare in the studied context. Normalizing by chr36 remains a reasonable choice, but it would be helpful to confirm that the majority of chromosomes appear disomic post-normalization to support the assumption that chr36 is disomic in this dataset as well.

      We thank the reviewer for this comment. Unlike the paper cited above (using long-term cultured promastigotes), our analysis uses promastigote parasites from early culture adaptation (p2) that were freshly derived from splenic amastigotes known to be disomic (and confirmed here), which represents an internal control validating our analysis.

      L163: Suggestion: Cite the GIP pipeline here rather than delaying the reference until L173.

      Corrected

      L188: "Controlled" may be a miswording. Consider replacing with "confirmed" or "validated."

      Corrected to ‘validated’

      L214: Please specify which statistical test was used to assess differential expression at the protein level. L227: Similarly, clarify which statistical test was applied for determining differential expression in the phospho-proteomics data.

      As noted in the Methods section, a limma t-test was applied to determine proteins/phosphoproteins with a significant difference in abundance while imposing a minimal fold change of 2 between the conditions to conclude that they are differentially abundant {Ritchie, 2015; Smyth, 2005}.
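The combined significance and fold-change criterion can be sketched as a decision rule in Python. The adjusted p-value threshold of 0.01 is taken from the Results text quoted above; the limma moderated t-test that produces the p-values is not reproduced here.

```python
import math


def is_differentially_abundant(adj_p, fold_change, alpha=0.01, min_fc=2.0):
    """Call a protein or phosphosite differentially abundant when the
    adjusted p-value is below alpha AND the fold change is at least min_fc
    in either direction, i.e. |log2 FC| >= log2(min_fc)."""
    return adj_p < alpha and abs(math.log2(fold_change)) >= math.log2(min_fc)
```

Expressing the threshold as |log2 FC| >= 1 treats 2-fold increases and 2-fold decreases symmetrically.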

      Results

      L337-339: The interpretation here is too speculative. Phrases like "suggesting" and "likely" are too strong given the evidence presented. Alternative explanations, such as mosaic variation combined with early-stage selective pressure in the culture environment, should be considered.

      We thank the reviewer for these suggestions and have reformulated the sentence as: ‘In the absence of convergent selection, it is impossible to distinguish if these gene CNVs provide some strain-specific advantage or are merely the result of random genetic drift.’

      L340: The "undulating pattern" mentioned is somewhat subjective. To support this interpretation, consider adding a moving average (or similar) line to Figure 3A, which would more clearly highlight this trend across the data points.

      These lines have been added to Figure 1C (not 3A).
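A centered moving average of the kind added to Figure 1C can be computed as below. This Python sketch uses a hypothetical window size and shrinks the window at the ends of the series; the window actually used for the figure is not stated here.

```python
def moving_average(values, window=5):
    """Centered moving average. The window shrinks at both ends so the
    output has the same length as the input; window should be odd."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        segment = values[lo:hi]
        out.append(sum(segment) / len(segment))
    return out


print(moving_average([1, 2, 3, 4, 5], window=3))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```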

      L356: It may be more accurate to say "control of individual gene expression," since Leishmania does have promoters - the key distinction is that initiation does not occur on a gene-by-gene basis.

      Corrected

      L403-405: The statement "this is because these metabolites comprise a glycosomal succinate shunt..." should be rephrased as a hypothesis rather than a definitive explanation, as this causal link has not been experimentally validated.

      Thank you for the comment – we followed your advice.

      L407: Replace "confirming" with "matching" to avoid overstating the agreement with previous observations.

      Corrected

      L408: Replace "correlated" with "matched" for more accurate interpretation of results.

      Corrected

      L433: It is unclear how differential RNA modifications were detected. Please specify which biological material was used, the number of replicates per life stage, and how statistical evaluation of differential modifications was performed.

      This figure has now been updated using our statistically robust RNA-seq analysis conducted for the revision. See comments above.

      L436: This conclusion appears incomplete. While the manuscript mentions transcript-regulated proteins, it should also note that other proteins showed discordant mRNA/protein patterns. A more balanced conclusion would mention both the matching and non-matching subsets.

      We thank the reviewer for this comment and have made the necessary adjustments to better balance this conclusion.

      L441: The phrase "poor correlation" overgeneralizes and lacks nuance. Earlier sections of the manuscript describe hundreds of genes where mRNA and protein levels correlate well, suggesting that mRNA turnover plays a key regulatory role. Please rephrase this sentence to clarify that poor correlation applies only to a subset of the data.

      This has been corrected to ‘The discrepancies we observed in a sub-set of genes between….’.

      L454: The claim that "epitranscriptomic regulation and stage-adapted ribosomes are key processes" should be supported with references. If this builds on previously published work, please cite it accordingly.

      Corrected

      L457: Proteasomal degradation is a well-established mechanism in Leishmania. These findings are interesting but should be presented in the context of existing literature (e.g. Silva-Jardim et al. 2014, [PMID: 15234661]) rather than as entirely novel.

      Corrected

      L459: The authors should add a microscopy image of promastigotes treated with lactacystin. This would provide insight into whether treatment affects morphology, as is known in T. cruzi (see Dias et al., 2008). It would be particularly informative if Leishmania behaves differently.

      We added this information to Figure S7.

      L472 + L481: Table 9 shows several significant GO terms not discussed in the manuscript. Please clarify how the subset presented in the text was selected.

      We added this information to the text (‘some of the most significantly enriched terms included …’).

      L482: The argument that a single master regulator can be excluded is unclear. Could the authors please elaborate on the reasoning or data supporting this conclusion?

      This statement was too speculative and has been removed. Instead, we added ‘Thus, Leishmania differentiation correlates with the expression of complex signaling networks that are established in a stage-specific manner’.

      L494: The term "unexpected" may not be appropriate here, as protein degradation is a well-established regulatory mechanism in trypanosomatids. Consider omitting this term to better reflect the field's current understanding.

      We deleted the term as suggested and reformulated to ‘….our results confirm the important role of protein degradation….’.

      L543: The term "feedback loop" should be used more cautiously. The current data are correlative, and no interventional experiments are provided to support a causal regulatory loop between proteasomal activity and protein kinases. As such, this remains a hypothesis rather than a confirmed mechanism.

      We fully agree and have toned down the entire manuscript, referring to feedback loops only as a hypothesis and not as a fact emerging from our datasets, which set the stage for future functional analyses.

      Discussion

      L555: As noted in L494, reconsider using the word "unexpected."

      Removed

      L589: The data do not fully support the presence of stage-specific ribosomes. Rather, they suggest differential ribosomal function through changes in abundance and regulation. Please consider rephrasing.

      We thank the reviewer for this comment and have followed the advice, reformulating the sentence according to the suggestion.

      L657-658: The discussion of post-transcriptional and post-translational regulation of gene dosage effects would benefit from citing additional literature beyond the authors' own work. E.g. the study by Cuypers et al. (PMID: 36149920) offers a relevant and comprehensive analysis covering 4 'omic layers.

      We apologize for this omission and now describe and cite this publication in the Results section when concluding the results shown in Figure 1.

      L659-664: The reference to deep learning for biomarker discovery appears speculative and loosely connected to the current findings. As no such methods were applied in the study, and the manuscript does not clarify what types of biomarkers are intended, this statement could be seen as aspirational rather than evidence-based. Consider either omitting or elaborating with clear justification.

      We agree and have deleted this section.

      L690 + L705 (Figure 2): The phrase "main GO terms" is vague. Please clarify the criteria for selecting the GO terms shown - were they chosen based on adjusted p-value, enrichment score, or another metric? Additionally, define "cluster efficiency," explaining how it was calculated and what it represents.

      Corrected to ‘some of the most significantly enriched GO terms’.

      Referee cross-commenting

      Overall, I think the other reviewers' comments are fair. They seem to align particularly on the following points:

      (1) Reviewers agree that this is a comprehensive body of work with original contributions to the field of Leishmania/trypanosomatid molecular biology, and that it will serve as a valuable reference for hypothesis generation.

      (2) Several reviewers raise concerns about overinterpretation of the data, particularly regarding regulatory networks, regulons, and master regulators. The interpretation and large parts of the discussion are considered too speculative without additional functional validation.

      (3) There are comments about the incorrect statistical treatment of missing values in the proteomics experiments, which affects confidence in some of the conclusions.

      (4) While the correlation between the two RNA-Seq replicates is high, the decision to include only two biological replicates is seen as unfortunate and not ideal for statistical robustness.

      (5) The use of lactacystin should be more clearly motivated, and its limitations discussed in the context of the experiments.

      Even though I did not remark on the last two points (4 and 5) in my own review, I agree with them.

      We thank the reviewer for this cross-comparison, which served as a guide for revising our manuscript. We believe that we have responded to all these concerns.

      Reviewer #3 (Significance):

      This study provides a rich, integrative multi-omics dataset that advances our understanding of stage-specific adaptation in the transcriptionally unique parasite Leishmania. By dissecting the relative contributions of mRNA abundance and protein turnover to final protein levels across life stages, the authors offer valuable insights into post-transcriptional and post-translational regulation. The work represents a resource-driven yet conceptually informative contribution to the field, with comprehensive supplementary materials and transparent data sharing standing out as additional strengths.  

      However, the mechanistic insights proposed are speculative in several places and require more cautious language. The study is most impactful as a resource and descriptive atlas, initiating hypotheses for future validation. The broad scientific community working on Leishmania, trypanosomatids, and post-transcriptional regulation in eukaryotes would benefit from this work.

      We thank the reviewer for this positive assessment and have modified the manuscript to further emphasize its strength as an important resource to stimulate mechanistic follow-up studies.

      Field of reviewer expertise: multi-omics integration, bioinformatics, molecular parasitology, transcriptomics, proteomics, metabolomics, Leishmania, Trypanosoma.

      Reviewer #4 (Evidence, reproducibility and clarity):

      Summary:

      This study investigates the regulatory mechanisms underlying stage differentiation in Leishmania donovani, a parasitic protist. Pesher et al. aim to address the central question of how these parasites establish and maintain distinct life cycle stages in the near absence of transcriptional control. The authors employed a five-layered systems-level analysis comparing hamster-derived amastigotes and their in vitro-derived promastigotes. From those parasites, they performed genomic, transcriptomic, proteomic, metabolomic and phosphoproteomic analyses to reveal the changes the parasites undergo between the two life stages.

      The main conclusions stated by the authors are:

      - The stage differentiation in vitro is largely independent of major changes in gene dosage or karyotype.

      - RNA-seq analysis identified substantial stage-specific differences in transcript abundance, forming distinct regulons with shared functional annotations. Amastigotes showed enrichment in transcripts related to amastins and ribosome biogenesis, while promastigotes exhibited enrichment in transcripts associated with ciliary cell motility, oxidative phosphorylation, and post-transcriptional regulation itself.

      - Quantitative phosphoproteome analysis revealed a significant increase in global protein phosphorylation in promastigotes. Normalizing phosphorylation changes against protein abundance identified numerous stage-specific phosphoproteins and phosphosites, indicating that differential phosphorylation also plays a crucial role in establishing stage-specific biological networks. The study identified recursive feedback loops (where components of a pathway regulate themselves) in post-transcriptional regulation, protein translation (potentially involving stage-specific ribosomes), and protein kinase activity. Reciprocal feedback loops (where components of different pathways cross-regulate each other) were observed between kinases and phosphatases, kinases and the translation machinery, and crucially, between kinases and the proteasomal system, with proteasomal inhibition disrupting promastigote differentiation.

      We thank the reviewer for the time and effort dedicated to our manuscript.

      Further details are organised in order of appearance in the text:

      Materials and Methods: while the authors indicate some key parameters, providing the code and scripts they used throughout the manuscript would improve reproducibility.

      We thank the reviewer for this comment and have added the URL for the code to the data availability section.

      Why only 2 biological replicates for RNA while the other layers have 3 or 4?

      We agree with the other reviewers and have repeated this analysis to have statistically more robust results.

      Is the slight but reproducible increase in median coverage observed for chr 1, 2, 3, 4, 6 and 20 stable in longer-term culture-derived promastigotes and in sand fly-derived promastigotes?

      No. As published in Barja et al., Nature Ecology & Evolution 2017 (PMID: 29109466) and Bussotti et al., PNAS 2023 (PMID: 36848551), these minor fluctuations do not predict subsequent aneuploidies in long-term culture or in sand fly-derived promastigotes. This information has been added to the text.

      Is this change of ploidy a culture adaptation representation rather than a life cycle event as the authors discuss later on? (This is probably an optional request that would be nice to include, if the authors have performed the sequencing of such parasites. Otherwise, it should be mentioned in the discussion).

      Yes, this is a well-known culture adaptation phenomenon, on which we have published extensively. We added this conclusion and the references to the text.

      L333 "Likewise, stage differentiation was not associated with any major gene copy number variation (Figure 1C, Table 2)". The authors are looking here at steady differentiated stages rather than differentiation itself. "Likewise, stage differentiation was.." would be more appropriate.

      We corrected this sentence to ‘Likewise, differentiation of promastigotes was not associated with any major gene copy number variation at early passage 2’.

      L349-355: have the mRNAs showing changes in abundance between stages been normalised to their relative DNA abundance? In other words, can the wave patterns observed at the genome level explain the respective mRNA levels? Can the authors plot the enrichment scores in a similar way with regard to genomic position, and indicate whether there is a positional enrichment in addition to the functional one they observe? This may affect the conclusion in L356-358.

      As noted above, we did not see any significant read depth changes at the DNA level when comparing amastigotes and promastigotes. Thus, there is no need to normalize the RNA-seq results to DNA read depth. Furthermore, in our comparative transcriptomics analysis, we only consider 2-fold or higher changes in mRNA abundance (which is far beyond the non-significant read depth change we observed at the DNA level). Manual inspection of the enrichment scores with respect to position did not reveal any significant signal (other than revealing some overrepresented tandem gene arrays where all gene copies share the same location and GO term).

      L415 "stage-specific expression changes correlate between protein and RNA levels, suggesting that the abundance of these proteins is mainly regulated by mRNA turn-over". Overstatement. Correlation does not imply causation. "suggesting that the abundance of these proteins could be regulated by mRNA turn-over" would be more appropriate.

      We thank the reviewer for this comment and have corrected the statement accordingly.

      Figure 3B: could the authors clarify what the "unique genes" in the infinite quadrants are? It seems these proteins are identified in one stage and not the other. This implies that the corresponding values are missing not at random (MNAR). Rather than removing the proteins with MNAR values from the differential expression analysis, the authors should probably impute those missing values. Methods for imputing MNAR and MAR values can be found in the literature. Indeed, the expression level of those proteins in one stage is now missing, while it could strongly affect the conclusions the authors draw in Figure 4E regarding the proteins targeted for degradation and rescued in the presence of the proteasome inhibitor.

      We thank the reviewer for this important comment. However, we would like to clarify several key points regarding the treatment of proteins identified in only one condition.

      First, the reviewer assumes that proteins identified in one stage but not the other are necessarily missing not-at-random (MNAR). However, this cannot be definitively established, as these missing values could equally be missing completely at random (MCAR). Without additional information, categorizing them specifically as MNAR may be an oversimplification. More importantly, we have concerns about the reliability of imputation methods in this specific context. Algorithms designed to impute MNAR values (such as QRILC) replace absent data using random sampling from arbitrary probability distributions, typically assuming low intensity values. However, when no intensity value has been detected or quantified for a protein in a given condition, imputing an arbitrary low value raises significant concerns about data interpretation. Such imputed values would not reflect actual measurements but rather statistical assumptions that could introduce bias into downstream analyses. For instance, imputed values could lead to the conclusion that a protein is not differentially abundant, when in reality it is detected in one condition but completely absent in the other. In our view, there are two biologically plausible scenarios: either these proteins are expressed at levels below our detection threshold, or they are genuinely absent (or present at negligible levels) in the corresponding stage. Rather than introducing potentially misleading imputed values, we chose to treat these as genuine stage-specific differences (presence/absence), which results in infinite fold-changes in Figure 3B. Critically, our approach is strongly supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings.

      These converging lines of evidence (proteomics, transcriptomics, and functional enrichment) strengthen our confidence that these represent biologically meaningful differences rather than technical artifacts. Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions. To clarify this section, we modified the text as follows: ‘Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p < 0.01), or showed significant RNA changes (p < 0.01) with the corresponding protein being detected in only one of the two stages. These latter proteins are identified by signals that were arbitrarily placed at the upper (detected in ama) or the lower (detected in pro) parts of the graph. Whether these proteins just escape detection due to low expression or are truly not expressed remains to be established.’
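      The two-branch selection rule quoted above can be made concrete with a minimal Python sketch. It assumes fold changes are expressed as log2(ama/pro), represents a protein quantified in only one stage by a missing protein p-value, and uses hypothetical names (GeneStats, is_included, protein_axis_value) and made-up toy values; it illustrates the stated criterion and is not the authors' analysis code.

```python
import math
from dataclasses import dataclass
from typing import Optional

P_CUTOFF = 0.01  # significance threshold quoted in the revised text


@dataclass
class GeneStats:
    """Hypothetical per-gene record; field names are illustrative only."""
    gene_id: str
    rna_p: float
    rna_log2fc: float                # assumed log2(ama / pro)
    protein_p: Optional[float]       # None if the protein is seen in one stage only
    protein_log2fc: Optional[float]  # assumed log2(ama / pro); None if one stage only
    detected_in: str                 # "both", "ama" or "pro"


def is_included(g: GeneStats) -> bool:
    """Apply the two-branch inclusion rule described in the response."""
    if g.rna_p >= P_CUTOFF:
        return False  # both branches require a significant RNA change
    if g.detected_in == "both":
        # Branch 1: significant differential abundance at RNA and protein level.
        return g.protein_p is not None and g.protein_p < P_CUTOFF
    # Branch 2: significant RNA change, protein detected in only one stage.
    return g.detected_in in ("ama", "pro")


def protein_axis_value(g: GeneStats) -> Optional[float]:
    """Value on the protein axis; stage-unique proteins are pinned to the
    upper (ama only) or lower (pro only) edge as +/- infinity."""
    if g.detected_in == "ama":
        return math.inf
    if g.detected_in == "pro":
        return -math.inf
    return g.protein_log2fc


# Toy records (made-up values, not data from the study).
genes = [
    GeneStats("gene_A", 0.001, 2.0, 0.005, 1.5, "both"),    # passes branch 1
    GeneStats("gene_B", 0.001, 3.0, None, None, "ama"),     # passes branch 2
    GeneStats("gene_C", 0.200, 1.0, 0.001, 1.2, "both"),    # RNA not significant
    GeneStats("gene_D", 0.001, -2.0, 0.050, -0.5, "both"),  # protein not significant
]
selected = [g.gene_id for g in genes if is_included(g)]
print(selected)  # ['gene_A', 'gene_B']
```

      Pinning stage-unique proteins to plus or minus infinity, rather than imputing a low intensity, keeps the presence/absence call explicit, which is the trade-off argued for in the response.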

      L430-435 "These data fit with the GO [...] the ribosome translational activity (34)." This discussion feels out of place and context. It is too speculative and has little support from the data presented at this stage of the manuscript. It should be removed, together with Figure 3E, or placed in the discussion and supplementary information.

      We agree with the reviewer. In response to a comment from reviewer 1, we have moved both panels to Figure 2, which much better integrates these data.  

      The authors present an elegant way to show stage-specific degradation through the comparison of stage-specific proteasome blockages that show rescue in ama of proteins present in pro and vice versa. L494 "reveal an unexpected but substantial": the term "unexpected" is inappropriate, as several studies have shown in kinetoplastids the essential role of protein turnover through degradation/autophagy during differentiation. Furthermore, the conclusions may be strongly affected by the level of expression of the proteins in the infinite quadrants, as discussed above, and should be revised accordingly.

      We rephrased the conclusion to ‘In conclusion, our results confirm the important role of protein degradation in regulating the L. donovani amastigote and promastigote proteomes and identify protein kinases as key targets of stage-specific proteasomal activities.’ Please see the response to comment 9 regarding the unique proteins.

      L518 "These data reveal a surprising level of stage-specific phosphorylation in promastigotes, which may reflect their increased biosynthetic and proliferative activities compared to amastigotes." Overstatement. Could also be due to culture adaptation - What is the overlap of stage-specific phosphorylations with previous published datasets in other species of Leishmania? Looking at such comparisons could help to decipher the role of culture adaptation response, species specificity and true differentiation conserved mechanisms.

      We agree with the reviewer and have toned this statement down by adding the statement ‘….or simply be a consequence of culture adaptation’.

      The discussion is extremely speculative. While some speculation at this stage is acceptable, claiming direct links and feedback loops without further validation is probably far too stretched. For example, the changes of phosphorylation observed on particular sets of proteins, such as phosphatases and DUBs, need to be validated for their respective change of protein activity in the direction that fits the authors' model. Those discussions should be toned down.

      We agree with the reviewer and have strongly toned down the entire discussion, emphasizing the hypothesis-building character of our results, which provide a novel framework for future experimental analyses.

      A couple of typos:

      In the phosphoproteome analysis section, "...0,2 % DCA..." should be "...0.2 % DCA..." (use a decimal point).

      L225 "...peptide match was disable." should be "...peptide match was disabled."

      Both corrected

      Reviewer #4 (Significance):

      While there is not much novelty in the emphasis on post-translational regulation of gene expression in kinetoplastid organisms, the scale of the work presented here, looking at five layers of potential regulation, is novel. Therefore, this study represents a substantial amount of work and provides interesting and comprehensive datasets useful for the parasitology community.

      We thank the reviewer for this positive statement.

      Several potential concerns regarding the biological meaning of the findings were identified. These include the limitations of in vitro promastigote differentiation systems, which potentially limit the conclusions, the challenge of inferring causality from correlative "omics" data, and the complexities of functional interpretation of changes in phosphorylation and metabolite levels. The proposed feedback loops and functional roles of specific molecules would require further experimental validation to confirm their biological relevance in the natural life cycle of Leishmania, but that would probably fall outside the scope of this manuscript.

      We agree with the reviewer and have modified our manuscript throughout to remove any causal relationships. Indeed, this work sets the stage for future investigations dissecting some of the suggested regulatory mechanisms.

      Area of expertise of the reviewers: Kinetoplastid, Differentiation, Signalling, Omics

    1. eLife Assessment

      In this important study, the authors demonstrate that generative AI techniques (restricted Boltzmann machine) can be used effectively to design and characterize mutational pathways of WW domains with different binding specificities. The computational studies are complemented by experimental validations, and the results provide solid evidence supporting the idea that sequence landscape holds significance in understanding protein evolution from a transition path perspective. The minor weakness of the study in the current form concerns limited success in designing variants with smoothly varying binding specificities. Nevertheless, the work will likely have a major impact on research aimed at understanding how evolution navigates fitness landscapes as well as reconstructing ancestral sequences.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to study mutational paths connecting WW domains with different binding specificities. Their approach combines an unsupervised sequence generative model based on RBMs with a path-sampling algorithm. The key result is that most intermediate sequences along the designed transition paths retain measurable binding activity in wet-lab assays, whereas paths containing the same mutations introduced in a randomized order are largely non-functional. This difference is attributed to epistatic interactions captured by the RBM model.

      Strengths:

      Exploring mutational paths in high-dimensional protein sequence space is a challenging problem. The computational framework used here is state-of-the-art and is strengthened by systematic experimental characterization of binding activity. The study is comprehensive in scope, including multiple transition paths both within and across WW specificity classes, and the integration of modeling with high-throughput experimental validation is a clear strength.

      Weaknesses:

      A major concern is whether the stated goal of specificity switching is fully achieved. Along the sampled transition paths, most intermediate variants appear to retain specificity close to either the initial or the final class, rather than exhibiting gradually shifting specificity. For example, in Figure 4G (Class I to Class II/III), binding appears largely binary, with intermediates behaving similarly to one of the endpoints. A similar pattern is observed in Figure 3H for the Class I to Class IV transition, where binding responses are close to 0 or 1. In this sense, the specificity-switching objective is only partially realized by assigning two endpoints with different specificity. This raises a broader conceptual question: is it possible that different WW specificities evolved from a common ancestor without passing through intermediates that exhibit mixed or intermediate specificity? If so, then inferring specificity-switching pathways purely from extant natural sequences may be fundamentally challenging.

    3. Reviewer #2 (Public review):

      This is an extremely important work that shows how one can use generative models to construct specificity-switching mutational paths in complex fitness landscapes. The experimental evidence is very clear, and the theoretical tools are innovative.

      The work will likely have a deep impact on future research aimed at understanding how evolution navigates fitness landscapes as well as reconstructing ancestral sequences.

      The manuscript is extremely clear and well written, the experimental evidence is strong, and the methods are clearly described, so I do not have major issues to raise. A few minor issues are listed below.

      (1) I consider the WW domain as an 'easy' case from the point of view of generative modelling. The domain is rather short, epistatic effects are not very strong (e.g. Boltzmann learning usually converges very quickly to a very paramagnetic state), and the resulting models are well interpretable (e.g. the hidden units of the RBM correlate well with subclasses).

      This is not always (not often?) the case, however. In more complex proteins, the learning procedures can be slower and the resulting models less interpretable. Just for completeness, perhaps the authors could comment on the generality of the results and what they would expect for other systems based on their experience.

      (2) In Section 3.3, the authors say that direct paths connecting Class I and Class IV behave similarly to indirect paths, despite having lower scores according to the RBM. How generic is this? Does it also happen for other classes? This might be an important point to address, as direct paths are easier to sample.

      (3) The path shown in Figure 4 goes through a region of non-functionality around sequences 18-19. It seems that the sample path is basically exploring the functional regions for Class I and Class II/III separately, trying to approach the other class, but then it can't really make the switch.

      By contrast, the path going from Class I to Class IV seems able to perform the functional switch in a single step (20-21) without losing too much of the function.

      Perhaps the authors could better comment on this? Is this a limitation of the sampling method, or a fundamental biological fact?

      (4) On page 12, it is stated that the temperature was set to 1/3 to maximize the score. This is important and should be mentioned earlier (I didn't notice it until that point).

      (5) On page 13, it is stated that: "However, the scores of the ancestral sequences along the phylogenetic pathways assigned by the RBM are significantly lower than the ones of the RBM-designed sequences. This result is expected as ASR reconstruction does not take into account epistasis, differently from RBM, and we expect ASR sequences to generally be of lesser quality."

      I was very surprised by this result. My own experience with ASR shows that, on the contrary, sequences found by ASR (via maximum likelihood) tend to have high scores in the (R)BM, and tend to be more stable than extant sequences. I attribute this to the fact that ASR typically finds a "consensus" sequence that maximizes the contribution to the score coming from the fields (the profile), which is typically dominant over the epistatic signal, resulting in a bigger score. Maybe the authors did not use maximum likelihood in the ASR? Some clarification might be useful here.

    4. Author response:

      Public Reviews:

      Reviewer #1:

      Summary:

      The authors aim to study mutational paths connecting WW domains with different binding specificities. Their approach combines an unsupervised sequence generative model based on RBMs with a path-sampling algorithm. The key result is that most intermediate sequences along the designed transition paths retain measurable binding activity in wet-lab assays, whereas paths containing the same mutations introduced in a randomized order are largely non-functional. This difference is attributed to epistatic interactions captured by the RBM model.

      Strengths:

      Exploring mutational paths in high-dimensional protein sequence space is a challenging problem. The computational framework used here is state-of-the-art and is strengthened by systematic experimental characterization of binding activity. The study is comprehensive in scope, including multiple transition paths both within and across WW specificity classes, and the integration of modeling with high-throughput experimental validation is a clear strength.

      Weaknesses:

      A major concern is whether the stated goal of specificity switching is fully achieved. Along the sampled transition paths, most intermediate variants appear to retain specificity close to either the initial or the final class, rather than exhibiting gradually shifting specificity. For example, in Figure 4G (Class I to Class II/III), binding appears largely binary, with intermediates behaving similarly to one of the endpoints. A similar pattern is observed in Figure 3H for the Class I to Class IV transition, where binding responses are close to 0 or 1. In this sense, the specificity-switching objective is only partially realized by assigning two endpoints with different specificity. This raises a broader conceptual question: is it possible that different WW specificities evolved from a common ancestor without passing through intermediates that exhibit mixed or intermediate specificity? If so, then inferring specificity-switching pathways purely from extant natural sequences may be fundamentally challenging.

      This is a key question, which was one of the original motivations of our work. Both hypotheses, ‘abrupt switches’ (punctuated equilibria, corresponding to distinct specificities) and more gradual changes (smooth transitions, through intermediates that exhibit mixed or intermediate specificity), are possible.

      Many natural specificity-switching events have probably resulted from the need to adapt to environmental change and selection for a different specificity, which can be compatible with an abrupt change in specificity. Others may reflect the gradual evolution of promiscuous ancestral sequences into more specialized ones, losing cross-reactivity. A molecular mechanism that could allow abrupt switching, beyond standard mutation-driven evolutionary processes, is gene duplication, a frequent mechanism of WW domain diversification.

      As for the specificity-switching paths for WW domains found in this work, the presence of weakly responsive cross-reactive intermediates along the designed paths for I<->IV, and their absence in the I<->II path, suggests that designing promiscuous domains is hard (see also related response to point 3 of Reviewer 2) and generally not selected by natural evolution (as seen from the clear clustering of extant proteins in different specificity classes). 

      For a small domain such as WW, mutations that favor some specificity classes are known to have detrimental effects on fundamental properties, such as folding kinetics and stability, see Ref [72]. It is possible that larger, less constrained protein domains could allow for more cross-reactive variants and smoother specificity switching. However, experiments on fluorescent proteins looking for interpolation between two wavelengths have shown that the switch was abrupt [Poelwijk et al. Nature Communications (2019)].

      Our aim was to achieve a functional switch (imposed by the two extant end-points) through a path of designed, functional intermediates and to correctly predict, with our RBM model, the location of the specificity transition and of the cross-reactivity region (which we expected only along the I-IV path). This aim was achieved, as demonstrated experimentally.

      Reviewer #2:

      This is an extremely important work that shows how one can use generative models to construct specificity-switching mutational paths in complex fitness landscapes. The experimental evidence is very clear, and the theoretical tools are innovative.

      The work will likely have a deep impact on future research aimed at understanding how evolution navigates fitness landscapes as well as reconstructing ancestral sequences.

      The manuscript is extremely clear and well written, the experimental evidence is strong, and the methods are clearly described, so I do not have major issues to raise. A few minor issues are listed below.

      (1) I consider the WW domain as an 'easy' case from the point of view of generative modelling. The domain is rather short, epistatic effects are not very strong (e.g. Boltzmann learning usually converges very quickly to a very paramagnetic state), and the resulting models are well interpretable (e.g. the hidden units of the RBM correlate well with subclasses).

      This is not always (not often?) the case, however. In more complex proteins, the learning procedures can be slower and the resulting models less interpretable. Just for completeness, perhaps the authors could comment on the generality of the results and what they would expect for other systems based on their experience.

      We agree with Reviewer 2 that WW sequences are short and simple to handle from a computational point of view; the WW domain was chosen for this reason to test the design of full mutational paths (after benchmarking the approach on lattice-protein models, see Refs. [30] and [44]). Our work gives additional support to the effectiveness of generative models learned from sequence data. That said, from a biological point of view, WW is a highly constrained domain; see the comment by Reviewer 1 above and our answer.

      In longer and more complex proteins, we expect it will be more difficult to disentangle specificity-switching latent units; see Fernandez-de-Cossio-Diaz et al., Physical Review X 2023 for a discussion and a possible computational approach to this issue. Note that, while relating the latent units to specificity classes was convenient, it was not used to generate the paths themselves. Therefore, we believe that our method is quite robust and easily generalizable to more complex and longer proteins. As an illustration, we have recently used it to sample viral trajectories (more precisely, variants of the Receptor Binding Domain of the SARS-CoV-2 spike protein) capable of escaping antibody recognition, see Huot et al., PNAS 2026. In this recent work, we projected the paths onto the principal antigenic space, defined by the top two principal components of the viral variant binding affinities to 32 antibodies. In this representation, sampled paths displayed trends similar to natural paths, drawn from the sequences sampled during the pandemic. This finding supports the applicability and interpretation of our method for more complex proteins.

      (2) In Section 3.3, the authors say that direct paths connecting Class I and Class IV behave similarly to indirect paths, despite having lower scores according to the RBM. How generic is this? Does it also happen for other classes? This might be an important point to address, as direct paths are easier to sample.

      We think that this finding, true for paths connecting classes I and IV, is not general. In a previous paper we benchmarked our path-designing approach on simple models of in silico lattice proteins and showed that indirect paths led to gains in overall fitness (computed according to the ground-truth model) [Mauri, Cocco, Monasson, Physical Review E 2023, figs. 9-12].

      In general, we would expect indirect paths to explore alternative mutations, which are important to compensate for transiently destabilizing mutations that could occur along the path. We speculate that, in indirect paths, these stabilizing mutations occur near the path extremity close to the class-I wild type. A slight decrease in binding response to peptide C1 is nevertheless observed for the direct path (see Suppl. Table 4). However, our experimental readout focuses on binding response and is not tailored to directly detect a difference in stability. When approaching the class-IV anchoring point, we observe that paths interpolating between classes I and IV are very constrained and show limited diversity, going through a funnel in sequence space corresponding to the direct path. We agree with Reviewer 2 that a more exhaustive comparison with direct paths would be interesting, and will add a sentence to the Conclusion.

      (3) The path shown in Figure 4 goes through a region of non-functionality around sequences 18-19. It seems that the sample path is basically exploring the functional regions for Class I and Class II/III separately, trying to approach the other class, but then it can't really make the switch.

      By contrast, the path going from Class I to Class IV seems able to perform the functional switch in a single step (20-21) without losing too much of the function.

      Perhaps the authors could better comment on this? Is this a limitation of the sampling method, or a fundamental biological fact?

      Class I to Class IV paths and Class I to Class II paths fundamentally differ because the binding pocket in Class I WW domains differs from that of Class IV WW domains, while Classes I and II/III share the same binding region. This important difference may explain why class I specificity can switch to class IV specificity (steps 20-21) without completely losing affinity to the class I peptide. To investigate whether the two binding regions are really independent, we have tested some additional specific mutations along the I-IV mutational paths. In our attempts to engineer cross-reactivity, we have observed that affinity to the class I peptide must be substantially lowered to acquire class IV specificity, in agreement with previous studies [72]. Moreover, the I to IV path seems to go through a funnel-like region of sequence space containing no natural sequences, with the same transition intermediates obtained in several designed paths. This indicates that the Class I to Class IV functional switch is more constrained than the Class I to II switch. Let us also emphasize that our assessment of class specificity is based on one peptide for each class. It would be interesting to test multiple WW-binding peptides with similar biochemical properties to obtain a more complete view of the specificities.

      (4) On page 12, it is stated that the temperature was set to 1/3 to maximize the score. This is important and should be mentioned earlier (I didn't notice it until that point).

      Section 3.5 explains that RBM sampling can be biased toward high-score sequences by lowering the sampling temperature to 1/3. Such sequences are more likely to be functional, as shown in [Russ et al., Science 2020]. We acknowledge (as also noted by Reviewer 1) that this section comes at the end of the manuscript, whereas differences in scores along the paths are shown earlier, so the discussion of this important point is somewhat delayed. We will add a sentence earlier in the Results to explain this point.
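
      One common way to formalize this bias is to rescale the model's log-probability by 1/T, so that P_T(v) ∝ P(v)^(1/T). This formalization is an assumption made here, not a description taken from the manuscript. At T = 1/3, the rescaling cubes the weights and concentrates sampling on high-score sequences. The toy sketch below applies the rescaling to a single three-state categorical distribution. For an RBM the same rescaling acts on the model's energy, and the exact tempering scheme used in the paper may differ. The function name is hypothetical.

```python
import numpy as np


def tempered_probs(log_weights: np.ndarray, temperature: float) -> np.ndarray:
    """Return p_T proportional to exp(log_weights / T), i.e. p**(1/T) renormalized."""
    scaled = log_weights / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()


p = np.array([0.5, 0.3, 0.2])
p_t1 = tempered_probs(np.log(p), 1.0)      # T = 1 recovers p
p_t3 = tempered_probs(np.log(p), 1.0 / 3)  # T = 1/3 sharpens toward the top state
print(np.round(p_t3, 3))  # approx. 0.781, 0.169, 0.05
```

      At T = 1/3 the most probable state rises from 50% to about 78% of the mass. This illustrates why low-temperature samples tend to have higher model scores.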

      (5) On page 13, it is stated that: "However, the scores of the ancestral sequences along the phylogenetic pathways assigned by the RBM are significantly lower than the ones of the RBM-designed sequences. This result is expected as ASR reconstruction does not take into account epistasis, differently from RBM, and we expect ASR sequences to generally be of lesser quality."

      I was very surprised by this result. My own experience with ASR shows that, on the contrary, sequences found by ASR (via maximum likelihood) tend to have high scores in the (R)BM, and tend to be more stable than extant sequences. I attribute this to the fact that ASR typically finds a "consensus" sequence that maximizes the contribution to the score coming from the fields (the profile), which is typically dominant over the epistatic signal, resulting in a bigger score. Maybe the authors did not use maximum likelihood in the ASR? Some clarification might be useful here.

      We agree with Reviewer 2 that the consensus sequence is an atypical sequence for an independent model with a large RBM score. We will update Figure 5 of the manuscript to show that this is also happening in our case. 

      We use Maximum Likelihood in ASR, but our ASR path corresponds to all internal nodes of the reconstructed tree joining the two extant sequences, not only to the most ancestral node. Overall, the ancestral sequences along the ASR paths differ from the consensus sequence (mean identity of 76% and 60%, respectively). The most ancestral nodes in the paths also differ from the consensus. Their similarity to it is 81% for paths between type I and IV domains and 54% for paths between type I and II/III domains, with RBM scores of -21 and -58, respectively. We agree that some ASR internal-node sequences have a higher score than the natural wild types (extant sequences). This is shown in Fig. 6, where several points have larger RBM scores than the two anchoring points at the extremities of the path, possibly because natural sequences are not always the most stable ones. As discussed in the Conclusion, ASR nodes moreover generally have better scores than sequences obtained by sampling an independent-site model. Phylogenetic reconstruction implicitly takes into account some degree of co-variation between sites in natural sequences. This is shown by the successful use of the phylogenetic distance between a mutated sequence and the wild type to predict the fitness effects of mutations [Laine, Mol. Biol. Evol. 2019].
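
      The identity figures above compare aligned sequences position by position. The following is a minimal sketch of such a comparison, resting on three assumptions. The sequences are aligned to equal length, gaps count as ordinary characters, and the consensus is the most frequent character in each column. The function names and the toy alignment are hypothetical, and the authors' exact procedure may differ.

```python
from collections import Counter


def consensus(alignment: list[str]) -> str:
    """Most frequent character in each column of an aligned set of sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions at which two sequences carry the same character."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


toy_alignment = ["ACDE", "ACDF", "AGDE"]  # hypothetical aligned sequences
cons = consensus(toy_alignment)           # "ACDE"
print(percent_identity("ACDF", cons))     # 75.0
```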

      To better show this effect, we will update Figure 6 to also report the scores of "scrambled" sequences, which do not respect the potential epistasis extracted by the RBM. It appears that ASR sequences generally score better than the scrambled sequences and worse than RBM sequences (sampled at T=1/3). RBM models take into account multi-residue correlations, which could contribute to reaching better scores than ASR and BM models. Ongoing studies on larger proteins show that the score of sequences sampled from ASR reconstruction, including the Maximum Likelihood one, can still be improved according to the RBM score by a few mutations consistent with the ASR posterior probabilities (unpublished).
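
      One standard way to build such epistasis-free controls is to shuffle each alignment column independently across sequences. This preserves single-site amino-acid frequencies while destroying inter-site correlations. It is an assumption here, since the exact scrambling procedure is not described in this response. The function name and toy alignment are hypothetical.

```python
import random


def scramble_columns(alignment: list[str], seed: int = 0) -> list[str]:
    """Shuffle each column independently across sequences.

    Per-site residue counts are preserved; correlations between sites are broken.
    """
    rng = random.Random(seed)
    columns = [list(column) for column in zip(*alignment)]
    for column in columns:
        rng.shuffle(column)  # in-place permutation of one site across sequences
    return ["".join(residues) for residues in zip(*columns)]


toy_alignment = ["ACDE", "ACDF", "GWDE", "GWKF"]  # hypothetical aligned sequences
scrambled = scramble_columns(toy_alignment)
```

      Each column keeps its multiset of residues, so the mean independent-site score over the whole set is unchanged. Co-variation between sites, which the RBM can exploit, is no longer consistent with the natural sequences.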

      Mistakes in the reference list will be amended in the updated version.

    1. eLife Assessment

      This important study highlights the role of MIRO1 in regulating mitochondrial oxidative phosphorylation in smooth muscle cells, a process that appears necessary to sustain their proliferation. Overall, the work provides convincing evidence that mitochondrial positioning and function influence vascular disease, although several bioenergetic and mechanistic aspects would benefit from deeper investigation.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMCs in CV diseases is poorly studied and the available data are limited; therefore, the authors decided to investigate this aspect further. The evidence that Miro1-/- VSMCs show impaired proliferation and an arrest in S phase is solid. It is further supported by the observation that restoring Miro1 to control levels normalizes proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting supercomplex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Comments on revisions:

      The authors have adequately addressed all the concerns raised by the reviewers, and the manuscript has been substantially improved.

    3. Reviewer #2 (Public review):

      Summary:

      This study identifies the outer-mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF stimulation ex vivo. The authors used smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC models, and analyses of mitochondrial positioning, cristae architecture and respirometry. With these, they provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, some components of the metabolic analyses are suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodelling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases. Whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury, however, has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach to assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo and, importantly, in human cells. The study links MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function mechanistically to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the availability of a small-molecule MIRO1 reducer.

      Weaknesses:

      (1) High-resolution respirometry (Oroboros) to determine mitochondrial ETC activity in permeabilized VSMCs would be informative.

      (2) Therapeutic targeting of MIRO1 failed to prevent neointima formation; however, the technical difficulties of such an experiment are appreciated.

      Comments on revisions:

      The authors have addressed the concerns I previously raised.

    4. Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are useful for understanding the importance of mitochondrial positioning and function in this specific cell type, the main bioenergetic and mechanistic claims are not strongly supported.

      Strengths:

      - This study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.
      - This study explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a significant area for both basic and translational biology.
      - The use of both in vivo and in vitro systems provides a useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      - Some key bioenergetic aspects may require further investigation.

      Comments on revisions:

      The authors have adequately addressed most of the concerns I raised. I would suggest adding some of the justifications provided to the reviewers to the manuscript to further clarify and aid interpretation of the data, especially for the bioenergetic part (e.g., the proposed interaction with CI components, which might otherwise appear implausible to readers).

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMCs in CV diseases is poorly studied and the available data are limited; therefore, the authors decided to investigate this aspect further. The evidence that Miro1-/- VSMCs show impaired proliferation and an arrest in S phase is solid. It is further supported by the observation that restoring Miro1 to control levels normalizes proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting supercomplex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of the relevance of Miro1 in neointima formation is compelling. So is the evidence in VSMCs that MIRO1 loss impairs mitochondrial cristae formation, which expands observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      The revised manuscript includes additional data supporting mitochondrial bioenergetic impairment in MIRO1 knockout VSMCs. Measurements of oxygen consumption rate (OCR), along with Complex I (ETC-CI) and Complex V activity, have been added and analyzed across multiple experimental conditions. Collectively, these findings provide a more comprehensive characterization of the mitochondrial functional state. Following revision, the association between MIRO1 deficiency and impaired Complex I activity is more robust.

      Although the precise molecular mechanism of action remains to be fully elucidated, in this updated version, experiments using a MIRO1 reducing agent are presented with improved clarity.

      Although some limitations remain, the authors have addressed nearly all the concerns raised, and the manuscript has substantially improved.

      Weaknesses:

      Figure 6: The authors do not address the concern regarding the cristae shape; however, characterization of the cristae phenotype with MIRO1 ΔTM would have strengthened the mechanistic link between MIRO1 and the MIB/MICOS complex.

      Although the authors clarified their reasoning, they did not explore in vivo validation of key biochemical findings, which represents a limitation of the current study. While their justification is acknowledged, at least a preliminary exploratory effort could have been made to reinforce the translational relevance of the study.

      Finally, in line with the explanations outlined in the rebuttal, the Discussion section should mention the limits of MIRO1 reducer treatment.

      Reviewer #2 (Public review):

      Summary:

      This study identifies the outer-mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF stimulation ex vivo. The authors used smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC models, and analyses of mitochondrial positioning, cristae architecture and respirometry. With these, they provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, some components of the metabolic analyses are suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodelling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases. Whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury, however, has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach to assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo and, importantly, in human cells. The study links MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function mechanistically to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the availability of a small-molecule MIRO1 reducer.

      Weaknesses:

      (1) High-resolution respirometry (Oroboros) to determine mitochondrial ETC activity in permeabilized VSMCs would be informative.

      (2) Therapeutic targeting of MIRO1 failed to prevent neointima formation; however, the technical difficulties of such an experiment are appreciated.

      Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are useful for understanding the importance of mitochondrial positioning and function in this specific cell type, the main bioenergetic and mechanistic claims are not strongly supported.

      Strengths:

      This study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      This study explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a significant area for both basic and translational biology.

      The use of both in vivo and in vitro systems provides a useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      The proposed link between MIRO1 and respiratory supercomplex biogenesis or function is not clearly defined.

      Completeness and integration of mitochondrial assays is marginal, undermining the strength of the conclusions regarding oxidative phosphorylation.

      We thank the reviewers for their thoughtful and constructive feedback. We appreciate their recognition of our work’s value and the improvements made in this revised version.

      We are particularly grateful to Reviewer 3 for their detailed and insightful comments, which identified errors we (and other reviewers) had unfortunately overlooked. To address these concerns and ensure the manuscript meets the high standards of clarity and rigor we aim for, we have made additional corrections and refinements.

      As part of this process, we conducted a thorough review of the original source files. This was especially important given that the project spanned from 2018 to 2025, and many co-authors have since left their previous positions.

      We appreciate the opportunity to resubmit this manuscript and are confident that these updates fully address the concerns raised by the reviewer and the editorial team.

      Reviewer #3 (Recommendations for the authors):

      (1) I still do not see the data in WB 2G reflecting the quantification in 2H and 2I. Moreover, the authors state they performed 1 additional experiment, but it appears not to have been included in the analysis of 2H and 2I since the graphs remained the same from the last version of the manuscript.

      We apologize for this oversight. The additional experiment has now been incorporated into the analysis for Figures 2H and 2I, and the graphs have been updated accordingly. While we had uploaded the new blot, we inadvertently forgot to update the analysis graphs. Thank you for bringing this to our attention.

      (2) The authors talk several times about "supercomplexes 1 and 2" without testing their precise composition (there is a ton of literature about SC species in several mouse cell types, and separate BN-PAGE immunoblotting of individual MRC complexes would precisely define them in this context).

      We agree with the reviewer that this is an important point. However, structural differences between supercomplexes were outside the scope of this paper, and we did not perform such analyses. That said, examining the precise composition of supercomplexes could be a valuable direction for future work.

      (3) Steady-state levels of MRC subunits do not match the observations from BN-PAGE results. That might be potentially interpreted and explained by the possible accumulation of intermediates but this is not explored.

      We appreciate the reviewer’s observation. There is indeed a strong possibility that differences in the expression of structural components of mitochondrial complexes exist between WT and Miro1 -/- cells. However, in this study, we chose to focus on assessing potential differences in the enzymatic activities of the complexes rather than examining their structural composition. Exploring the accumulation of intermediates and structural differences could be an interesting avenue for future investigations.

      (4) Citrate synthase normalization of kinetic enzyme activities is claimed, yet it is not shown in any graph and no description of the method is provided.

      We sincerely thank the reviewer for pointing out this discrepancy. Upon careful review, we realized that our statement regarding citrate synthase normalization of kinetic enzyme activities in the last revised version was made in error. This was a miscommunication between co-authors, and we did not perform citrate synthase normalization. Instead, the normalization was performed against protein concentration, determined by the BCA assay as described in the manuscript. We regret this oversight and appreciate the opportunity to clarify this.

      (5) Complex I activity is still wrongly described as NADPH oxidation in the Methods.

      We corrected this error.
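
      The two corrections above can be illustrated together. Complex I activity is followed as NADH oxidation, and rates are normalized to protein concentration (determined by BCA). The sketch below converts an absorbance slope into a protein-normalized activity. It assumes readout at 340 nm, where NADH has a molar absorptivity of 6.22 mM⁻¹ cm⁻¹, and a 1 cm path length. The function name and example numbers are hypothetical, not values from the study.

```python
NADH_EPSILON_340 = 6.22  # mM^-1 cm^-1, molar absorptivity of NADH at 340 nm


def nadh_specific_activity(slope_abs_per_min: float, volume_ml: float,
                           protein_mg: float, path_cm: float = 1.0) -> float:
    """Return nmol NADH oxidized per minute per mg protein.

    Beer-Lambert: d[NADH]/dt (mM/min) = slope / (epsilon * path length).
    """
    rate_mm_per_min = abs(slope_abs_per_min) / (NADH_EPSILON_340 * path_cm)
    # 1 mM = 1 umol/mL, so multiplying by volume (mL) gives umol/min; x1000 gives nmol/min.
    nmol_per_min = rate_mm_per_min * volume_ml * 1000.0
    return nmol_per_min / protein_mg


# Hypothetical run: A340 falls by 0.0622 per minute in 1 mL with 10 ug (0.01 mg) protein.
activity = nadh_specific_activity(-0.0622, volume_ml=1.0, protein_mg=0.01)
print(round(activity, 1))  # 1000.0
```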

      (6) The authors state 'Thank you for this comment. We believe this is due to a technical issue. Complex IV can be challenging to detect consistently, as its visibility is highly dependent on sample preparation conditions. In this specific case, we suspect that the buffer used during the isolation process may have influenced the detection of Complex IV'. I do not understand this, I find this justification insufficient and not substantiated by any experimental evidence. What buffer has been used for isolation? There are hundreds of protocols for isolation of intact mitochondria and MRC complexes. Also, DDM and digitonin are the gold-standard detergents for MRC complexes isolation and separation via BN-PAGE.

      We thank the reviewer for raising this important point. We have revised the response to clarify the exact experimental conditions and to provide supporting data.

      For BN-PAGE, mitochondrial fractions purified from cultured VSMCs or aortic tissue were prepared using a standard protocol (now explicitly detailed in the Methods). Briefly, mitochondria were resuspended in 6-aminocaproic acid (ACA) buffer containing 750 mM ACA, 50 mM Bis-Tris (pH 7.0), and protease inhibitors. Forty micrograms of mitochondrial protein were solubilized with 1.5% digitonin at a final detergent-to-protein ratio of 8:1. The samples were incubated on ice for 20 minutes and then clarified by centrifugation at 16,000 × g for 30 minutes at 4°C. Thus, consistent with established standards, digitonin, one of the gold-standard detergents for MRC complex solubilization and BN-PAGE, was used throughout.
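
      The solubilization conditions above can be checked with simple arithmetic. This sketch assumes that the 8:1 detergent-to-protein ratio is by mass (w/w) and reads 1.5% digitonin as 1.5% w/v (15 µg/µL). Under that reading, 40 µg of mitochondrial protein requires 320 µg of digitonin, which is about 21.3 µL of a 1.5% w/v solution. If 1.5% is instead the final concentration, the same number is the final solubilization volume. The function name is hypothetical.

```python
def digitonin_volume_ul(protein_ug: float, detergent_to_protein: float,
                        digitonin_percent_wv: float) -> float:
    """Volume (uL) of a digitonin solution that delivers the required detergent mass."""
    digitonin_ug = protein_ug * detergent_to_protein  # w/w ratio (assumed)
    # 1% w/v = 1 g/100 mL = 10 mg/mL = 10 ug/uL
    ug_per_ul = digitonin_percent_wv * 10.0
    return digitonin_ug / ug_per_ul


volume = digitonin_volume_ul(protein_ug=40.0, detergent_to_protein=8.0,
                             digitonin_percent_wv=1.5)
print(round(volume, 1))  # 21.3
```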

      Despite using these widely accepted conditions, we found that detection of fully assembled Complex IV by BN-PAGE was inconsistent, a limitation that has been reported by others and is known to be sensitive to mitochondrial source, tissue type, and solubilization efficiency. To address this directly and avoid over-interpretation, we assessed Complex IV integrity by examining core subunits. As shown in Figure 6—figure supplement 1 (panels B and C), expression levels of MTCO1 and MTCO2, both essential core components of Complex IV, do not differ significantly between WT and Miro1-/- cells, supporting the conclusion that Complex IV abundance is not altered.

      We have revised the manuscript to clarify these methodological details and to explicitly state that conclusions regarding Complex IV are based on subunit analysis rather than BN-PAGE visualization alone.

      (7) Complex V IGA also does not seem to reflect its quantification.

      Thank you for highlighting this concern. To address it, we will include the numerical data alongside the figures to ensure clarity and alignment with our findings. We hope this will provide a more comprehensive understanding and resolve any ambiguity.

      (8) Figure 6 supplement 1, the authors state 'we concentrated on ETC1 and 5 and performed experiments in cells after expression of MIRO1 WT and MIRO1 mutants'. I do not understand, what background is being used? what mutants are being expressed? all the figures refer to Miro1 -/- which is, according to standard genetic nomenclature, a loss-of-function allele (KO).

      Thank you for your comment. To clarify, we first infected MIRO1fl/fl VSMCs with an adenovirus expressing the DNA recombinase Cre or a control adenovirus. Cells infected with the adenovirus expressing Cre are labeled as MIRO1-/- cells. In these MIRO1-/- cells, we then introduced MIRO1 wild type (WT) and MIRO1 mutants via adenoviral expression.

      The mutants include one lacking the transmembrane domain (MIRO1-ΔTM), and another in which the two EF hands of MIRO1 were point-mutated (MIRO1-KK). MIRO1-WT is denoted as Ad WT, the mutant MIRO1-KK as Ad KK, and MIRO1-ΔTM as Ad ΔTM in the figures. We hope this explanation clarifies the experimental background and nomenclature used.

      (9) Figure 6 supplement 1B, no normalization is provided (e.g. VDAC, TOM20 etc.). Interestingly, VDAC is then used to normalize the data in C-D-E-F-G. Also, why is MIRO1 detected in lane 4? Is the mutant stable or not? There is zero signal in A.

      Thank you very much for pointing out that the immunoblot for VDAC1 was missing in Figure 6—Supplement 1B. This figure has been reviewed several times, and unfortunately, this error was not detected. We sincerely apologize for this oversight. We have now revised the figure to include the immunoblot for VDAC1 to address this issue.

      Regarding the detection of MIRO1 in lane 4, we confirm that the "mutant" is not stable. To generate MIRO1 knockout cells, aortic smooth muscle cells from MIRO1fl/fl mice were isolated and cultured, followed by infection with an adenovirus expressing Cre. As these are primary cells and the deletion was induced by Cre expression, the recombination efficiency can vary, which is reflected in the variability observed in lanes 2 and 4 of the immunoblot.

      (10) Why are COX4 levels so low in the 2nd replicate in 7A? The authors state 'We also performed anti-VDAC immunoblots on the same membranes as alternative loading control (see image below)', but I could not find the image.

      Thank you for your comment. The second pair of samples in Figure 7A is from a different preparation of mitochondria. In our experimental design, a control sample and a MIRO1 knockdown sample were processed side by side and run next to each other on the immunoblot.

      Regarding the anti-VDAC immunoblot, the image was included in our response to reviewers during the previous revision, as we did not believe it altered the message conveyed by the COX4 blot. However, to ensure clarity and address your concern, we have now included the anti-VDAC immunoblot directly in the figure. We hope this addition resolves any ambiguity and provides further confidence in the data presented.

      (11) The proposed interaction between MIRO1 and NDUFA9 is very difficult to reconcile, as the two proteins reside in distinct mitochondrial compartments. MIRO1 is anchored to the outer mitochondrial membrane (OMM), with its functional domains facing the cytosol, whereas NDUFA9 is a matrix-facing accessory subunit of mitochondrial Complex I, positioned at the interface between the N- and Q-modules.

      We appreciate the reviewer’s comment and agree that MIRO1 and NDUFA9 occupy distinct mitochondrial compartments. MIRO1 is anchored to the outer mitochondrial membrane with cytosol-facing domains, whereas NDUFA9 is a matrix-facing accessory subunit of Complex I at the N/Q-module interface.

      Our data do not suggest a stable, constitutive interaction within intact mitochondria. Rather, the observed association likely reflects an indirect, transient, or context-dependent interaction, potentially occurring during mitochondrial stress, remodeling, or turnover. Such associations may be mediated by multi-protein complexes spanning mitochondrial membranes, dynamic contact sites, or post-lysis interactions detected under experimental conditions. Increasing evidence supports functional coupling between outer mitochondrial membrane proteins and inner membrane or matrix pathways without direct physical binding.

      Additional comments:

      (12) All the raw data should be provided to the readers (uncropped and annotated WB, IHC images, numerical data with statistics applied).

      We agree with the reviewer and appreciate the emphasis on transparency. In accordance with eLife submission requirements, we have provided all raw data. The Source Data files associated with each figure now include uncropped and annotated immunoblots, as well as the numerical source data for all quantified analyses.

      During the compilation of these materials, we were unable to locate the original source files for Figure 2A. The control experiment depicted in the previous version, which demonstrates in vitro recombination, was performed in 2018. However, this experiment was repeated several times throughout the project. Therefore, to ensure the manuscript remains complete, we have replaced this panel with a representative immunoblot from a similar experiment. Additionally, during our review, we discovered a labeling error in Figure 3D and G. We have corrected these figures to ensure accuracy.

      All source files have been provided and carefully labeled to facilitate independent evaluation.

    1. eLife Assessment

      This study provides valuable insights into how HIV-1 Env modulates the nanoscale organization and dynamics of the CXCR4 co-receptor on T cells. Using quantitative imaging and functional approaches, the authors present convincing evidence that gp120 engagement promotes CD4-dependent clustering and altered mobility of CXCR4, distinct from the effects of the natural ligand CXCL12. Some concerns were raised regarding the interpretation of the single-particle tracking analyses, and additional clarification or analysis may help strengthen the conclusions. The physiological relevance of the findings could be further enhanced by validation with infectious virus and by more clearly integrating the CXCR4R334X mutant observations into the central mechanistic narrative. The work will be of interest to researchers studying HIV entry and membrane receptor organization.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      This article provides new insights into the organisational changes of the X4-tropic HIV-1 co-receptor CXCR4 upon binding of the viral receptor-binding protein X4-gp120, either in its soluble form or when displayed as Env on virus-like particles (VLPs). The study employs single-particle tracking total internal reflection fluorescence (SPT-TIRF) microscopy to quantify the dynamics and clustering of CXCR4 on CD4+ T cells. The data show that CXCR4 clusters in the presence of X4-gp120 and VLPs, a phenomenon that is also observed for the primary HIV-1 receptor CD4. The authors also show that a WHIM mutant of CXCR4 (CXCR4-R334X) that does not cluster in the presence of its natural ligand, CXCL12, clusters in the presence of X4-gp120 and VLPs.

      Major strengths:

      The data are well presented, discussed, and supported by solid evidence. Literature is cited appropriately.

      Major weaknesses:

      The authors have addressed my concerns in the revised manuscript.

      Significance:

      In summary, the work is presented in a clear fashion, and the main findings are properly highlighted. The paper will be of interest to the broader virology community as well as to researchers studying cell receptor clustering. The findings are not entirely surprising because it has been shown previously that the binding of Env to CD4 mediates CD4 clustering, which would also suggest clustering of the co-receptor. Nonetheless, the paper provides strong evidence that CXCR4 clusters and changes its dynamics in the presence of CD4 and X4-gp120. Moreover, the evidence that X4-gp120 clusters CXCR4-R334X is of high interest as it suggests a different binding mechanism for X4-gp120 from that of the natural ligand CXCL12, raising questions for further research.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate how the HIV-1 Env glycoprotein modulates the nanoscale organisation and dynamics of the CXCR4 co-receptor on CD4⁺ T cells. The authors demonstrate that HIV-1 Env induces CXCR4 clustering distinct from that triggered by its natural ligand (CXCL12), implicating spatial receptor organization as a determinant of infection. This study investigates how HIV-1 Env (specifically X4-tropic gp120) alters the membrane organization and dynamics of the chemokine receptor CXCR4 and its WHIM-associated mutant, CXCR4R334X, in a CD4-dependent manner. Using single-particle tracking total internal reflection fluorescence microscopy (SPT-TIRF-M), the authors demonstrate that both soluble gp120 and virus-like particles (VLPs) displaying gp120 induce CXCR4 nanoclustering, reduce receptor diffusivity, and promote immobile nanoclusters of CXCR4 at the membrane of Jurkat T cells and primary CD4⁺ T cell blasts. The work offers new insights into the spatial organisation of receptors during HIV-1 entry and infection. The manuscript is well-written, and the findings are significant.

      Significance:

      Nature and significance of the advance: This work marks a conceptual and mechanistic breakthrough in understanding HIV-1 entry. It goes beyond the static view of Env-co-receptor interaction to show that nanoscale reorganization of CXCR4, distinct from chemokine-induced clustering, occurs during HIV-1 Env engagement and may be essential for infection.

      Context within existing literature: Previous studies established Env-induced CD4 clustering (Yin et al., 2020) and chemokine-induced CXCR4 nanocluster formation (Martínez-Muñoz et al., 2018), but the exact nanoscale rearrangement of CXCR4 in the context of HIV-1 Env and physiological Env densities remains unquantified. This study addresses this gap using SPT-TIRF, STED microscopy, and functional assays.

      Audience and influence: The findings will be of interest to researchers in HIV virology, membrane receptor biology, viral entry mechanisms, and therapeutic target development. The receptor-clustering aspect could also influence broader fields of study, such as GPCR organization and immune receptor signalling.

      Reviewer expertise: I can evaluate HIV-1 entry mechanisms, viral glycoprotein-host receptor interactions, single-molecule fluorescence microscopy, and membrane protein dynamics. I am less equipped to evaluate the deep structural modelling aspects, though the in silico AlphaFold results are straightforward to interpret in context.

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigate how HIV-1 Env engagement affects the nanoscale organization and dynamics of the CXCR4 coreceptor on target cells. Using single-particle tracking TIRF microscopy, they analyze CXCR4 distribution following exposure to gp120 or HIV virus-like particles, including both wild-type CXCR4 and the WHIM-associated CXCR4.R334X variant. The study further examines the role of CD4-CXCR4 heterodimerization and contrasts Env-induced receptor organization with that elicited by the natural ligand CXCL12.

      Evaluation:

      A major strength of this work is the integration of high-resolution imaging with functional and comparative analyses that distinguish Env-induced CXCR4 clustering from chemokine-driven effects. The experiments are clearly described, include appropriate controls, and are supported by quantitative analyses that are consistent across experiments. The revised manuscript appears to have addressed many of the technical and interpretive issues raised during initial review, improving clarity around data analysis and strengthening confidence in the conclusions.

      I am not an expert in TIRF microscopy or single-molecule tracking and defer to other reviewers regarding limits of imaging and tracking methods. However, I did not identify major inconsistencies between the biological data presented and the conclusions drawn.

      The authors' data support the conclusion that HIV-1 Env, delivered as gp120 or virus-like particles, promotes CD4-dependent nanoscale clustering of CXCR4, including the CXCR4.R334X variant associated with WHIM syndrome, in a manner distinct from CXCL12-induced receptor organization. The authors are generally careful to frame their conclusions in proportion to the evidence and avoid overinterpretation.

      Overall, this study builds on prior work on CXCR4 distribution and HIV entry by providing higher-resolution insight into receptor nanoclustering and its modulation by Env. The findings provide a mechanistic refinement rather than a conceptual paradigm shift, but they constitute a valuable dataset for researchers studying HIV entry, coreceptor biology, and membrane receptor organization.

      Reviewer expertise: HIV-1 Envelope glycoproteins and entry assays, HIV broadly neutralizing antibodies, HIV vaccine design

      Comments on revised version:

      This reviewer has no further recommendations and thanks the authors for clarifying that the Env content in gp120-VLPs was lower than the NL4-3deltaIN particles but that the percentage of mature particles in the gp120-VLPs was higher.

    5. Reviewer #4 (Public review):

      Summary:

      The authors investigate the impact of surface-bound HIV gp120 and VLPs on CXCR4 dynamics in Jurkat T cells expressing WT or WHIM syndrome-mutated CXCR4, which has a defective response to CXCL12. Jurkat cells were transfected with CXCR4-AcGFP. Images were acquired and a single particle tracking routine was applied to generate information about nanoclustering and diffusion, and FRET was used to investigate CD4-CXCR4 proximity. They compare effects of soluble gp120 to immature and mature VLPs, which include varying degrees of gp120 clustering. They find that solid phase gp120 or VLP can increase CXCR4 clustering size and decrease diffusion in Jurkat cells. Surprisingly, VLP lacking gp120 could increase CXCR4 clustering and speed, which is paradoxical as there were no known ligands on the VLPs, but they likely carry many cellular proteins with potential interactions. The impact of CXCL12 and gp120 binding to CXCR4 was different in terms of clustering and receptor down-regulation.

      Significance:

      The strengths are that it's an important question and the reagents are well prepared and characterised. They are detecting quantitative effects that will likely be reproducible. The information generated is potentially useful for those studying HIV infection processes and strategies to prevent infection.

      The major weakness is that the conditions for the SPT experiments are not ideal, in that the density of particles is too high for SPT and the single-molecule basis for assessing nanoclusters is not clear. This means that the data likely capture complex multi-molecule phenomena and are less likely to represent pure single-molecule measurements.

      Comments on revised version:

      The authors should make the tracking data available and this will aid others in following up on it.

    6. Author response:

      Point-by-point description of the revisions

      Reviewer #1:

      Thank you very much for considering that our manuscript evaluates an important question and that the reagents used are well prepared and characterized. We also much appreciate that you consider the information generated as potentially useful for those studying HIV infection processes and strategies to prevent infection.

      (1) While a single-particle tracking routine was applied to the data, it is not clear how the signal from a single GFP was defined and whether movement during the 100 ms acquisition time affects this. My concern would be that the routine is tracking fluctuations and that these are being related to single-particle dynamics; it appears from the movies that the density of the GFP-tagged receptors in the cells is too high to allow clear tracking of single molecules. SPT with GFP is very difficult due to bleaching and relatively low quantum yield. Current efforts in this direction that are more successful include using SNAP tags with very photostable organic fluorophores. The data likely do mean something is happening with the receptor, but they need to be more conservative about the interpretation.

      Some of the paradoxical effects might be better understood through deeper analysis of the SPT data, particularly investigation of active transport and more detailed analysis of "immobile" objects. Comments on early figures illustrate how this could be approached. This would require selecting acquisitions where the GFP density is low enough for SPT and performing a more detailed analysis, but this may be difficult to do with GFP.

      When the authors discuss clusters of <2 or >3, how do they calibrate the value of GFP and account for the impact of diffusion on the measurement? One way to approach this might be single-molecule measurements of dilute samples on glass versus in a supported lipid bilayer, to map the range from true immobility to diffusion at >1 µm<sup>2</sup>/s.

      We fully understand the reviewer’s apprehensions regarding the application of these high-end biophysical techniques, in particular the associated complexity of the data analysis. We provide below extensive explanations on our methodology, which we hope will satisfactorily address all of the reviewer’s concerns.

      We would first like to emphasize that the experimental conditions and the quantitative analysis used in our current experiments are similar to the established protocols and methodologies applied by our group previously (Martinez-Muñoz et al. Mol. Cell, 2018; García-Cuesta et al. PNAS, 2022; Gardeta et al. Frontiers in Immunol., 2022; García-Cuesta et al. eLife, 2024; Gardeta et al. Cell. Commun. Signal., 2025) and by others (Calebiro et al. PNAS, 2013; Jaqaman et al. Cell, 2011; Mattila et al. Immunity, 2013; Torreno-Pina et al. PNAS, 2014; Torreno-Pina et al. PNAS, 2016).

      As SPT (single-particle tracking) experiments require low-expressing conditions in order to follow individual trajectories (Manzo & García-Parajo Rep. Prog. Phys., 2015), we transiently transfected Jurkat CD4<sup>+</sup> cells with CXCR4-AcGFP or CXCR4<sup>R334X</sup>-AcGFP. At 24 h post-transfection, cells expressing low CXCR4-AcGFP levels were selected with a MoFlo Astrios Cell Sorter (Beckman Coulter) to ensure optimal conditions for SPT. Using Dako Qifikit (DakoCytomation), we quantified the number of CXCR4 receptors and found ~8,500–22,000 CXCR4-AcGFP receptors/cell, which correspond to a particle density of ~2–4.5 particles/µm<sup>2</sup> (Author response image 1) and are similar to the expression levels found in primary human lymphocytes.

      Author response image 1.

      Purified monomeric AcGFP protein was immobilized on glass at various concentrations, and the distribution of particle stoichiometries was calculated as a function of particle density; >95% of particles were monomeric at 2.0–4.5 particles/µm<sup>2</sup>. This range of particle density was used to analyze the dynamics of CXCR4-AcGFP or CXCR4<sup>R334X</sup>-AcGFP single particles on JKCD4 cells.

      These cells were resuspended in RPMI supplemented with 2% FBS, NaPyr and L-glutamine and plated on 96-well plates for at least 2 h. Cells were centrifuged and resuspended in a buffer with HBSS, 25 mM HEPES, 2% FBS (pH 7.3) and plated on glass-bottomed microwell dishes (MatTek Corp.) coated with fibronectin (FN) (Sigma-Aldrich, 20 µg/ml, 1 h, 37°C). To observe the effect of the ligand, we coated dishes with FN + CXCL12, FN + X4-gp120 or FN + VLPs, as described in Materials and Methods; cells were incubated (20 min, 37°C, 5% CO<sub>2</sub>) before image acquisition.

      For SPT measurements, we used a total internal reflection fluorescence (TIRF) microscope (Leica AM TIRF inverted) equipped with an EM-CCD camera (Andor DU 885-CS0-#10-VP), a 100× oil-immersion objective (HCX PL APO 100x/1.46 NA) and a 488-nm diode laser. The microscope was equipped with incubator and temperature control units; experiments were performed at 37°C with 5% CO<sub>2</sub>. To minimize photobleaching before image acquisition, cells were located and focused using bright-field illumination, and the focus was fine-adjusted in TIRF mode at 5% laser power, an intensity insufficient for single-particle detection that ensures negligible photobleaching. Image sequences of individual particles (500 frames) were acquired at 49% laser power with a frame rate of 10 Hz (100 ms/frame). The penetration depth of the evanescent field used was 90 nm.
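      The 90 nm penetration depth can be related to the incidence angle through the standard evanescent-field formula for the 1/e intensity depth, d = λ / (4π·sqrt(n₁²·sin²θ − n₂²)). The sketch below is illustrative only: the refractive indices (1.518 for coverslip/immersion oil, 1.37 for the cell) are assumed values that the text does not give, and the resulting angle is not a reported instrument setting.

```python
import math

def penetration_depth_nm(wavelength_nm, n_glass, n_sample, theta_deg):
    """1/e intensity penetration depth of the evanescent field:
    d = lambda / (4 * pi * sqrt(n_glass^2 * sin^2(theta) - n_sample^2))."""
    s = (n_glass * math.sin(math.radians(theta_deg))) ** 2 - n_sample ** 2
    if s <= 0:
        raise ValueError("theta is at or below the critical angle: no evanescent field")
    return wavelength_nm / (4 * math.pi * math.sqrt(s))

def angle_for_depth_deg(wavelength_nm, n_glass, n_sample, depth_nm):
    """Incidence angle (degrees) giving a target depth; the inverse of the formula above."""
    sin_theta = math.sqrt((wavelength_nm / (4 * math.pi * depth_nm)) ** 2 + n_sample ** 2) / n_glass
    if sin_theta >= 1:
        raise ValueError("depth not reachable with these refractive indices")
    return math.degrees(math.asin(sin_theta))

# Assumed (hypothetical) refractive indices: coverslip/oil 1.518, cell interior 1.37.
theta = angle_for_depth_deg(488, 1.518, 1.37, 90)
critical = math.degrees(math.asin(1.37 / 1.518))
print(f"incidence angle ~{theta:.1f} deg (critical angle {critical:.1f} deg)")
```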

      We performed automatic tracking of individual particles using a well-established algorithm first described by Jaqaman (Jaqaman et al. Nat. Methods, 2008). We would stress, however, that we implemented this algorithm in a supervised fashion, i.e., we visually inspected each individual trajectory reconstruction in a separate window. This is important because this algorithm is not able to quantify merging or splitting events.
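      For readers less familiar with the tracking step, the core of Jaqaman-style tracking is a linear assignment problem (LAP) that links detections in consecutive frames at minimal total cost. The sketch below shows only that frame-to-frame linking step, using SciPy's `linear_sum_assignment` with squared displacement as the cost and a hypothetical search radius `max_disp`. The published algorithm additionally models particle appearance and disappearance and performs gap closing, which are not reproduced here, and this is not the software the authors used.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_frames(prev_xy, next_xy, max_disp):
    """Link detections between two consecutive frames by solving a LAP.

    Cost = squared displacement; pairs farther apart than max_disp are forbidden.
    Returns a list of (index_in_prev, index_in_next) pairs.
    """
    prev_xy = np.asarray(prev_xy, dtype=float).reshape(-1, 2)
    next_xy = np.asarray(next_xy, dtype=float).reshape(-1, 2)
    # Pairwise squared distances, shape (n_prev, n_next)
    d2 = ((prev_xy[:, None, :] - next_xy[None, :, :]) ** 2).sum(axis=-1)
    forbidden = 1e12  # effectively disallows links beyond the search radius
    cost = np.where(d2 <= max_disp ** 2, d2, forbidden)
    rows, cols = linear_sum_assignment(cost)
    # Discard assignments the solver could only make through forbidden pairs
    return [(int(i), int(j)) for i, j in zip(rows, cols) if cost[i, j] < forbidden]

# Two particles swap list order between frames but move only slightly
print(link_frames([(0, 0), (10, 10)], [(10.5, 10), (0.2, 0.1)], max_disp=2.0))
```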

      We followed each individual fluorescence spot frame by frame using a three-by-three matrix around the centroid position of the spot as it diffused on the cell membrane. To minimize the effect of photon fluctuations, we averaged the intensity over 20 frames. To reassure the reviewer that most single-molecule traces last for at least 50 frames (i.e., 5 s), we provide the following data and arguments. We measured the photobleaching times of individual CD86-AcGFP spots that exclusively showed one single photobleaching step, to guarantee that we were looking at individual CD86-AcGFP molecules. The distribution of the photobleaching times is shown below (Author response image 2). Fitting this distribution to a single exponential decay yields a t<sub>0</sub> value of ~5 s. Thus, with 20-frame averaging, we are essentially measuring the whole population of monomers in our experiments. As the survival time of a molecule before photobleaching strongly depends on the excitation conditions, we used low excitation (2 mW laser power, corresponding to an excitation power density of ~0.015 kW/cm<sup>2</sup> over the illuminated region) and longer integration times (100 ms/frame) to increase the signal-to-background ratio for single GFP detection while minimizing photobleaching.
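      A single-exponential fit to photobleaching times can also be done without binning: for p(t) = exp(−t/t<sub>0</sub>)/t<sub>0</sub>, the maximum-likelihood estimate of t<sub>0</sub> is simply the mean bleaching time. The sketch below uses synthetic times drawn with t<sub>0</sub> = 5 s (not the measured trajectories) and, as a simplifying assumption, ignores frame discretization and tracks that end before bleaching; it is an alternative to, not a reproduction of, the histogram fit described.

```python
import random
import statistics

def fit_bleach_time(times_s):
    """Maximum-likelihood t0 for a single-exponential distribution p(t) = exp(-t/t0) / t0.

    For this model the MLE is the sample mean of the observed bleaching times.
    Frame discretization and tracks that end before bleaching are ignored here.
    """
    if not times_s:
        raise ValueError("need at least one photobleaching time")
    return statistics.fmean(times_s)

# Synthetic example (not experimental data): times drawn with a true t0 of 5 s
rng = random.Random(0)
synthetic = [rng.expovariate(1 / 5.0) for _ in range(100_000)]
print(f"estimated t0 = {fit_bleach_time(synthetic):.2f} s")
```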

      Author response image 2.

      Single-molecule photobleaching times measured directly from single-molecule trajectories of CD86-AcGFP, considering only traces that exhibit single-molecule photobleaching steps. The experimental data are shown as gray bars (n=273 trajectories over 3 independent experiments). The red line corresponds to a single exponential decay fit to the experimental data, from which t<sub>0</sub> was extracted.

      To infer the stoichiometry of receptor complexes, we also performed single-step photobleaching analysis of the TIRF trajectories to establish the existence of different populations of monomers, dimers, trimers and nanoclusters and to extract their percentages. Some representative trajectories of CXCR4-AcGFP with the number of steps detected are shown in new Supplementary Figure 1.
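      Step counting on an intensity trace can be sketched in a few lines. The version below is a deliberately simple, hypothetical step counter, not the authors' pipeline: it summarizes a trace by the medians of non-overlapping 10-frame blocks and converts each drop between neighbouring blocks into a number of monomer-sized steps, using the 980 a.u. monomer intensity reported in the text as the step size. Published analyses generally rely on more robust step-detection methods.

```python
import statistics

def count_bleach_steps(trace, monomer_intensity, window=10):
    """Estimate the number of photobleaching steps in one intensity trace (a.u.).

    The trace is summarized by the median of consecutive, non-overlapping blocks of
    `window` frames; each drop between neighbouring block medians is converted into a
    number of monomer-sized steps by rounding drop / monomer_intensity.
    """
    medians = [
        statistics.median(trace[k:k + window])
        for k in range(0, len(trace) - window + 1, window)
    ]
    steps = 0
    for before, after in zip(medians, medians[1:]):
        drop = before - after
        if drop > 0:  # intensity increases (blinking, noise) are ignored
            steps += round(drop / monomer_intensity)
    return steps

# Synthetic trace: three monomer-sized drops with small alternating noise
levels = [2940] * 30 + [1960] * 30 + [980] * 30 + [0] * 30
trace = [v + (50 if i % 2 else -50) for i, v in enumerate(levels)]
print(count_bleach_steps(trace, monomer_intensity=980))
```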

      The emitted fluorescence (arbitrary units, a.u.) of each spot in the cells is quantified and normalized to the intensity emitted by monomeric CD86-AcGFP spots that strictly showed a single photobleaching step (Dorsch et al. Nat. Methods, 2009). We preferred to use CD86-AcGFP in cells rather than AcGFP on glass to exclude any potential effect of the different photodynamics exhibited by AcGFP when bound directly to glass. We have also previously shown pharmacological controls to exclude CXCL12-mediated receptor clustering due to internalization processes (Martinez-Muñoz et al. Mol. Cell, 2018) that, together with the evaluation of single photobleaching steps and intensity histograms, allow us to exclude the presence of vesicles in our data. Thus, the dimers, trimers and nanoclusters found in our data do correspond to CXCR4 molecules on the cell surface. Finally, the distribution of monomeric particle intensities obtained from the photobleaching analysis was analyzed by Gaussian fitting, yielding a mean value of 980 ± 86 a.u. This value was then used as the monomer reference to estimate the number of receptors per particle for both CXCR4-AcGFP and CXCR4<sup>R334X</sup>-AcGFP (new Supplementary Figure 1).
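      The normalization step amounts to dividing each spot intensity by the monomer reference (980 a.u.) and binning the result. In the sketch below, the rounding rule that assigns spots to monomer, dimer, trimer or nanocluster (>3 receptors) classes is an assumption made for illustration, and the input intensities are invented.

```python
from collections import Counter

MONOMER_INTENSITY = 980.0  # a.u.; mean of the Gaussian fit to single-step CD86-AcGFP spots (from the text)

def receptors_per_particle(spot_intensity, monomer_intensity=MONOMER_INTENSITY):
    """Estimated number of AcGFP-tagged receptors in one spot."""
    return spot_intensity / monomer_intensity

def stoichiometry_class(spot_intensity, monomer_intensity=MONOMER_INTENSITY):
    """Assumed rule: round to the nearest integer (minimum 1) and bin; >3 counts as a nanocluster."""
    n = max(1, round(receptors_per_particle(spot_intensity, monomer_intensity)))
    return {1: "monomer", 2: "dimer", 3: "trimer"}.get(n, "nanocluster")

def stoichiometry_percentages(spot_intensities):
    """Percentage of spots in each stoichiometry class."""
    counts = Counter(stoichiometry_class(i) for i in spot_intensities)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}

# Invented spot intensities (a.u.)
print(stoichiometry_percentages([950, 1010, 2100, 3000, 5200]))
```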

      (2) I understand that the CXCL12 or gp120 are attached to the substrate with fibronectin for adhesion. I'm less clear how the VLPs are integrated. Were these added to cells already attached to FN?

      For TIRF-M experiments, cells were adhered to glass-bottomed microwell dishes coated with fibronectin, fibronectin + CXCL12, fibronectin + X4-gp120, or fibronectin + VLPs. As for CXCL12 and X4-gp120, the VLPs were attached to fibronectin taking advantage of electrostatic interactions. To clarify the integration of the VLPs in these assays, we have stained the microwell dishes coated with fibronectin and those coated with fibronectin + VLPs with wheat germ agglutinin (WGA) coupled to Alexa647 (Author response image 3) and evaluated the staining by confocal microscopy. These results indicate the presence of carbohydrates on the VLPs and are, therefore, indicative of the presence of VLPs on the fibronectin layer.

      Author response image 3.

      Representative confocal images of microwell dishes coated with fibronectin (left panel) or fibronectin + VLPs (right panel) and stained with wheat germ agglutinin (WGA) coupled to Alexa647. Scale bar, 1 µm.

      Moreover, it is important to remark that the effect of the VLPs on CXCR4 behavior at the cell surface observed by TIRF-M confirmed that the VLPs remained attached to the substrate during the experiment.

      (3) Fig 1A - The classification of particle tracks into mobile and immobile is an overly simplistic description that goes back to bulk FRAP measurements and is not really applicable to single-molecule tracking data, where it's rare to see anything that is immobile and alive. An alternative classification strategy uses sub-diffusion, normal diffusion and active diffusion (or active transport) as descriptions, and particles can transition between these classes over the tracking period. Fig 1B - these data might be better displayed as histograms showing distributions within the different movement classes.

      In agreement with the reviewer’s commentary, the majority of the particles detected in our TIRF-M experiments were indeed mobile. However, we also detected a variable, and biologically appreciable, percentage of immobile particles depending on the experimental condition analyzed (Figure 1A in the main manuscript). To establish a stringent threshold for identifying these immobile particles under our specific experimental conditions, we used purified monomeric AcGFP proteins immobilized on glass coverslips. Our analysis demonstrated that 95% of these immobilized proteins showed a diffusion coefficient ≤0.0015 µm<sup>2</sup>/s; consequently, this value was established as the cutoff to distinguish immobile from mobile trajectories. While the observation of truly immobile entities in a dynamic, living system is rare, the presence of these particles under our conditions is biologically significant. For instance, the detection of large, immobile receptor nanoclusters at the plasma membrane is entirely consistent with facilitating key cellular processes, such as enabling the robust signaling cascade triggered by ligand binding or promoting the crucial events required for efficient viral entry into the cells.
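      The immobility criterion can be written out explicitly. The sketch below assumes that D<sub>1-4</sub> is obtained, as is conventional, from a linear least-squares fit of the 2D MSD over the first four time lags (MSD = 4Dt + offset) at the 100 ms frame interval, and then applies the 0.0015 µm<sup>2</sup>/s cutoff. Tracks are lists of (x, y) positions in µm, and the example tracks are synthetic.

```python
IMMOBILE_CUTOFF = 0.0015  # um^2/s; 95% of glass-immobilized AcGFP fell at or below this value (from the text)
FRAME_INTERVAL_S = 0.1    # 10 Hz acquisition (from the text)

def msd(track, max_lag):
    """Time-averaged mean square displacement (um^2) for lags 1..max_lag of a 2D track."""
    out = []
    for lag in range(1, max_lag + 1):
        sq = [(x2 - x1) ** 2 + (y2 - y1) ** 2 for (x1, y1), (x2, y2) in zip(track, track[lag:])]
        out.append(sum(sq) / len(sq))
    return out

def d_1_4(track, dt_s=FRAME_INTERVAL_S):
    """Assumed definition: least-squares slope of MSD over lags 1-4, divided by 4 (2D diffusion)."""
    if len(track) < 5:
        raise ValueError("need at least 5 positions for lags 1-4")
    t = [n * dt_s for n in range(1, 5)]
    m = msd(track, 4)
    t_mean, m_mean = sum(t) / 4, sum(m) / 4
    slope = (sum((ti - t_mean) * (mi - m_mean) for ti, mi in zip(t, m))
             / sum((ti - t_mean) ** 2 for ti in t))
    return slope / 4

def is_immobile(track, dt_s=FRAME_INTERVAL_S):
    """Apply the immobility cutoff to the short-lag diffusion coefficient."""
    return d_1_4(track, dt_s) <= IMMOBILE_CUTOFF

# Synthetic tracks (um): a particle that never moves and one drifting 0.1 um per frame
stationary = [(1.0, 1.0)] * 20
drifting = [(0.1 * i, 0.0) for i in range(20)]
print(is_immobile(stationary), round(d_1_4(drifting), 3))
```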

      Regarding the mobile receptors (defined as those with D<sub>1-4</sub> values exceeding 0.0015 µm<sup>2</sup>/s), we observed distinct diffusion profiles derived from mean square displacement (MSD) plots (Figure V) (Manzo & García-Parajo Rep. Prog. Phys., 2015), which were further classified by type of motion using the moment scaling spectrum (MSS) (Ewers et al. PNAS, 2005). Under all experimental conditions, the majority of mobile particles (~85%) showed confined diffusion; for example, under basal conditions without ligand addition, ~90% of mobile particles showed confined diffusion, ~8.5% showed Brownian (free) diffusion and ~1.5% exhibited directed motion (new Supplementary Figure 5A in the main manuscript). These data have also been included in the revised manuscript to show the dynamic parameters of CXCR4 in detail.
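      The moment scaling spectrum (Ewers et al. PNAS, 2005) can be sketched as follows: for each moment order ν = 0–6, the displacement moment μ<sub>ν</sub>(Δn) is fitted as a power law of the lag Δn, and the slope S<sub>MSS</sub> of the resulting exponents γ<sub>ν</sub> against ν characterizes the motion (≈0.5 for free diffusion, lower for confined motion, higher for directed motion, reaching 1 for pure drift). The lag range (one third of the track length) and the ±0.1 decision band in `classify_motion` are hypothetical choices, not the authors' thresholds, and the tracks are synthetic.

```python
import math
import random

def _ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def mss_slope(track, max_lag=None, orders=tuple(range(7))):
    """Slope of the moment scaling spectrum, S_MSS, for a 2D track [(x, y), ...].

    For each order v, gamma_v is the log-log slope of mu_v(lag) = mean(|r(t + lag) - r(t)|**v)
    against lag; S_MSS is the slope of gamma_v against v.
    """
    if max_lag is None:
        max_lag = len(track) // 3  # assumed lag range
    if max_lag < 2:
        raise ValueError("track too short")
    lags = range(1, max_lag + 1)
    disp = {lag: [math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(track, track[lag:])]
            for lag in lags}
    log_lags = [math.log(lag) for lag in lags]
    gammas = []
    for v in orders:
        moments = [sum(d ** v for d in disp[lag]) / len(disp[lag]) for lag in lags]
        if min(moments) <= 0:
            raise ValueError("zero moment: degenerate (e.g. stationary) track")
        gammas.append(_ols_slope(log_lags, [math.log(m) for m in moments]))
    return _ols_slope(list(orders), gammas)

def classify_motion(s_mss, tol=0.1):
    """Hypothetical decision rule: ~0.5 free, clearly lower confined, clearly higher directed."""
    if s_mss < 0.5 - tol:
        return "confined"
    if s_mss > 0.5 + tol:
        return "directed"
    return "Brownian/free"

# Synthetic tracks (not experimental data)
directed = [(0.1 * i, 0.0) for i in range(30)]
rng = random.Random(1)
brownian = [(0.0, 0.0)]
for _ in range(20_000):
    x, y = brownian[-1]
    brownian.append((x + rng.gauss(0, 0.05), y + rng.gauss(0, 0.05)))
print(classify_motion(mss_slope(directed)), classify_motion(mss_slope(brownian, max_lag=10)))
```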

      Due to space constraints, it is very difficult to include all the figures generated. However, to ensure comprehensive assessment and transparency (for the purpose of this review), we have included below representative plots of the MSD values as a function of time from individual trajectories, showing different types of motion obtained in our experiments (Author response image 4).

      Author response image 4.

      Representative MSD plots from individual trajectories of CXCR4-AcGFP detected by SPT-TIRF in resting JKCD4 cells, showing different types of motion: A) confined, B) Brownian/free, C) directed transport.
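      The three motion types in such plots are usually interpreted against the following standard 2D MSD models (localization-error offsets omitted), where D is the diffusion coefficient, v the drift velocity and L the side of a square confinement region. The confined expression is the commonly used approximation for diffusion inside a square corral; all three are given for orientation, not as the authors' fitting procedure.

```latex
\begin{aligned}
\text{Brownian/free:}\quad & \mathrm{MSD}(t) = 4Dt \\
\text{directed:}\quad & \mathrm{MSD}(t) = 4Dt + v^{2}t^{2} \\
\text{confined (square of side } L\text{):}\quad & \mathrm{MSD}(t) \approx \frac{L^{2}}{3}\left[1 - \exp\!\left(-\frac{12Dt}{L^{2}}\right)\right]
\end{aligned}
```

      At short times the confined curve has the same initial slope, 4D, as free diffusion, and it levels off at L<sup>2</sup>/3; the directed curve bends upward because of the t<sup>2</sup> term.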

      (4) Fig 1C,D - It would be helpful to see a plot of D vs MSI at a single-particle level. In comparing C and D, I'm surprised there is not a larger difference between CXCL12 and X4-gp120. It would also be very important to see the behaviour of X4-gp120 on CXCR4-deficient Jurkat cells, which would provide a picture of CD4 diffusion. The CXCR4 nanoclustering related to the X4-gp120 could be dominated by CD4 behaviour.

      As previously described, all analyses were performed under SPT conditions (see previous response to point 1). Figure 1C details the percentage of oligomers (>3 receptors/particle) calibrated using Jurkat CD4<sup>+</sup> cells electroporated with monomeric CD86-AcGFP (Dorsch et al. Nat. Methods, 2009). The monomer value was determined by analyzing photobleaching steps as described in our previous response to point 1.

      In our experiments, we observed a trend towards a higher number of oligomers upon activation with CXCL12 compared with X4-gp120. This trend was further supported by measurements of Mean Spot Intensity. However, the values are also influenced by the number of larger spots, which represent a minor fraction of the total spots detected.

      The differences between the effect triggered by CXCL12 or X4-gp120 might also be attributed to a combination of factors related to differences in ligand concentration, their structure, and even to the technical requirements of TIRF-M. Both ligands are in contact with the substrate (fibronectin) and the specific nature of this interaction may differ between both ligands and influence their accessibility to CXCR4. Moreover, the requirement of the prior binding of gp120 to CD4 before CXCR4 engagement, in contrast to the direct binding of CXCL12 to CXCR4, might also contribute to the differences observed.

      We previously reported that CXCL12-mediated CXCR4 dynamics are modulated by CD4 coexpression (Martinez-Muñoz et al. Mol. Cell, 2018). We have now detected the formation of CD4 heterodimers with both CXCR4 and CXCR4<sup>R334X</sup>, and found that these conformations are influenced by gp120-VLPs. In the present manuscript, we did not focus on CD4 clustering as it has been extensively characterized previously (Barrero-Villar et al. J. Cell Sci., 2009; Jiménez-Baranda et al. Nat. Cell Biol., 2007; Yuan et al. Viruses, 2021). Regarding the investigation of the effects of X4-gp120 on CXCR4-deficient Jurkat cells, which would provide a picture of CD4 diffusion, we would note that a previous report has already addressed this issue using single-molecule super-resolution imaging, and revealed that CD4 molecules on the cell membrane are predominantly found as individual molecules or small clusters of up to 4 molecules, and that the size and number of these clusters increase upon virus binding or gp120 activation (Yuan et al. Viruses, 2021).

      (5) Fig S1D - These data are really interesting. However, if both the CD4 and the gp120 have His tags, they need to be careful, as poly-His tags can bind weakly to cells and increasing valency could generate some background. So, they should make sure the control is fair here. Ideally, non-His-tagged versions of sCD4 and gp120 would be used, or else a His-tagged Fab binding to gp120 that doesn't induce CXCR4 binding.

      New Supplementary Figure 2D shows that X4-gp120 does not bind Daudi cells (these cells do not express CD4) in the absence of soluble CD4. While the reviewer is correct to state that both proteins contain a Histidine Tag, cell binding is only detected if X4-gp120 binds sCD4. Nonetheless, we have included in the revised Supplementary Figure 2D a control showing the negative binding of sCD4 to Daudi cells in the absence of X4-gp120. Altogether, these results confirm that only sCD4/X4-gp120 complexes bind these cells.

      (6) Fig S4- Panel D needs a scale bar. I can't figure out what I'm being shown without this.

      Apologies. A scale bar has been included in this panel (new Supplementary Figure 6D).

      Reviewer #2:

      (1) This study is well described in both the main text and figures. Introduction provides adequate background and cites the literature appropriately. Materials and Methods are detailed. Authors are careful in their interpretations, statistical comparisons, and include necessary controls in each experiment. The Discussion presents a reasonable interpretation of the results. Overall, there are no major weaknesses with this manuscript.

      We very much appreciate the positive comments of the reviewer regarding the broad interest and strength of our work.

      (2) NL4-3deltaIN and immature HIV virions are found to have less associated gp120 relative to wild-type particles. It is not obvious why this is the case for the deltaIN particles or genetically immature particles. Can the authors provide possible explanations? (A prior paper was cited, Chojnacki et al Science, 2012 but can the current authors provide their own interpretation.)

      Our conclusion from the data is actually exactly the opposite. As shown in Figure 2D, the gp120 staining intensity was higher for NL4-3ΔIN particles (1,786 a.u.) than for gp120-VLPs (1,223 a.u.), indicating lower expression of Env proteins in the latter. Furthermore, analysis of gp120 intensity per particle (Figure 2E) confirmed that gp120-VLPs contained fewer gp120 molecules per particle than NL4-3ΔIN virions. These levels were comparable with, or even lower than, those observed in primary HIV-1 viruses (Zhu et al. Nature, 2006). This reduction was a direct consequence of the method used to generate the VLPs, as our goal was to produce viral particles with minimal gp120 content to prevent artifacts in receptor clustering that might occur when using high levels of Env proteins in the VLPs to activate the receptors.

      This misunderstanding may arise from the fact that we also compared Gag condensation and Env distribution on the surface of gp120-VLPs with those observed in genetically immature particles and integrase-defective NL4-3ΔIN virions, which served as controls. STED microscopy data revealed differences in Env distribution between gp120-VLPs and NL4-3ΔIN virions, supporting the classification of gp120-VLPs as mature particles (Figure 2 A,B).

      Reviewer #3:

      We thank the reviewer for considering that our work offers new insights into the spatial organization of receptors during HIV-1 entry and infection and that the manuscript is well written, and the findings significant.

      (1) For mechanistic basis of gp120-CXCR4 versus CXCL12-CXCR4 differences. Provide additional structural or biochemical evidence to support the claim that gp120 stabilises a distinct CXCR4 conformation compared to CXCL12. If feasible, include molecular modelling, mutagenesis, or crosslinking experiments to corroborate the proposed conformational differences.

      We appreciate the opportunity to clarify this point. The specific claim that gp120 stabilizes a conformation of CXCR4 that is distinct from the CXCL12-bound state was not explicitly stated in our manuscript, although we agree that our data strongly support this possibility. It is important to consider that CXCL12 binds directly to CXCR4, whereas gp120 requires prior sequential binding to CD4, and its subsequent interaction is with a CXCR4 molecule that is already forming part of the CD4/CXCR4 complex, as demonstrated by our FRET experiments and supported by previous studies (Zaitseva et al. J. Leuk. Biol., 2005; Busillo & Benovic Biochim. Biophys. Acta, 2007; Martínez-Muñoz et al. PNAS, 2014). This difference makes it inherently complex to compare the conformational changes induced by gp120 and CXCL12 on CXCR4.

      However, our findings show that both stimuli induce oligomerization of CXCR4, a phenomenon not observed when mutant CXCR4<sup>R334X</sup> was exposed to the chemokine CXCL12 (García-Cuesta et al. PNAS, 2022).

      (1) CXCL12 induced oligomerization of CXCR4 but did not affect the dynamics of CXCR4<sup>R334X</sup> (Martinez-Muñoz et al. Mol. Cell, 2018; García-Cuesta et al. PNAS, 2022). By contrast, X4-gp120 and the corresponding VLPs—which require initial binding to CD4 to engage the chemokine receptor—stabilized oligomers of both CXCR4 and CXCR4<sup>R334X</sup>.

      (2) FRET analysis revealed distinct FRET<sub>50</sub> values for CD4/CXCR4 (2.713) and CD4/CXCR4<sup>R334X</sup> (0.399) complexes, suggesting different conformations for each complex.

      (3) Consistent with previous reports (Balabanian et al. Blood, 2005; Zmajkovicova et al. Front. Immunol., 2024; García-Cuesta et al. PNAS, 2022), the molecular mechanisms activated by CXCL12 are distinct when comparing CXCR4 with CXCR4<sup>R334X</sup>. For instance, CXCL12 induces internalization of CXCR4, but not of mutant CXCR4<sup>R334X</sup>. Conversely, X4-gp120 triggers approximately 25% internalization of both receptors. Similarly, CXCL12 does not promote CD4 internalization in cells co-expressing CXCR4 or CXCR4<sup>R334X</sup>, whereas X4-gp120 does, although CD4 internalization was significantly higher in cells co-expressing CXCR4.
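      FRET<sub>50</sub> values of this kind are usually read from FRET saturation curves, in which FRET is measured across a range of acceptor/donor (A/D) expression ratios and fitted with a hyperbola, FRET<sub>50</sub> being the A/D ratio that gives half-maximal FRET. Assuming that model (the text does not state the fitting equation), a minimal non-linear fit looks like this. The curve is synthetic and noise-free, with a hypothetical FRET<sub>max</sub> of 0.3 and the CD4/CXCR4 FRET<sub>50</sub> from the text used only as a target value.

```python
import numpy as np
from scipy.optimize import curve_fit

def fret_saturation(ad_ratio, fret_max, fret50):
    """Assumed hyperbolic saturation model: FRET = FRETmax * (A/D) / (FRET50 + A/D)."""
    return fret_max * ad_ratio / (fret50 + ad_ratio)

def fit_fret50(ad_ratios, fret_values):
    """Non-linear least-squares fit of the saturation model; returns (FRETmax, FRET50)."""
    x = np.asarray(ad_ratios, dtype=float)
    y = np.asarray(fret_values, dtype=float)
    p0 = (y.max(), np.median(x))  # rough starting guesses
    (fret_max, fret50), _ = curve_fit(fret_saturation, x, y, p0=p0)
    return fret_max, fret50

# Synthetic, noise-free saturation curve (hypothetical FRETmax = 0.3, FRET50 = 2.713)
x = np.linspace(0.1, 10, 25)
y = fret_saturation(x, 0.3, 2.713)
print(fit_fret50(x, y))
```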

      These findings suggest that CD4 influences the conformation and the oligomerization state of both co-receptors. To further support this hypothesis, we have conducted new in silico molecular modeling of CD4 in complex with either CXCR4 or its mutant CXCR4<sup>R334X</sup> using AlphaFold 3.0 (Abramson et al. Nature, 2024). The server was provided with both sequences and asked to model the interaction between two molecules of each protein. It returned several models, which were then analyzed using ChimeraX 1.10 (Meng et al. Protein Sci., 2023). The CD4/CXCR4 and CD4/CXCR4<sup>R334X</sup> complexes were superposed using one of the CD4 molecules from each complex as the reference, in order to compare the spatial positioning of CD4 molecules when interacting with each receptor.

      Author response image 5.

      CD4/CXCR4 complexes were superposed with CD4/CXCR4 complexes (left panel) or CD4/CXCR4<sup>R334X</sup> complexes (right panels). Arrows indicate the CD4 molecule used as the reference for superposition.

      As illustrated in Author response image 5, the CD4/CXCR4 complexes superposed completely. However, when CD4/CXCR4 complexes were superposed with CD4/CXCR4<sup>R334X</sup> complexes using the same CD4 molecule as a reference (arrow in the figure), a clear structural deviation became evident. The main difference was the positioning of the CD4 transmembrane domains when interacting with wild-type versus mutant CXCR4. In complexes with CXCR4, the angle formed by the lines connecting residue N196 in CXCR4 with residue E416 at the C-terminal end of each CD4 molecule was 12°, whereas in the CXCR4<sup>R334X</sup> complex this angle increased to 24°, resulting in a distinct orientation of the CD4 extracellular domain (Author response image 6).

      Author response image 6.

      Comparison of the angle between the transmembrane domains of CD4 in CXCR4 WT and WHIM complexes. The angle at residue N196 of one CXCR4 molecule, formed by the lines to residue E416 of each of the two molecules in the CD4 dimer, was calculated for the CXCR4 WT (12°) and WHIM (24°) complexes to illustrate the difference in CD4 positioning.
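      The angle described above is a three-point measurement: the vertex is N196 of one CXCR4 molecule and the two arms point to E416 of each CD4 molecule. A minimal Python sketch of that geometry follows, assuming one representative atom per residue (for example, Cα) whose Cartesian coordinates have already been extracted from the model files. The helper name `angle_at_vertex` and the coordinates are hypothetical, and this is not necessarily how the reported values were obtained in ChimeraX.

```python
import math


def angle_at_vertex(vertex, p1, p2):
    """Return the angle (degrees) between the vectors vertex->p1 and vertex->p2.

    Hypothetical helper: vertex would be the CXCR4 N196 atom and p1/p2 the
    E416 atoms of the two CD4 molecules, each given as (x, y, z) in angstroms.
    """
    v1 = [a - b for a, b in zip(p1, vertex)]
    v2 = [a - b for a, b in zip(p2, vertex)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(a * a for a in v2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("points must be distinct from the vertex")
    # Clamp to [-1, 1] so floating-point round-off cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cos_theta))


# Toy coordinates, NOT taken from the AlphaFold models.
n196 = (0.0, 0.0, 0.0)
e416_a = (10.0, 0.0, 0.0)
e416_b = (10.0, 10.0, 0.0)
print(round(angle_at_vertex(n196, e416_a, e416_b), 1))  # 45.0
```

      Because the angle is measured at a single vertex, it depends only on the relative positions of the three atoms, so it is unaffected by how the whole complex is translated or rotated during superposition.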

      To further analyze the models, we used PDBsum (Laskowski & Thornton Protein Sci., 2021) to identify the CD4/CXCR4 interface residues. This analysis showed that at least 50% of the interacting residues differed when the CD4/CXCR4 interface was compared with that of the CD4/CXCR4<sup>R334X</sup> complex (Author response image 7). Notably, while some hydrogen bonds were present in both complex models, others were exclusive to one of them. For instance, whereas Cys<sup>394</sup>(CD4)-Tyr<sup>139</sup> and Lys<sup>299</sup>(CD4)-Glu<sup>272</sup> were present in both CD4/CXCR4 and CD4/CXCR4<sup>R334X</sup> complexes, the pairs Asn<sup>337</sup>(CD4)-Ser<sup>27</sup>(CXCR4<sup>R334X</sup>) and Lys<sup>325</sup>(CD4)-Asp<sup>26</sup>(CXCR4<sup>R334X</sup>) were found only in CD4/CXCR4<sup>R334X</sup> complexes.

      Author response image 7.

      Interacting residues at the CD4/CXCR4 interface. The panel displays the interface residues of the CXCR4 and CD4 oligomer. CD4 residues labeled with a red sphere are interacting residues present in both the CXCR4-WT and CXCR4-WHIM hetero-oligomers. Continuous red lines represent salt bridges, blue lines indicate hydrogen bonds, and dashed red lines represent non-bonded interactions. As illustrated in the figure, half of the interacting residues differ between the WT and WHIM models, indicating that the interacting surfaces are also distinct.

      These findings, which are consistent with our FRET results, suggest distinct interaction surfaces between CD4 and the two chemokine receptors. Overall, these results are compatible with differences in the spatial conformation adopted by these complexes.
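      The "at least 50%" figure depends on how the overlap between two interfaces is counted. One possible convention, sketched below in Python as an assumption rather than as the PDBsum output format, is the fraction of all interface residues (the union of both models) that appear in only one of them. The helper `interface_difference` is hypothetical, and the toy sets contain only the four CD4 residues named in the text, not the full interface lists of either model.

```python
def interface_difference(residues_a, residues_b):
    """Compare two interface residue sets (hypothetical helper).

    Returns (shared, exclusive, fraction_exclusive), where fraction_exclusive is
    the share of all interface residues (union of both sets) found in only one set.
    """
    a, b = set(residues_a), set(residues_b)
    union = a | b
    if not union:
        raise ValueError("at least one interface must be non-empty")
    shared = a & b
    exclusive = a ^ b  # symmetric difference: present in exactly one set
    return sorted(shared), sorted(exclusive), len(exclusive) / len(union)


# Toy sets built only from the CD4 residues named in the text; they are
# NOT the full PDBsum interface lists for either model.
wt = {"Cys394", "Lys299"}
whim = {"Cys394", "Lys299", "Asn337", "Lys325"}
shared, exclusive, frac = interface_difference(wt, whim)
print(shared)     # ['Cys394', 'Lys299']
print(exclusive)  # ['Asn337', 'Lys325']
print(frac)       # 0.5
```

      Other conventions, such as counting differences relative to only one of the two interfaces, give different percentages for the same models, so the definition used should be stated alongside the number.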

      (2) For Empty VLP effects on CXCR4 dynamics: Explore potential causes for the observed effects of Env-deficient VLPs. It's valuable to include additional controls such as particles from non-producer cells, lipid composition analysis, or blocking experiments to assess nonspecific interactions.

      As VLPs are complex entities, we considered that the relevant comparison was between Env(-) VLPs and gp120-VLPs. We would therefore first note that, regardless of the effect of Env(-) VLPs on CXCR4 dynamics, the most evident finding in this study is the strong effect of gp120-VLPs compared with control Env(-) VLPs. Nevertheless, regarding the effect of Env(-) VLPs compared with medium, we propose several hypotheses. As virions can be tethered to the cell surface via glycosaminoglycans (GAGs), we hypothesized that VLP-GAG interactions might indirectly influence the dynamics of CXCR4 and CXCR4<sup>R334X</sup> at the plasma membrane. Additionally, membrane fluidity is essential for receptor dynamics; therefore, interactions of VLPs with proteins, lipids or any other component of the cell membrane could also alter receptor behavior. Lipid rafts are well known to participate in the interaction of different viruses with target cells (Nayak & Hu Subcell. Biochem., 2004; Manes et al. Nat. Rev. Immunol., 2003; Riethmüller et al. Biochim. Biophys. Acta, 2006), and both the lipid composition and the presence of co-expressed proteins modulate ligand-mediated receptor oligomerization (Gardeta et al. Front. Immunol., 2022; Gardeta et al. Cell. Commun. Signal., 2025). We therefore performed Raster Image Correlation Spectroscopy (RICS) to assess membrane fluidity, through membrane diffusion measurements, in cells treated with Env(-) VLPs.

      Jurkat cells were labeled with Di-4-ANEPPDHQ and seeded on FN or on FN + VLPs prior to RICS analysis by confocal microscopy. The results indicated no significant differences in membrane diffusion between conditions, ruling out an effect of VLPs on overall membrane fluidity (Author response image 8).

      Author response image 8.

      VLPs treatment does not alter cell membrane fluidity. Diffusion values obtained by RICS from JKCD4X4 cells. (n = 3, with at least 10 cells analyzed per experiment and condition; n.s., not significant).

      Nonetheless, these results do not rule out other non-specific interactions of Env(-) VLPs with membrane proteins that could affect receptor dynamics. For instance, the C-type lectin DC-SIGN has been reported to act as an efficient docking site for HIV-1 (Cambi et al. J. Cell Biol., 2004; Wu & KewalRamani Nat. Rev. Immunol., 2006). However, a detailed investigation of these possible mechanisms is beyond the scope of this manuscript.

      (3) For Direct link between clustering and infection efficiency - Test whether disruption of CXCR4 clustering (e.g., using actin cytoskeleton inhibitors, membrane lipid perturbants, or clustering-deficient mutants) alters HIV-1 fusion or infection efficiency.

      Designing experiments with tools that disrupt receptor clustering by interacting with the receptors themselves is challenging, as these tools bind the receptor and can therefore alter parameters such as its conformation and/or its distribution at the cell membrane, as well as affect cellular processes such as HIV-1 attachment and cell entry. Moreover, perturbing actin polymerization or lipid dynamics can affect not only receptor clustering but also other molecular mechanisms essential for efficient infection.

      Many previous reports have, nonetheless, indirectly correlated receptor clustering with infection efficiency. Cholesterol plays a key role in the entry of several viruses. Its depletion in primary cells and cell lines has been shown to confer strong resistance to HIV-1-mediated syncytium formation and to infection by both CXCR4- and CCR5-tropic viruses (Liao et al. AIDS Res. Hum. Retroviruses, 2021). Moderate cholesterol depletion also reduces CXCL12-induced CXCR4 oligomerization and alters receptor dynamics (Gardeta et al. Cell. Commun. Signal., 2025). By restricting the lateral diffusion of CD4, sphingomyelinase treatment inhibits HIV-1 fusion (Finnegan et al. J. Virol., 2007). Depletion of sphingomyelins also disrupts CXCL12-mediated CXCR4 oligomerization and its lateral diffusion (Gardeta et al. Front. Immunol., 2022). Additional reports highlight the role of actin polymerization at the viral entry site, which facilitates clustering of HIV-1 receptors, a crucial step for membrane fusion (Serrano et al. Biol. Cell., 2023). Blocking actin dynamics with latrunculin A, a drug that sequesters actin monomers and prevents their polymerization, blocks CXCL12-induced CXCR4 dynamics and oligomerization (Martínez-Muñoz et al. Mol. Cell, 2018).

      Altogether, these findings strongly support our hypothesis of a direct link between CXCR4 clustering and the efficiency of HIV-1 infection.

      (4) CD4/CXCR4 co-endocytosis hypothesis - Support the proposed model with direct evidence from live-cell imaging or co-localization experiments during viral entry. Clarification is needed on whether internalization is simultaneous or sequential for CD4 and CXCR4.

      When referring to endocytosis of CD4 and CXCR4, we only hypothesized that HIV-1 might promote the internalization of both receptors, either sequentially or simultaneously. This hypothesis was based on several findings:

      a) Previous studies have suggested that HIV-1 glycoproteins can reduce CD4 and CXCR4 levels during HIV-1 entry (Choi et al. Virol. J., 2008; Geleziunas et al. FASEB J, 1994; Hubert et al. Eur. J. Immunol., 1995).

      b) Receptor endocytosis has been proposed as a mechanism for HIV-1 entry (Daecke et al. J. Virol., 2005; Aggarwal et al. Traffic, 2017; Miyauchi et al. Cell, 2009; Carter et al. Virology, 2011).

      c) Our data from cells activated with X4-gp120 demonstrated internalization of CD4 and chemokine receptors, which correlated with HIV-1 infection in PBMCs from WHIM patients and healthy donors.

      d) CD4 and CXCR4 have been shown to co-localize in lipid rafts during HIV-1 infection (Manes et al. EMBO Rep., 2000; Popik et al. J. Virol., 2002).

      e) Our FRET data demonstrated that CD4 and CXCR4 form heterocomplexes and that FRET efficiency increased after gp120-VLPs treatment.

      We agree with the reviewer that further experiments are required to test this hypothesis; however, we believe that this is beyond the scope of the current manuscript.

      Minor Comments:

      (1) The conclusions rely solely on the HXB2 X4-tropic Env. It would strengthen the study to assess whether other X4 or dual-tropic strains induce similar receptor clustering and dynamics.

      The primary goal of our current study was to investigate the dynamics of the co-receptor CXCR4 during HIV-1 infection, motivated by previous reports showing CD4 oligomerization upon HIV-1 binding and gp120 stimulation (Yuan et al. Viruses, 2021). We initially used a recombinant X4-gp120, a soluble protein that does not fully replicate the functional properties of the native HIV-1 Env. Previous studies have shown that Env consists of gp120 trimers, which redistribute and cluster on the surface of virions following proteolytic Gag cleavage during maturation (Chojnacki et al. Nat. Commun., 2017). An important consideration in receptor oligomerization studies is the concentration of recombinant gp120 used, as it does not accurately reflect the low number of Env trimers present on native HIV-1 particles (Hart et al. J. Histochem. Cytochem., 1993; Zhu et al. Nature, 2006). To address these limitations, we generated virus-like particles (VLPs) containing low levels of X4-gp120 and repeated the dynamic analysis of CXCR4. In this project, primary HIV-1 isolates were used only to confirm that PBMCs from both healthy donors and WHIM patients were equally susceptible to infection. This result with a primary HIV-1 virus supports the conclusions drawn from our in vitro approaches. We thus believe that, although the use of other X4- and dual-tropic strains might complement and reinforce the analysis, it is beyond the scope of the current manuscript.

      (2) Given the observed clustering effects, it would be valuable to explore whether gp120-induced rearrangements alter epitope exposure to broadly neutralizing antibodies like 17b or 3BNC117. This would help connect the mechanistic insights to therapeutic relevance.

      As 3BNC117, VRC01 and b12 are broadly neutralizing mAbs that recognize conformational epitopes on gp120 (Li et al. J. Virol., 2011; Mata-Fink et al. J. Mol. Biol., 2013), they may bind poorly to the gp120/CD4/CXCR4 complex and therefore may not be ideal for detecting changes within the CD4/CXCR4 complex. The experiment suggested by the reviewer is thus not only challenging but also very complex. It would require evaluating antibody binding under two experimental conditions, in the absence and in the presence of oligomers. However, our data indicate that receptor oligomerization is promoted by X4-gp120 binding, and because the selected antibodies are neutralizing, they should block or hinder gp120 binding and, consequently, receptor oligomerization. An alternative approach would be to study the neutralizing capacity of these mAbs on cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup> complexes. Variations in their neutralizing activity could then be interpreted as reflecting distinct gp120 conformations, which in turn may reflect differences between the CD4/CXCR4 and CD4/CXCR4<sup>R334X</sup> complexes.

      We therefore assessed the ability of two anti-gp120 mAbs available in our laboratory, VRC01 and b12, to neutralize gp120 binding to cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup>. Specifically, increasing concentrations of each antibody were preincubated (60 min, 37 °C) with a fixed amount of X4-gp120 (0.05 µg/ml). The resulting complexes were then incubated with Jurkat cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup> (30 min, 37 °C), and gp120 binding was analyzed by flow cytometry. Although we did not observe statistically significant differences in the capacity of b12 or VRC01 to neutralize X4-gp120 binding depending on the presence of CXCR4 or CXCR4<sup>R334X</sup>, we observed a trend for greater concentrations of both mAbs to neutralize X4-gp120 binding in Jurkat CD4/CXCR4 cells than in Jurkat CD4/CXCR4<sup>R334X</sup> cells (Author response image 9).

      Author response image 9.

      Flow cytometry analysis of gp120 binding to Jurkat cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup> in the presence of different concentrations of the neutralizing anti-gp120 antibodies b12 (left panel) and VRC01 (right panel). AUC comparison by Welch’s t-test: p-values 0.2950 and 0.2112 for b12 and VRC01, respectively (n = 2).
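      The caption above summarizes each neutralization curve by its area under the curve (AUC) and compares AUCs between cell lines with Welch's (unequal-variance) t-test. A minimal, dependency-free Python sketch of that kind of analysis follows, assuming AUCs are computed per replicate with the trapezoidal rule on a linear concentration axis; the helper names and all numbers are hypothetical and do not reproduce the data in Author response image 9. The sketch returns the t statistic and the Welch-Satterthwaite degrees of freedom; the two-sided p-value would then come from Student's t distribution, which `scipy.stats.ttest_ind(a, b, equal_var=False)` computes directly.

```python
import math
from statistics import mean, variance


def trapezoid_auc(x, y):
    """Area under a curve sampled at ascending x values (trapezoidal rule)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, with at least 2 points")
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs at least 2 values, and the two variances must not both be zero.
    """
    se2_a = variance(a) / len(a)  # squared standard error of the mean, group a
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical values: mAb concentrations and gp120 binding (% of the
# no-antibody control), two replicates per cell line. Invented for illustration.
conc = [0.0, 0.5, 1.0, 2.0]
cxcr4_reps = [[100, 80, 60, 40], [100, 85, 65, 45]]
r334x_reps = [[100, 70, 50, 30], [100, 75, 55, 35]]
auc_wt = [trapezoid_auc(conc, r) for r in cxcr4_reps]
auc_mut = [trapezoid_auc(conc, r) for r in r334x_reps]
t, df = welch_t(auc_wt, auc_mut)
print(auc_wt, auc_mut, round(t, 3), round(df, 2))
```

      Reducing each replicate curve to a single AUC turns a dose-response comparison into a two-sample test; Welch's variant is used because it does not assume equal variances in the two cell lines.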

      These slight alterations in the neutralizing capacity of b12 and VRC01 may thus suggest minimal differences in the conformation of gp120 depending on the coreceptor used. We also found that X4-gp120 and VLPs expressing gp120, which require initial binding to CD4 to engage the chemokine receptor, stabilized oligomers of both CXCR4 and CXCR4<sup>R334X</sup>, but FRET data indicated distinct FRET<sub>50</sub> values for the two complexes: 2.713 for CD4/CXCR4 and 0.399 for CD4/CXCR4<sup>R334X</sup> (Figure 5A,B in the main manuscript). Moreover, X4-gp120 mediated significantly more CD4 internalization in cells co-expressing CD4 and CXCR4 than in those co-expressing CD4 and CXCR4<sup>R334X</sup> (Figure 6 in the main manuscript). Overall, these data, together with those in Author response images 5, 6 and 7, indicate distinct conformations of the two receptor complexes.

      (3) TIRF imaging limits analysis to the cell-substrate interface. It would be useful to clarify whether CXCR4 receptor clustering occurs elsewhere, such as at immunological synapses or during cell-to-cell contact.

      In recent years, chemokine receptor oligomerization has gained significant research interest due to its role in modulating the ability of cells to sense chemoattractant gradients. This molecular organization is now recognized as a critical factor governing directed cell migration (Martínez-Muñoz et al. Mol. Cell, 2018; García-Cuesta et al. PNAS, 2022; Hauser et al. Immunity, 2016). In addition, advanced imaging techniques such as single-molecule and super-resolution microscopy have been used to investigate the spatial distribution and dynamic behaviour of CXCR4 within the immunological synapse in T cells (Felce et al. Front. Cell Dev. Biol., 2020). Building on these findings, we are currently conducting a project focused on characterizing CXCR4 clustering specifically within this specialized cellular region.

      (4) In LVP experiments, it would be useful to report transduction efficiency (% GFP+ cells) alongside MSI data to relate VLP infectivity with receptor clustering functionally.

      These experiments were designed to validate the functional integrity of the gp120 conformation on the LVPs, confirming their suitability for subsequent TIRF microscopy. Our objective was to establish a robust experimental tool rather than to quantify transduction efficiency in a high-throughput manner. For that reason, these experiments were included in the new Supplementary Figure S6, which also contains the complete characterization of gp120-VLPs and LVPs. Under these experimental conditions, quantifying the percentage of GFP-positive cells relative to the total number of cells plated in each well is very difficult. However, in line with the reviewer’s suggestion, and as we used the same number of cells in each experimental condition, we have included in the revised manuscript a complementary graph showing the GFP intensity (arbitrary units) detected in all the wells analyzed (new Supplementary Figure S6E).

      (5) To ensure that differences in fusion events (Figure 7B) are attributable to target cell receptor properties, consider confirming that effector cells express similar levels of HIV-1 Env. Quantifying gp120 expression by flow cytometry or western blot would rule out the confounding effects of variable Env surface density.

      In these assays (Figure 7B), we used the same effector cells (cells expressing X4-gp120) in both experimental conditions, ensuring that any observed differences should be attributable solely to the target cells, either JKCD4X4 or JKCD4X4<sup>R334X</sup>. For this reason, Figure 7A shows only the binding of X4-gp120 to the target cells, which demonstrated similar levels of the receptors expressed by these cells.

      (6) HIV-mediated receptor downregulation may occur more slowly than ligand-induced internalization. Including a 24-hour time point would help assess whether gp120 induces delayed CD4 or CXCR4 loss beyond the early effects shown and to better capture potential delayed downregulation induced by gp120.

      The reviewer suggests using a 24-hour time point to facilitate detection of receptor internalization. However, such an extended incubation time may introduce some confounding factors, including receptor degradation, recycling and even de novo synthesis, which could affect the interpretation of the results. Under our experimental conditions, we observed that CXCL12 did not trigger CD4 internalization whereas X4-gp120 did. Interestingly, CD4 internalization depended on the coreceptor expressed by the cells.

      (7) Increase label font size in microscopy panels for improved readability.

      Of course; the font size of these panels has been increased in the revised version.

      (8) Consider adding more references on ligand-induced co-endocytosis of CD4 and chemokine receptors during HIV-1 entry.

      We have added more references to support this hypothesis (Toyoda et al. J. Virol., 2015; Venzke et al. J. Virol., 2006; Gobeil et al. J. Virol., 2013).

      (9) For Statistical analysis. Biological replicates are adequate, and statistical tests are generally appropriate. For transparency, report n values, exact p-values, and the statistical test used in every figure legend and discussed in the results.

      Thank you for highlighting the importance of transparency in statistical reporting. We confirm that the n values for all experiments have been included in the figure legends. The statistical tests used for each analysis are also clearly indicated in the figure legends, and the interpretation of these results is discussed in detail in the Results section. Furthermore, the Methods section specifies the tests applied and the thresholds for significance, ensuring full transparency regarding our analytical approach.

      In accordance with established conventions in the field, we have utilized categorical significance indicators (e.g., n.s., *, **, ***) within our figures to enhance readability and focus on biological trends. This approach is widely adopted in high-impact literature to prevent visual clutter. However, to ensure full transparency and reproducibility, we have ensured that the underlying statistical tests and thresholds are clearly defined in the respective figure legends and Methods section.

      Reviewer #4:

      We thank the reviewer for noting that this work is clearly presented and that the main findings are properly highlighted, and for remarking that the paper is of interest to the retrovirology community and possibly to the broader virology community.

      We also agree that the observation that X4-gp120 clusters CXCR4<sup>R334X</sup> is of interest, as it suggests a binding mechanism for X4-gp120 different from that of the natural ligand CXCL12, an aspect that we are now evaluating. These data also indicate that WHIM patients can be infected by HIV-1 similarly to healthy individuals.

      (1) The observation that "empty VLPs" reduce CXCR4 diffusivity is potentially interesting. However, it is not supported by the data owing to insufficient controls. The authors correctly discuss the limitations of that observation in the Discussion section (lines 702-704). However, they overinterpret the observation in the Results section (lines 509-512), suggesting non-specific interactions between empty VLPs, CD4 and CXCR4. I suggest either removing the sentence from the Results section or replacing it with a sentence similar to the one in the Discussion section.

      In accordance with the reviewer’s suggestion, the sentence in the Results section has been replaced with one similar to that in the Discussion section. In addition, we have performed Raster Image Correlation Spectroscopy (RICS) analysis using the Di-4-ANEPPDHQ lipid probe to assess membrane fluidity by means of membrane diffusion, comparing untreated cells with cells treated with Env(-) VLPs. The results indicated that VLPs did not modulate membrane fluidity (Author response image 8). Nonetheless, these results do not rule out other potential non-specific interactions of Env(-) VLPs with other components of the cell membrane that might affect receptor dynamics (see our response to point 2 of Reviewer #3).

      (2) In the case of the WHIM mutant CXCR4-R334X, the addition of "empty VLPs" did not cause a significant change in the diffusivity of CXCR4-R334X (Figure 4B). This result is in contrast with the addition of empty VLPs to WT CXCR4. However, the authors neither mention nor comment on that result in the results section. Please mention the result in the paper and comment on it in relation to the addition of empty VLPs to WT CXCR4.

      We would note that the main observation in these experiments concerns the effect of gp120-VLPs, and the results indicate that gp120-VLPs promoted clustering of CXCR4 and of CXCR4<sup>R334X</sup> and reduced their diffusion at the cell membrane. The Env(-) VLPs were included as a negative control, to compare the data with those obtained using gp120-VLPs. However, once we observed some residual effect of the Env(-) VLPs, we proposed, as a hypothesis, that Env(-) VLPs modulated membrane fluidity. We have now performed a RICS analysis using Di-4-ANEPPDHQ as a lipid probe (Author response image 8). The results suggest that Env(-) VLPs do not modulate cell membrane fluidity, although we do not rule out other potential interactions with membrane proteins that might alter receptor dynamics. We appreciate the reviewer’s observation and agree that this result can be noted. However, since the main purpose of Figure 4B is to show that gp120-VLPs modulate the dynamics of CXCR4<sup>R334X</sup>, rather than that Env(-) VLPs also have some effects, we consider that a detailed discussion of this specific aspect would detract from the central finding and may dilute the primary narrative of the study.

      Minor comments

      (1) It would be helpful for the reader to combine thematically or experimentally linked figures, e.g., Figures 3 and 4.

      (2) Figures 3 and 4 are very similar. Please unify the colours in them and the order of the panels (e.g. Figure 3 panel A shows diffusivity of CXCR4, while Figure 4 panel A shows MSI of CXCR4-R334X).

      While we considered consolidating Figures 3 and 4, we believe that maintaining them as separate entities enhances conceptual clarity. Since Figure 3 establishes the baseline dynamics for wildtype CXCR4 and Figure 4 details the distinct behavior of the CXCR4<sup>R334X</sup> mutant, keeping them separate allows the reader to fully appreciate the specificities of each system before making a cross-comparison.

      (3) Some parts of the Discussion section could be shortened, moved to the Introduction (e.g., lines 648-651), or entirely removed (e.g., lines 633-635 about GPCRs).

      Accordingly, the Discussion section has been reorganized and shortened to improve clarity.

      (4) I suggest renaming "empty VLPs" to "Env(−) VLPs" (or similar). The name empty VLPs can mislead the reader into thinking that these are empty vesicles.

      The term “empty VLPs” has been replaced with “Env(−) VLPs” throughout the manuscript to more accurately reflect their composition. Many thanks for this suggestion.

      (5) Line 492 - please rephrase "...lower expression of Env..." to "...lower expression of Env or its incorporation into the VLPs...".

      The sentence has been rephrased as suggested.

      (6) Line 527 - The data on CXCL12 modulating CXCR4-R334X dynamics and clustering are not present in Figure 4 (or any other Figure). Please add them or rephrase the sentence with an appropriate reference. Make clear which results are yours.

      (7) Line 532 - Do the data in the paper really support a model in which CXCL12 binds to CXCR4-R334X? If not, please rephrase with an appropriate reference.

      Previous studies support the association of CXCL12 with CXCR4<sup>R334X</sup> (Balabanian et al. Blood, 2005; Hernandez et al. Nat. Genet., 2003; Busillo & Benovic Biochim. Biophys. Acta, 2007). In fact, this receptor has been characterized as a gain-of-function variant for this ligand (McDermott et al. J. Cell. Mol. Med., 2011). The revised manuscript now includes these references to support this statement. In any case, our previous data indicate that CXCL12 binding does not affect CXCR4<sup>R334X</sup> dynamics (García-Cuesta et al. PNAS, 2022).

      (8) Line 695 - "...lipid rafts during HIV-1 (missing word?) and their ability to..." During what?

      Many thanks for catching this mistake. The sentence now reads: “Although direct evidence for the internalization of CD4 and CXCR4 as complexes is lacking, their co-localization in lipid rafts during HIV-1 infection (97–99) and their ability to form heterocomplexes (22) strongly suggest they could be endocytosed together.”

    1. eLife Assessment

      This study presents important findings for the understanding of central brain circuits that underlie nociception-induced escape. Using a laser-based nociception assay, chronic neuronal silencing, trans-Tango anatomical tracing, and reference to connectomic data, the authors propose that nociceptive signals (from painless- and trpA1-expressing neurons) converge on a subset of dopaminergic neurons (subsets of PPL1 and PAM), which in turn engage mushroom body output neurons (MBONs) to shape escape latency. However, methods and controls fall short of fully supporting the findings, rendering the evidence incomplete. This study will be of interest to scientists studying nociception and learning and memory circuits.

    2. Reviewer #1 (Public review):

      Summary:

      Yang et al. investigate the central pathways underlying nociceptive responses in Drosophila. The authors employ a behavioral platform they previously developed, which uses laser stimulation to deliver nociceptive stimuli while enabling automated tracking of fly behavior. By combining large-scale behavioral screening with circuit tracing approaches, the study identifies a set of dopaminergic neurons (DANs) and mushroom body output neurons (MBONs) that participate in the transmission of nociceptive signals. Nociceptive escape behavior has generally been regarded as largely reflexive. It is therefore intriguing that the mushroom body, a neural circuit classically associated with learning, is involved in this process. In particular, the recruitment of dopaminergic neurons typically linked to both appetitive and aversive valence is noteworthy and raises interesting questions about how nociceptive information is integrated within the circuits. Overall, the findings are conceptually interesting and may provide useful insights into dissecting the nociceptive escape behavior.

      Strengths:

      The behavioral assay used in this study is high-throughput and appears reproducible. The authors screened a large number of genetic lines, and the behavioral responses were carefully quantified. The trans-Tango tracing results are consistent with the behavioral screening results. Moreover, the observation that circuits typically associated with learned behaviors (the mushroom body) contribute to a nociceptive escape response, generally considered a hard-wired reflex, is conceptually interesting.

      Weaknesses:

      The use of laser stimulation to induce nociceptive stimuli makes the paradigm difficult to combine with calcium imaging or optogenetic manipulations. As a result, the study lacks functional and temporally precise tests of the proposed circuit mechanisms.

      Several aspects of the Methods section require additional detail:

      (1) How was the behavioral potency level calculated? Some of the split-GAL4 lines label multiple neurons, and individual neurons may innervate multiple compartments. It is therefore unclear how a single "behavioral potency level" value was assigned to a compartment.

      (2) Additional details are needed on how velocity was calculated, particularly the time window used for the analysis. In the Kir-silenced condition, the variation in velocity appears smaller than in the control group, which would benefit from clarification.

      (3) Connectome analysis. More details are needed regarding how DAN-MBON connectivity was quantified in Figure 5. For example, were only DAN → MBON connections considered, or were bidirectional connections included?

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript aims to identify the central nervous system circuitry, specifically within the mushroom body (MB), that mediates nociception-induced escape behavior in adult Drosophila. The authors provide a detailed map of the neural pathways underlying defensive actions in flies. Overall, the study is technically solid, clearly written, and conceptually interesting.

      Strengths:

      The authors present compelling evidence by integrating multiple complementary approaches. The ALTOMS laser system enables precise, automated measurement of escape latency, allowing for high-throughput and objective behavioral quantification. Neuronal silencing experiments assess functional necessity and demonstrate that specific dopaminergic neurons (DANs) and mushroom body output neurons (MBONs) are critical for escape behavior. Trans-Tango anatomical mapping further supports the proposed circuit by identifying putative synaptic connections consistent with the authors' model.

      Weaknesses:

      A central limitation of the study is its heavy reliance on chronic Kir2.1-mediated neuronal silencing as the primary functional manipulation. This approach raises concerns about potential developmental compensation and indirect network effects. The authors could strengthen their conclusions by incorporating more temporally precise, reversible silencing strategies, such as recently developed optogenetic- or chemogenetic-based methods.

      In addition, the study relies on the trans-Tango system to identify downstream synaptic partners, which has several inherent limitations. Trans-Tango detects only chemical synapses and cannot reveal electrical coupling. The system may also yield false negatives due to reporter sensitivity, and anatomical labeling alone does not establish functional connectivity in the context of the specific behavior examined.

    4. Reviewer #3 (Public review):

      Summary:

      Yang et al. sought to describe central brain circuits that underlie nociception-induced escape in Drosophila using a combination of neurogenetic tools to silence subsets of neurons and to trace their postsynaptic connections. They present interesting data that identify subsets of DANs and MBONs that are required for a jumping response to an aversive stimulus, but not for baseline locomotion, and present a model for linking peripheral nociception to MB-dependent escape behavior.

      Strengths:

      They use an innovative avoidance assay to elicit a robust behavioral response and use trans-tango to identify downstream targets of painless and TrpA1-expressing neurons.

      Weaknesses:

      This reviewer's enthusiasm for the study is lowered by an incomplete methods section and by the lack of appropriate behavioral controls, complete immunohistochemistry data, and a complete behavioral screen of DANs and MBONs. Below I list my suggestions, questions, and criticisms.

      (1) Behavioral studies are interesting. The assay is simple, yet innovative. However, there is no power analysis or explanation of how sample sizes were selected. I commend the authors for including a positive control; however, although UAS-controls are present, there are no GAL4-controls included in the study. Given that many of the lines used for behavior are split-GAL4s, it's unclear if the additional transgene influenced behavior. This should be addressed.

      (2) It is also not clear from the methods how the behavior was run and how it was analyzed. Was baseline locomotion recorded before the laser was introduced? I assume this is the case; however, more importantly, how long after the flies were introduced to the arena were baseline recordings collected? How much data was used to calculate velocity? Were the experimenters blind to the conditions they were assessing? More detail in the methods is essential for understanding the data and providing an opportunity to replicate results.

      (3) At times, the authors describe "locomotion velocity" as baseline locomotion, but other times, they describe it as escape velocity (see reference to Figure 1F). The authors should clarify whether escape velocity was calculated.

      (4) Immunohistochemistry: The description of the flies used for trans-tango experiments lacks detail. How many brains were evaluated? Was there variability across brains? Were the flies males or females? This is an important detail as sex could impact the level of expression of the ligand and therefore the results. It is also not clear at what age these flies were dissected and at what temperature they were raised. This can also significantly affect the post-synaptic signal that is measured (see Talay et al., 2017).

      (5) Figure 2 shows the overlap of trans-tango and dopamine signal, but there is no signal for the GAL4-line to evaluate the overlap between presynaptic signal and postsynaptic signal. This expression is an important consideration and should be included.

      (6) Expression of the GAL4 lines in the central brain is also important to show, because the authors suggest that painless and TrpA1 expression, which does not fully overlap in peripheral tissue, might converge in the central brain. Does the central brain expression of painless and TrpA1 overlap?

      (7) Further, although the authors clearly label the different dopamine subsets (PPL1, PAL, and PAM), some orientation with regard to where these images were taken would be helpful. I recommend a stack showing the location of the cell bodies and then a zoom in to see the overlap.

      (8) Behavioral data for DANs and MBONs: I recommend that the authors discuss the results by the neurons that are targeted and not the driver lines. For instance, the authors suggest they get the largest effects for 433B, 434B, and 298B, but all of these lines target very similar neuronal subsets (γ4>γ1γ2). It's also not clear why different split-lines were selected. Several of the lines have overlapping expression, and other compartments were not included at all. In order to determine which MBONs and DANs are required for escape behavior, all MBONs and DANs should be included. See Aso et al. for a list of recommended lines for behavior based on specificity and intensity.

      (9) Based on trans-tango data, it is not clear why the authors focus exclusively on PPL1 and PAM when PAL, PPM1, 2, 3, and PPL2 also overlap with painless and trpA1. Certainly, PPL1 and PAM DANs innervate the MB, but so do some of the other DANs identified.

      (10) For Figure 5, the titles of A and B are DANs and MBONs, but the panels really show the average jumping response when neurons that innervate MB compartments are silenced. Many DANs and MBONs innervate multiple compartments (PPL1-α′2α2, etc.); thus, if the intention is to identify neural circuits that modulate escape response, the analysis should focus on the neurons, not the MB compartments. I recommend reorganizing this data so it highlights the DANs and MBONs instead of the MB compartments. I also recommend showing error bars for averages and/or raw data and organizing the x-axes so DAN and MBON compartments can be easily compared.

      (11) Lastly, nuance is lost here in the Behavioral Potency Level, given that some of these compartments are over-represented and not adjusted for the strength of expression in different split-GAL4 lines. Aso et al. (2014) recommended specific split-GAL4 lines based on specificity and intensity. Some of the lines that are included in the average Behavioral Potency are not recommended for behavior based on the intensity of expression, which could significantly influence the potency score.

    5. Author Response:

      We sincerely thank the reviewers for their insightful and constructive suggestions on our manuscript. We are encouraged by the positive recognition of our study’s conceptual significance, particularly the involvement of the mushroom body (MB) in nociceptive escape behavior and the utility of our ALTOMS behavioral platform.

      We fully agree with the reviewers’ assessments and have initiated several key revisions, additional experiments, and analytical refinements to strengthen the study.

      Below is a summary of our planned improvements:

      1. Experimental Revisions and Scope Expansion

      To address concerns regarding potential developmental compensation (Reviewers 1 and 2), we are performing new experiments using temporally precise manipulation tools to confirm the acute necessity of the identified circuits. Additionally, responding to Reviewer 3, we are conducting further behavioral assays to include necessary genetic controls (e.g., split-GAL4-only lines) and expanding our screen to cover all major MBON and DAN compartments using standardized lines to ensure a comprehensive functional map.

      2. Analytical Refinements and Methodological Transparency

      We are revising our quantitative and anatomical reporting to address several technical suggestions from all three reviewers. Specifically, we will implement a weighted “Behavioral Potency Level” that accounts for driver-specific expression intensity and specificity. Anatomical clarity will be enhanced by providing presynaptic expression patterns alongside trans-Tango signals and a neuron-centric data model for Figure 5. Furthermore, the Materials and Methods will be updated to explicitly detail habituation protocols, stimulation timing, and sample sizes, and will incorporate a more nuanced discussion of the limitations of the tracing systems.

      We believe these revisions will significantly enhance the rigor and clarity of our manuscript. We look forward to submitting the revised version upon completion of these supplementary tasks.

    1. eLife Assessment

      This study provides important insights into how species-specific variation in oxytocin receptor regulatory architecture contributes to diversity in brain expression patterns and social behaviors. By generating multiple BAC transgenic mouse lines carrying the prairie vole oxytocin receptor locus and combining anatomical, molecular, behavioral, and chromatin-structure analyses, the authors present convincing evidence that distal regulatory elements constrain peripheral expression while permitting brain expression aligned with behavior. This study provides an experimental framework and a resource that are of value for dissecting how regulatory variation in neuromodulatory systems contributes to species differences in social behavior. This work will be of interest to researchers studying social behavior, oxytocin, neuromodulation, and related conditions.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Tsukamoto et al. describes a compelling approach to understanding whether inter-species differences in social behavior might emerge from differential expression patterns of the oxytocin receptor (Oxtr) in the brain. To this end, they genetically engineer BAC transgenic mouse lines with insertions of a large construct incorporating the prairie vole Oxtr gene and surrounding regulatory elements. They name these lines Koi lines. They first evaluate whether prairie vole-like Oxtr expression is reproduced in the Koi mouse lines, and they find heterogeneous patterns across different lines that do not depend on the number of insertions. While they found that Koi mice can reproduce vole-like expression in PFC, NAc, and BLA, the reproduction was never complete: one Koi line had NAc and mPFC expression, another had BLA expression, etc. They confirmed major expression patterns across 3 methods: crossing with a LacZ reporter line, in situ hybridization, and ligand binding (autoradiography). To determine the expression pattern of the BAC insert but not endogenous Oxtr, the authors generated new mouse lines by crossing Koi lines with an Oxtr -/- line. Importantly, they found that the Oxtr expression pattern in the mammary gland was similar across all lines and in wild-type mice.

      The authors used Koi:Oxtr-/- lines to test social behavior, specifically partner preference (a behavior specific to prairie voles) and maternal behavior. They find that different Koi lines showed different changes in these behaviors compared to wild-type mice. Moreover, while some lines showed changes in partner preference, others seemed to show changes in maternal behavior. For one of the lines (Koi4), the partner preference and the maternal behavior were incongruent.

      The manuscript then hypothesizes that the Oxtr gene is positioned in different 3D chromatin structures across species and across tissues, leading to more rigid expression in the mammary glands, but more flexible expression patterns in the brain.

      Strengths:

      This study has major implications in the field of oxytocin research, and more broadly in the field of neuromodulation. It is novel, bold, and rigorous.

      Weaknesses:

      (1) The expression in the brain and mammary gland (Figure 2) was not quantified, preventing a more objective conclusion that the brain has flexible expression and mammary gland expression is rigid.

      (2) In Figure 7, a similar heatmap for the mammary gland is missing.

      (3) Partner preference in males was not tested.

      (4) It is unclear whether the stimulus animals in the behavioral tests were of the same genotype as the focal female or were wild type. This could have an impact on the behavioral outcome.

    3. Reviewer #2 (Public review):

      Summary:

      This is a bold study that addresses an important question in the field: how species-specific variation in brain oxytocin receptor expression relates to differences in social behavior.

      Tsukamoto et al. generated eight independent transgenic mouse lines (Koi lines) carrying a bacterial artificial chromosome (BAC) encompassing the prairie vole Oxtr locus along with flanking intergenic regions, with the goal of probing the behavioral consequences of species-specific variation in brain Oxtr expression. Across these "volized" lines, the authors claim conserved Oxtr expression in the mammary gland but strikingly divergent patterns of brain expression, none of which fully recapitulate endogenous prairie vole Oxtr distribution, and instead exhibit expression patterns that diverge from both mouse and prairie vole brain Oxtr distribution. Nevertheless, some lines exhibit partial overlap with vole Oxtr expression pattern reported in the literature within specific brain regions, and one line displays partner preference behavior reminiscent of prairie voles. The authors further report line-dependent differences in maternal pup retrieval and crouching behaviors, which they interpret as evidence that variation in brain Oxtr expression can drive variation in social behaviors. Together with analyses of topologically associating domain (TAD) architecture, the authors conclude that brain, but not peripheral, Oxtr expression is shaped by distal regulatory elements beyond the BAC insert, and propose that such regulatory flexibility underlies evolutionary diversification of social behavior.

      Strengths:

      A particular strength of the study is the generation of multiple independent transgenic lines, which provides a valuable resource for probing regulatory influences on Oxtr expression.

      Weaknesses:

      While the study addresses an important question, I have several methodological and conceptual concerns regarding the study in its current form. Some aspects of the study fall outside my primary area of expertise, and I am therefore not in a position to fully evaluate the technical difficulty or rigor of those components, or to judge whether my suggestions would be feasible to implement. I defer to reviewers with relevant expertise for a more detailed assessment of these aspects.

      (1) Each independent Koi line exhibits a distinct brain expression pattern that differs from both wild-type mouse and prairie vole Oxtr expression, complicating the interpretation of the results. The manuscript does not include a direct comparison of brain Oxtr expression patterns in these transgenic lines with those of prairie voles. Instead, expression similarity is inferred primarily from regional localization and compared indirectly with prior literature (Figures 2-5). For those lines that show partial resemblance to prairie vole Oxtr expression patterns, the authors do not assess whether Oxtr-expressing neurons share comparable anatomical projections or transcriptomic identity with prairie vole Oxtr-expressing neurons. Quantification of expression remains largely descriptive, illustrating expression patterns (Figure 2), OXTR protein distribution (Figure 3; images are difficult to evaluate due to low contrast), or Oxtr mRNA levels across selected brain regions in Koi lines, wild-type mice, and mOxtr-/- mice (Figures 4-5), without directly testing similarity to prairie vole expression. In addition, whole-brain expression data are lacking, with analyses restricted to selected sections. While such analyses may be beyond the scope of the present study, these limitations nonetheless complicate interpretation of the central question - namely, whether the observed behavioral phenotypes arise from vole-like Oxtr circuits rather than from distinct, line-specific expression configurations.

      (2) The authors state that Oxtr expression in the mammary gland is similar across all Koi lines and the mOxtr-IRES-Cre knock-in line. However, the images presented in Figure 2 appear to show differences in anatomical detail across lines, and no quantitative analysis is provided to support the claim of equivalence.

      (3) The conclusion that integration site rather than copy number determines the observed BAC transgene expression patterns (Lines 202-203) is not fully supported by the data. First, the authors did not compare multiple copy numbers at the same genomic insertion site, making it impossible to disentangle copy-number effects from position effects. Second, BAC copy number does not necessarily scale linearly with expression; higher copy numbers can have a repressive effect on gene expression (Garrick et al, Nat Genet, 1998).

      (4) While I am not an expert in TAD analysis, the observed differences in 3D architecture around Oxtr are consistent with a role for long-range regulatory interactions. However, these analyses appear largely descriptive and correlative, and establishing a causal contribution of 3D chromatin organization to Oxtr regulation by distal elements would likely require direct perturbation of TAD boundaries or looping interactions. I recognize that such experiments may be beyond the scope of the present study, but clarifying this limitation in the interpretation would be helpful.

    4. Author response:

      Thank you very much for your careful evaluation of our manuscript entitled “Cross-Species BAC Transgenesis Reveals Long-Range Regulation Drives Variation in Brain Oxytocin Receptor Expression and Social Behaviors.” We sincerely appreciate the insightful and constructive comments from both reviewers.

      We are particularly encouraged by the positive assessment that our study provides a useful experimental framework and resource for understanding how regulatory variation contributes to diversity in brain expression patterns and social behaviors. We have carefully considered all comments and outline below the key revisions we will implement in the revised manuscript.

      Conceptual clarification: We will clarify the conceptual framework of the study. While our initial aim was to test whether prairie vole regulatory elements could recapitulate vole-like Oxtr expression patterns in mice, the generation of multiple independent Koi lines revealed that such expression is not faithfully reproduced but instead varies across lines. This observation led us to refocus the study on how regulatory architecture gives rise to diverse expression patterns and their functional consequences. Accordingly, we will revise the manuscript to emphasize that the goal is not to reconstruct prairie vole circuits, but to test how variation in Oxtr expression distribution drives variation in social behaviors.

      Quantification of expression patterns: We will include quantitative analyses of Oxtr expression in both brain and mammary gland tissues. These additions will provide an objective basis for comparing tissue-specific expression and support the conclusion that brain expression is more variable, whereas mammary gland expression is broadly conserved. We will include qRT-PCR data to support mammary gland comparisons.

      Behavioral interpretation: We will clarify that the behavioral analyses are designed to assess how distinct Oxtr expression patterns influence social behaviors within a controlled mouse system, rather than to directly replicate prairie vole phenotypes. We will refine the manuscript to clearly distinguish between partial resemblance to prairie vole expression and the broader goal of linking regulatory variation to behavioral diversity.

      Technical clarification and limitations: We will revise the manuscript to more carefully interpret the roles of genomic integration site and transgene copy number, noting that while integration site likely plays a major role, contributions from copy number cannot be excluded. In addition, we will explicitly acknowledge that our analyses of 3D chromatin architecture are correlative in nature, and that establishing causality would require direct perturbation of chromatin structure, which is beyond the scope of the current study.

      Presentation improvements: We will improve figure clarity, include representative reference images from prairie vole brain to facilitate qualitative comparison, and refine descriptions in the Results and Methods sections to enhance clarity and readability.

      We thank the reviewers again for their insightful and constructive feedback, which we believe will significantly strengthen the manuscript. We look forward to submitting a revised version incorporating these improvements.

    1. eLife Assessment

      This work presents a valuable new open-source tool for wirelessly controlling optogenetic stimulation in neuroscience experiments in behaving rodents. Evidence for its potential usefulness in different types of optogenetic experiments is solid, although some details and concerns were viewed as lacking or overlooked (e.g., system latency, battery weight). The work is expected to interest neuroscientists working with optogenetics and neuroengineers developing small-sized integrated devices for rodent experiments.

    2. Reviewer #1 (Public review):

      Summary:

      This paper presents a wireless device for closed-loop control of optogenetic stimulation based on behavioral triggers. The authors demonstrate the device through two behavioral experiments in mice, showcasing the device's capabilities and emphasizing open accessibility and using off-the-shelf components.

      Strengths:

      The paper presents a device that is open access and easily reproducible for wireless stimulation in a closed loop based on behavioral triggers. Other strengths of the device include the simultaneous use of multiple devices in parallel and the claimed ease of integration with existing frameworks. The paper shows two behavioral experiments on multiple mice along with some device validation results.

      Weaknesses:

      The main weakness of the presented device lies in the lack of flexibility in stimulation power. For a device that is intended for stimulation only, having to physically change a component on the board to adapt stimulation power is a major downside. Reprogrammable stimulation current is not complex to implement and should really have been included on this device. Another weakness lies in the limited battery life of the device. While using a battery-powered device decreases spatial constraints, allowing for the maze experiment presented in the paper, it also means the lifespan of the device is limited compared to an inductively powered device, limiting its ability for long-term experiments.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have developed an elegant, lightweight, open-source system that should be able to be widely disseminated to the community. They have used this system in multiple experimental paradigms and demonstrate its functionality quite elegantly. One of these experiments involves two of three animals in the arena being stimulated, a situation that clearly requires an untethered approach. They have appropriately quantified key system parameters (latency and battery life).

      Strengths:

      The introduction places this work in a broader context. That context includes a number of previous solutions, many of which are smaller or more technically complex. However, I agree with the authors that there is a need for something that is easy for labs to acquire and deploy in terms of both what goes on the head and the broader infrastructure (i.e., not needing complex wireless power delivery approaches).

      The paper does an excellent job of describing the system architecture. And the architecture is good! Their system comprises more than just the Bluetooth-enabled head-mounted devices; they have also built an interface that allows for TTL triggers that link into existing workflows.

      The key metrics for a device like this are weight, battery life, and latency. The weight is 1.4g, which is appropriate for adult mice; the battery life is ~100 minutes of continuous stimulation, which should be sufficient for many experiments, and the latency is typically less than 30 ms, which is fine for all but the most demanding closed-loop experiments.

      Performance is demonstrated in two experiments. The first, a continuous Y-maze, elegantly demonstrates how transfected animals learn to sense optogenetic closed-loop stimulation to drive their choice behavior in a way that control-stimulated animals do not. While the authors claim that the ~2 m diameter apparatus is "large scale", the second behavior more convincingly demonstrates the need for wireless stimulation.

      They used closed-loop monitoring of animal pose to selectively stimulate animals for approaching the tails of a dominant conspecific (based on pre-experimental pairwise assessments). It seems that the original hope was that the increases in following that they observe would result in long-lasting changes in the hierarchy of a cage, but as they report, this was not observed. Critically, their supplementary video demonstrates that they conducted this experiment with two instrumented animals simultaneously. This is a situation where a tether would have been hopelessly tangled within a few moments!

      The online documentation seems complete, and it seems quite possible for other labs to adopt and deploy the system.

      Weaknesses:

      The battery life is highly dependent on the stimulation paradigm. It makes sense that the LED is a major component of power consumption. It would have been elegant to measure the total optical energy that can be provided by the system. In addition, Bluetooth transmission is probably a major consumer of power, and receiving may not be "free". Quantifying power as a function of Bluetooth message rates would have been useful.

      Presumably, the major constraint on latency is that the Bluetooth receiver polls at ~10 Hz, resulting in latency blocks of 20+, 30+, or 40+ ms. Why latency is never less than 10 ms is unclear. Could latency be reduced by changing a setting? Having a low-latency option would be very helpful for some experimental situations. Latency is probably the primary weakness of the system.

      The programming process sounds quite complicated, although it is well described and open source. It would be nice if the devices supported over-the-air (OTA) updates. Similarly, the configuration process (Arduino IDE) seems a bit complex. It would be nice if there were a dedicated cross-platform application.

      It is unclear how many devices can be used simultaneously without wireless interference. The base station has two charging stations, but it would have been nice to understand the limits beyond this number.

      There is a very nice website for the system, but there is some concern that the code and design files are not archived. Could they be deposited with the paper?

    4. Reviewer #3 (Public review):

      Summary:

      This study presents a novel device for wireless control of optogenetic stimulation of the mouse brain, the Blueberry, using Bluetooth Low Energy (BLE) communication for parallel activation of up to 4 devices through an Arduino interface. The authors also present two types of brain implants for light delivery that can be connected to the Blueberry: one using uLEDs for surface cortical stimulation, and another using optical fibers for intra- or sub-cortical implants. The architecture of the system, including electronics, communication, and programming, is thoroughly described. Because the system was especially designed to be integrated with existing software used for neuroscience behavioral experiments for closed-loop experiments, validation of the system is shown in two different scenarios: a learning task in an "infinite" Y-maze, where light delivery at precise locations conditions arm choice for navigation; and a social interaction analysis where 3 animals are simultaneously stimulated in order to alter social dynamics among the group.

      Strengths:

      (1) The full system can be built by individual labs with simple PCB printing, off-the-shelf components, and readily available hardware (Arduino) for widespread dissemination.

      (2) Four headstages can be controlled in parallel for simultaneous experiments with multiple mice.

      (3) Validation across different relevant behavioral tests, demonstrating the potential of integrating Blueberry in closed-loop setups.

      Weaknesses:

      (1) Some details in the manuscript regarding system characterization (latency, battery life, etc) are included only in the supplementary materials.

      (2) The practical details of integration with other commercial and open-source software used for the closed-loop experiments, which could help third-party researchers interested in using the system, are insufficiently described.

      (3) System range (3 meters reported) is limited for a BLE device.

      (4) Light output amplitude is not programmable, limiting the choice of stimulation protocols and LEDs used.

      (5) Thermal modeling of the cortical surface stimulator was not performed, and it is unclear if the brain implant for this purpose is within the safety limits.

      (6) The paper is missing a comparison with other state-of-the-art devices for wireless control of optogenetic stimulation in mice.

    5. Author response:

      eLife Assessment

      This work presents a valuable new open-source tool for wirelessly controlling optogenetic stimulation in neuroscience experiments in behaving rodents. Evidence for its potential usefulness in different types of optogenetic experiments is solid, although some details and concerns were viewed as lacking or overlooked (e.g., system latency, battery weight). The work is expected to interest neuroscientists working with optogenetics and neuroengineers developing small-sized integrated devices for rodent experiments.

      We thank the eLife team for taking the time to consider and assess our manuscript. Please find below our provisional author responses accompanying the first version of the Reviewed Preprint.

      We would like to clarify an important error regarding the battery model reported in the manuscript. We mistakenly referred to the CP1254-A3 (1.8 g), whereas the battery used for all devices is the CP9440 A4X (0.8 g).

      Importantly, this correction reduces the total device weight by approximately 1 g compared to the value assumed by Reviewer #3. We believe this directly addresses the concern raised regarding battery weight in both the individual review and the overall eLife assessment.

      We will correct this error in the revised manuscript and clearly report the exact battery model and total device weight.

      For reference, the official VARTA CoinPower catalog is available here:

      https://www.varta-ag.com/fileadmin/varta/industry/downloads/products/lithium-ion-cells/VARTA_CoinPower_EN_digital_221124_A5_6p.pdf

      The battery used in BlueBerry is listed on the last line of page 2.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper presents a wireless device for closed-loop control of optogenetic stimulation based on behavioral triggers. The authors demonstrate the device through two behavioral experiments in mice, showcasing the device's capabilities and emphasizing open accessibility and using off-the-shelf components.

      Strengths:

      The paper presents a device that is open access and easily reproducible for wireless closed-loop stimulation based on behavioral triggers. Other strengths of the device include the simultaneous use of multiple devices in parallel and the claimed ease of integration with existing frameworks. The paper shows two behavioral experiments on multiple mice, along with some device validation results.

      We thank the reviewer for the statement.

      Weaknesses:

      The main weakness of the presented device lies in the lack of flexibility in stimulation power. For a device that is intended for stimulation only, having to physically change a component on the board to adapt stimulation power is a major downside. Reprogrammable stimulation current is not complex to implement and should really have been included on this device. Another weakness lies in the limited battery life of the device. While using a battery-powered device decreases spatial constraints, allowing for the maze experiment presented in the paper, it also means the lifespan of the device is limited compared to an inductively powered device, limiting its ability for long-term experiments.

      We thank the reviewer for these valuable comments. We did consider implementing programmable control of stimulation power, for example using a digital potentiometer. However, in our current design this approach was not sufficient because the output current supported by typical digital potentiometers is too low for the high-power LEDs used in our system. For this reason, we did not include programmable stimulation current in the present version. We agree that this is a limitation and that further work is needed to identify a suitable solution for adjustable stimulation power, which we plan to pursue in future versions of the device. We will revise the manuscript to make this limitation and future direction clearer.

      We also agree that the use of a battery-powered wireless system introduces an important trade-off. We will revise the manuscript to discuss this limitation more explicitly.

      Reviewer #2 (Public review):

      Summary:

      The authors have developed an elegant, lightweight, open-source system that should be able to be widely disseminated to the community. They have used this system in multiple experimental paradigms and demonstrate its functionality quite elegantly. One of these experiments involves two of three animals in the arena being stimulated, a situation that clearly requires an untethered approach. They have appropriately quantified key system parameters (latency and battery life).

      Strengths:

      The introduction places this work in a broader context. That context includes a number of previous solutions, many of which are smaller or more technically complex. However, I agree with the authors that there is a need for something that is easy for labs to acquire and deploy in terms of both what goes on the head and the broader infrastructure (i.e., not needing complex wireless power delivery approaches).

      The paper does an excellent job of describing the system architecture. And the architecture is good! Their system comprises more than just the Bluetooth-enabled head-mounted devices: they have also built an interface that allows for TTL triggers that link into existing workflows.

      The key metrics for a device like this are weight, battery life, and latency. The weight is 1.4g, which is appropriate for adult mice; the battery life is ~100 minutes of continuous stimulation, which should be sufficient for many experiments, and the latency is typically less than 30 ms, which is fine for all but the most demanding closed-loop experiments.

      Performance is demonstrated in two experiments. The first, a continuous Y-maze, elegantly demonstrates how transfected animals learn to sense closed-loop optogenetic stimulation to drive their choice behavior in a way that control-stimulated animals do not. While the authors claim that the ~2 m diameter apparatus is "large scale", the second behavior more convincingly demonstrates the need for wireless stimulation.

      They used closed-loop monitoring of animal pose to selectively stimulate animals for approaching the tails of a dominant conspecific (based on pre-experimental pairwise assessments). It seems that the original hope was that the increases in following that they observe would result in long-lasting changes in the hierarchy of a cage, but as they report, this was not observed. Critically, their supplementary video demonstrates that they conducted this experiment with two instrumented animals simultaneously. This is a situation where a tether would have been hopelessly tangled within a few moments!

      The online documentation seems complete, and it seems quite possible for other labs to adopt and deploy the system.

      We appreciate the reviewer’s enthusiasm. Thank you.

      Weaknesses:

      The battery life is highly dependent on the stimulation paradigm. It makes sense that the LED is a major component of power consumption. It would have been elegant to measure the total optical energy that can be provided by the system. In addition, Bluetooth transmission is probably a major consumer of power, and receiving may not be "free". Quantifying power as a function of Bluetooth message rates would have been useful.

      We thank the reviewer for this important suggestion. We agree that this is a missing characterization in the current manuscript. In the revised version, we will include a more detailed analysis of the system’s power budget, including the maximum stimulation power supported by the BlueBerry device, the corresponding output currents, and the contribution of the main integrated circuits to overall current consumption.
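      For readers planning experiments, the kind of power budget discussed above can be approximated with simple charge-balance arithmetic. The Python sketch below is illustrative only: the class and parameter names are hypothetical, every number is a placeholder rather than a BlueBerry measurement, and it assumes a constant baseline draw, LED current scaling linearly with duty cycle, and a fixed charge cost per BLE message. It ignores battery derating, temperature, and cut-off voltage.

```python
from dataclasses import dataclass


@dataclass
class PowerBudget:
    """Idealized power budget for a battery-powered stimulator (all values hypothetical)."""

    capacity_mah: float      # usable battery capacity
    baseline_ma: float       # always-on draw: MCU, regulator, BLE link maintenance
    led_on_ma: float         # LED drive current while the LED is on
    optical_mw: float        # optical output power while the LED is on
    uc_per_msg: float = 0.0  # extra charge per BLE command message, in microcoulombs

    def average_current_ma(self, duty_cycle: float, msg_rate_hz: float = 0.0) -> float:
        """Mean current for a pulse train with the given duty cycle (0..1) and BLE message rate."""
        if not 0.0 <= duty_cycle <= 1.0:
            raise ValueError("duty_cycle must be between 0 and 1")
        # microcoulombs per second equal microamps; divide by 1000 to get milliamps
        ble_ma = msg_rate_hz * self.uc_per_msg / 1000.0
        return self.baseline_ma + duty_cycle * self.led_on_ma + ble_ma

    def runtime_minutes(self, duty_cycle: float, msg_rate_hz: float = 0.0) -> float:
        """Capacity divided by mean current, converted from hours to minutes."""
        return 60.0 * self.capacity_mah / self.average_current_ma(duty_cycle, msg_rate_hz)

    def optical_energy_j(self, duty_cycle: float, msg_rate_hz: float = 0.0) -> float:
        """Total light energy delivered before the battery is exhausted."""
        on_time_s = self.runtime_minutes(duty_cycle, msg_rate_hz) * 60.0 * duty_cycle
        return self.optical_mw * 1e-3 * on_time_s


# Example with made-up numbers (not BlueBerry measurements) at a 20% duty cycle.
budget = PowerBudget(capacity_mah=20.0, baseline_ma=2.0, led_on_ma=30.0, optical_mw=10.0)
print(f"{budget.runtime_minutes(0.2):.0f} min, {budget.optical_energy_j(0.2):.0f} J")
```

      Because the LED term scales with duty cycle while the baseline and BLE terms do not, a model of this form makes explicit why battery life depends so strongly on the stimulation paradigm, and how a higher message rate would shorten it.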

      Presumably, the major constraint on latency is that the Bluetooth receiver polls at ~10 Hz, resulting in latency blocks of 20+, 30+, or 40+ ms. Why latency is never less than 10 ms is unclear. Could latency be reduced by changing a setting? Having a low-latency option would be very helpful for some experimental situations. Latency is probably the primary weakness of the system.

      In the revised manuscript, we will clarify more explicitly that latency is a key limitation of the current system. We will also further investigate the source of this latency, including whether it can be reduced through additional configuration changes. In addition, we will include comparative latency measurements using different Arduino modules as the central BLE controller for the BlueHub device.

      The programming process sounds quite complicated, although it is well described and open source. Over-the-air (OTA) updates would be a welcome addition. Similarly, the configuration process (Arduino IDE) seems a bit complex. It would be nice if there were a dedicated cross-platform application.

      We will investigate this matter and provide a simpler installation and configuration script to set up both the BlueHub and BlueBerry systems.

      It is unclear how many devices could be used simultaneously without wireless interference. The base station has two charging stations, but it would have been nice to understand the limits beyond this number.

      Due to the current structure of the ArduinoBLE library used in BlueHub devices, each BlueHub unit can support active communication with at most three BlueBerry units. We thank the reviewer for highlighting this point, and we will state it explicitly in the next version of the paper.

      There is a very nice website for the system, but there is some concern that the code and design files are not archived. Could they be deposited with the paper?

      In the revised submission, we will deposit all code used to program both the BlueHub and BlueBerry devices, together with the Gerber files required for PCB fabrication, alongside the paper.

      Reviewer #3 (Public review):

      Summary:

      This study presents a novel device for wireless control of optogenetic stimulation of the mouse brain, the BlueBerry, using Bluetooth Low Energy (BLE) communication for parallel activation of up to 4 devices through an Arduino interface. The authors also present two types of brain implants for light delivery that can be connected to the BlueBerry: one using uLEDs for surface cortical stimulation, and another using optical fibers for intra- or sub-cortical implants. The architecture of the system, including electronics, communication, and programming, is thoroughly described. Because the system was specifically designed to be integrated with existing software for closed-loop neuroscience behavioral experiments, validation is shown in two different scenarios: a learning task in an "infinite" Y-maze, where light delivery at precise locations conditions arm choice during navigation; and a social interaction analysis where 3 animals are simultaneously stimulated in order to alter social dynamics within the group.

      Strengths:

      (1) The full system can be built by individual labs with simple PCB printing, off-the-shelf components, and readily available hardware (Arduino) for widespread dissemination.

      (2) Four headstages can be controlled in parallel for simultaneous experiments with multiple mice.

      (3) Validation across different relevant behavioral tests, demonstrating the potential of integrating BlueBerry in closed-loop setups.

      We thank the reviewer for the statement.

      Weaknesses:

      (1) Some details in the manuscript regarding system characterization (latency, battery life, etc.) are included only in the supplementary materials.

      As the reviewer correctly notes, these quantifications belong in the main text; in the revised manuscript we will move them from the supplementary material to the main section.

      (2) The practical details of integration with other commercial and open-source software used for the closed-loop experiments, which could help third-party researchers interested in using the system, are insufficiently described.

      We will describe these integration details more clearly in the revised manuscript.

      (3) System range (3 meters reported) is limited for a BLE device.

      The reported system range is the range over which we consider communication reliable. In the revised manuscript, we will quantify this by reporting the Received Signal Strength (RSS) for multiple BlueBerry devices across varying distances.

      (4) Light output amplitude is not programmable, limiting the choice of stimulation protocols and LEDs used.

      That is indeed a limitation of our system; we will investigate the feasibility of integrating programmable stimulation protocols in the updated version of the BlueBerry device.

      (5) Thermal modeling of the cortical surface stimulator was not performed, and it is unclear if the brain implant for this purpose is within the safety limits.

      We thank the reviewer for this comment. In the revised manuscript, we will clarify that the thermal measurements reported here apply only to the specific superficial implant geometry and stimulation conditions used in this study. Because tissue heating depends strongly on implant design and on parameters such as optical power, pulse width, and stimulation frequency, a general safety statement cannot be made for all possible implant configurations. Since the primary goal of this work is to present the wireless device platform rather than to validate a particular implant design, thermal safety should be evaluated individually for each implant and stimulation paradigm.

      (6) The paper is missing a comparison with other state-of-the-art devices for wireless control of optogenetic stimulation in mice.

      In the revised manuscript, we will include a comparison table summarizing our system alongside currently available wireless optogenetic devices.

    1. eLife Assessment

      The manuscript by Mancl et al. provides important mechanistic insights into the conformational dynamics of Insulin Degrading Enzyme (IDE), a zinc metalloprotease involved in the clearance of amyloid peptides. Supported by a compelling combination of time-resolved cryo-EM, SEC-SAXS, enzymatic assays, and both all-atom and coarse-grained simulations, the study reveals an insulin-induced allosteric transition and transient β-sheet interactions underlying IDE's unfoldase activity, thereby refining our understanding of IDE's functional cycle and offering a structural framework for developing substrate-selective modulators of M16 metalloproteases. The latest round of revisions further improves clarity and presentation by updating structural statistics, correcting minor textual inconsistencies, and refining supplemental materials, fully addressing the remaining reviewer comments.

    2. Reviewer #1 (Public review):

      Summary:

      Mancl et al. present an integrative structural and mechanistic analysis of the human insulin-degrading enzyme (IDE), combining cryo‑EM, time‑resolved cryo‑EM, SEC‑SAXS, enzymatic assays, all-atom molecular dynamics (MD) simulations, and coarse‑grained MD simulations. Their study delineates how IDE undergoes coordinated open-close transitions and interdomain rotations, how these motions relate to its unfoldase and protease activities, and how a single residue, R668, acts as a molecular latch governing these conformational changes. Through expanded structural datasets and computational analyses, the authors propose a mechanistic model for how IDE captures, unfolds, and degrades diverse amyloidogenic substrates such as insulin and Aβ.

      Strengths:

      A major strength of this study is its integration of structural, biophysical, biochemical, and computational approaches. The authors now provide six cryo‑EM structures, including a new time‑resolved O/O state captured 123 ms after substrate mixing, which clarifies the early structural response of IDE to insulin binding. The combination of multibody analysis, 3D variability analysis, all‑atom MD, and coarse‑grained Upside simulations yields a coherent picture in which rotational interdomain motions and charge‑swapping events at the IDE‑N/C interface underpin substrate unfolding and repositioning.

      The identification of R668 as a central determinant of the open-close transition, supported by MD, HDX‑MS data from prior work, SEC‑SAXS, and functional assays on the R668A mutant, represents a significant mechanistic advance. The inclusion of Aβ degradation assays adds biological breadth and supports the conclusion that R668 modulates activity in a substrate‑dependent manner.

      The authors have also substantially improved clarity by reorganizing figures, refining section headers, and adding introductory structural schematics. Taken together, the revised manuscript now provides a rigorous and accessible framework for understanding IDE dynamics and their relevance to amyloid peptide turnover.

      Weaknesses:

      At this stage, remaining limitations are modest and inherent to the system rather than the approach. While the study convincingly demonstrates substrate‑dependent modulation of IDE dynamics, it does not experimentally assess additional endogenous substrates (e.g., amylin, glucagon), which would be needed to fully generalize the role of R668 across the substrate spectrum of IDE. Furthermore, the timescale mismatch between MD simulations and catalytic turnover, which the authors clearly acknowledge, means that correlations between simulated motions and enzymatic kinetics remain inferential. Finally, some flexible cryo‑EM states (particularly O/pO) continue to exhibit moderate local resolution, which constrains atomic interpretation of highly dynamic regions, although this is addressed transparently.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes various conformational states and structural dynamics of the insulin-degrading enzyme (IDE), a zinc metalloprotease. Both open and closed state structures of IDE have been previously solved using crystallography and cryo-EM, which reveal a dimeric organization of IDE where each monomer is organized into N and C domains. The C-domains form the interacting interface in the dimeric protein, while the two N-domains are positioned on the outer sides of the core formed by the C-domains. It remains elusive how the open state is converted into the closed state, but it is generally accepted that it involves large-scale movement of the N-domains relative to the C-domains. Here, the authors have used various complementary experimental techniques such as cryo-EM, SAXS, size-exclusion chromatography, and enzymatic assays to characterize the structure and dynamics of IDE in the presence of the substrate insulin, whose density is captured in all the structures solved. The cryo-EM data suffered from a high degree of intrinsic motion among the different domains, and consequently the resulting structures were resolved at moderate resolution (3.0-4.1 Å). A total of five structures were generated by cryo-EM in the originally submitted manuscript. A sixth cryo-EM reconstruction, at 5.1 Å resolution and obtained by time-resolved cryo-EM, was added after the first revision. The authors have extensively used molecular dynamics simulations to identify important inter-subunit contacts involving R668, E381, D309, and other residues. In summary, the authors have explored the conformational dynamics of IDE using experimental approaches, which are complemented and analyzed in atomic detail by MD simulation studies. The studies are meticulously conducted and lay the ground for future exploration of the protease structure-function relationship.

      Strengths:

      The manuscript presents a powerful integrative structural biology study that combines high-resolution cryo-EM, particle heterogeneity analysis, time-resolved cryo-EM, multiscale molecular dynamics simulations, SAXS, and biochemical assays to dissect the conformational dynamics of human insulin-degrading enzyme. A major strength is the identification of a previously unappreciated rotational component of IDE-N relative to IDE-C and the discovery of R668 as a molecular latch governing the open-close transition, supported consistently by structural, computational, mutational, and functional data. The work provides a coherent mechanistic framework linking IDE dynamics to substrate unfolding, allostery, and substrate-dependent catalysis, with clear relevance to diabetes and Alzheimer's disease biology.

      Weaknesses:

      Despite its depth, several key mechanistic conclusions, particularly substrate unfolding and the proposed "β-grabbing" mechanism, rely heavily on coarse-grained and all-atom MD simulations rather than direct experimental observation. Cryo-EM density for insulin is limited and heterogeneous, restricting definitive structural interpretation of substrate binding modes. The time-resolved cryo-EM experiment captures only a single dominant state at modest resolution, limiting insight into transient intermediates. In addition, the study focuses primarily on insulin, leaving the generality of the proposed mechanism for other IDE substrates insufficiently tested, and the therapeutic implications remain largely speculative without direct pharmacological modulation data.

    4. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mancl et al. present a comprehensive integrative study combining cryo-EM, SAXS, enzymatic assays, and molecular dynamics (MD) simulations to characterize conformational dynamics of human insulin-degrading enzyme (IDE). In the revised manuscript, the study now also includes time-resolved cryo-EM and coarse-grained MD simulations, which strengthen the mechanistic model by revealing insulin-induced allostery and β-sheet interactions between IDE and insulin. Together, these results expand the original mechanistic insight and further validate R668 as a key residue governing the open-close transition and substrate-dependent activity modulation of IDE.

      Strengths:

      The authors have substantially expanded the experimental scope by adding time-resolved cryo-EM data and coarse-grained MD simulations, directly addressing requests for mechanistic depth and temporal insight. The integration of multiple resolution scales (cryo-EM heterogeneity analysis, all-atom and coarse-grained MD simulations, and biochemical validation) now provides a coherent description of the conformational transitions and allosteric regulation of IDE. The addition of Aβ degradation assays strengthens the claim that R668 modulates IDE function in a substrate-specific manner. Finally, the manuscript reads more clearly: figure organization, section headers, and inclusion of a new introductory figure make it accessible to a broader audience. Overall, the revision reinforces the conceptual advance that the dynamic interdomain motions of IDE underlie both its unfoldase and protease activities and identifies structural motifs that could be targeted pharmacologically.

      Weaknesses:

      While the authors acknowledge that future studies on additional IDE substrates (e.g., amylin and glucagon) are warranted, such experiments remain outside the present scope. Their absence modestly limits the generalization of the R668 mechanism across all IDE substrates. Despite improved discussion of kinetic timescales and enzyme-substrate interactions, experimental correlation between MD timescales and catalysis remains primarily inferential. The moderate local resolution of some cryo-EM states (notably O/pO) continues to limit atomic interpretation of the most flexible regions, though the authors address this carefully.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes various conformational states and structural dynamics of the insulin-degrading enzyme (IDE), a zinc metalloprotease. Both open and closed state structures of IDE have been previously solved using crystallography and cryo-EM, which reveal a dimeric organization of IDE where each monomer is organized into N and C domains. The C-domains form the interacting interface in the dimeric protein, while the two N-domains are positioned on the outer sides of the core formed by the C-domains. It remains elusive how the open state is converted into the closed state, but it is generally accepted that it involves large-scale movement of the N-domains relative to the C-domains. Here, the authors have used various complementary experimental techniques such as cryo-EM, SAXS, size-exclusion chromatography, and enzymatic assays to characterize the structure and dynamics of IDE in the presence of the substrate insulin, whose density is captured in all the structures solved. The cryo-EM data suffered from a high degree of intrinsic motion among the different domains, and consequently the resulting structures were resolved at moderate resolution (3.0-4.1 Å). A total of five structures were generated by cryo-EM in the originally submitted manuscript. A sixth cryo-EM reconstruction, at 5.1 Å resolution and obtained by time-resolved cryo-EM, was added after the first revision. The authors have extensively used molecular dynamics simulations to identify important inter-subunit contacts involving R668, E381, D309, and other residues. In summary, the authors have explored the conformational dynamics of IDE using experimental approaches, which are complemented and analyzed in atomic detail by MD simulation studies. The studies are meticulously conducted and lay the ground for future exploration of the protease structure-function relationship.

      Comments after first peer-review:

      The authors have addressed all my concerns, and have added new data and explanations in terms of time-resolved cryo-EM (Fig. 7) and upside simulations (Fig. 8) which in my opinion have strengthened the merit of the manuscript.

      We are grateful for the dedication and constructive feedback provided by the editors and reviewers. We have revised our manuscript according to the suggestions by both reviewers.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The new version of the manuscript reads exceedingly well and the corrections the authors have made during their revision made the manuscript much easier to read and digest than the first version. Below are minor details that may be corrected:

      Abstract:

      Line 45-47: "IDE is known to transition between a closed state, poised for catalysis, and an open state, able to release cleavage products and bind a new substrate." (consider adding a)

      Fixed

      Line 48-50: "Combining cryo-EM heterogeneity analysis with all-atom molecular dynamics (MD) simulations, we identified the structural basis and key residues for IDE conformational dynamics that were not previously revealed by IDE static structures." (consider adding previously)

      Changed

      Line 52-54: "Our small-angle X-ray scattering analysis and enzymatic assays of an R668A mutant indicate a profound alteration of conformational dynamics and catalytic activity." (consider adding analysis)

      Changed

      Line 54: Consider leaving out "Upside" in the abstract (to avoid confusion when reading the abstract) and leave it to be introduced in the introduction when Upside MD simulations are first mentioned.

      Changed

      Results:

      Figure 5D: There seems to be an error in the legend for Figure 5D. It says "... presence of varying amounts of insulin", but this must be Aβ1-40. Please add info on whether the replicates are technical or biological.

      The legend has been revised as suggested.

      Line 125: Consider switching the order of "here" and "we"

      “here” has been removed.

      Line 128: Replace "5" with "five"

      Changed

      Line 137: Replace "when insulin is present" with "in the presence of insulin"

      Changed

      Line 228: Replace "5" and "6" with "five " and "six"

      Changed

      Line 229: Consider adding the word "form": "First, the open subunits did not close to form a singular structure."

      We have adjusted the sentence to read “close to a singular consensus structure”

      Line 327: Replace "2" with "two"

      Changed

      Line 276: Consider replacing "Conversely" with a more suitable connecting term, as it implies that the observations presented in the two sentences are opposites, or rephrase to make clear what is being compared. Is it the presence or absence of a dose dependency between the substrates, or the actual kinetic parameters that are described? I do not think "conversely" is fair with the current formulation, as the point is that "the R668A mutant did not exhibit a dose-dependent response to the presence of Aβ", not that the Ki is reduced for WT compared to the R668A construct when looking at Aβ.

      The connecting term has been removed completely, beginning the sentence with “When Abeta…”

      Line 359: Replace "6" with "six"

      Changed

      Consider getting rid of possessive apostrophes to keep a formal tone, e.g. lines 211 (cryoSPARC's), 259 (IDE's) and 382 (IDE's). Exception to this is Alzheimer's disease.

      All instances of possessive apostrophes, aside from Alzheimer’s, have been replaced with more formal wording.

      Figure 7 supplement 1: The color scheme for the local resolution is missing the unit (Å).

      This has been corrected.

      Finally, the supplementary videos illustrating IDE conformational dynamics are difficult to interpret and somewhat redundant in their current form. The transitions occur very rapidly, making it hard to appreciate the described motions, and the uniform coloring of IDE further limits visual clarity. I apologize for not including this point in my initial review. I recommend either removing the videos or re-rendering them to improve interpretability, for example by slowing down the motion and applying the same domain color scheme introduced in the new Figure 1 (and used in the MD trajectory video). This would greatly aid readers in connecting the descriptions in the text to the visual representations in the movies.

      Figure 3 videos 1-4 were slowed down, simplified, and recolored to improve clarity.

      Reviewer #2 (Recommendations for the authors):

      Comments after first revision for authors:

      Many thanks to the authors for the detailed explanations in response to my comments. I believe the discussions will help a broad audience, especially non-experts. Please address the minor comment below:

      Minor comment:

      Please update Supplementary file 1 (Cryo-EM data collection, refinement, and validation statistics) to include the new volume obtained by time-resolved cryo-EM. Kindly also check line 47 in the abstract: "Here, we present five cryo-EM structures", which may need an update (six structures and resolution 3.0-5.1 Å) or rephrasing. If there are similar instances in the manuscript where all the structures are listed together, please update them as necessary.

      The cryo-EM statistics for the time-resolved cryo-EM are shown in Supplementary file 2 to differentiate the two datasets. The abstract has been changed, as has line 149.

    1. eLife Assessment

      This study provides valuable insights into the question of whether the higher prevalence of autoimmune disease in females could be driven by sex differences in the T cell receptor (TCR) repertoire. The authors compared male and female TCR repertoires using bulk RNA sequencing of sorted thymocyte subpopulations from pediatric and adult human thymuses; however, because paired TCR chains are not examined, the analyses lack sufficient discrimination and incompletely support the central claims regarding sex differences in the TCR repertoire and potential autoimmune bias.

    2. Reviewer #2 (Public review):

      Summary

      This study addresses the hypothesis that the higher prevalence of autoimmune diseases in women could result from sex-dependent differences in thymic generation or selection of TCR repertoires. The biological question is important and the dataset is valuable. However, the study has major conceptual and analytical limitations.

      In particular:

      - The conclusions cannot be generalized to autoimmune diseases as a whole, as only type 1 diabetes (T1D) and celiac disease (CeD) antigens were analyzed.
      - The central interpretation is not supported by the data, as the observed signal is strongly influenced by TCRs associated with T1D, which shows a male-biased incidence and therefore does not align with the female bias the study aims to explain.

      Strengths

      The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here. However, the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses.

      Weaknesses

      The authors did not adequately address the central concerns raised in the previous review. As a result, the major issues remain unresolved.

      (1) Generalization to autoimmune diseases is not justified.

      The study aims to explain the higher prevalence of autoimmune diseases in females. The main conclusion is based on enrichment in females of TCRs annotated as autoimmune-associated using database matching.

      However, these matches correspond exclusively to TCRs specific for T1D and CeD. This already limits the conclusions to these two diseases and does not justify generalization to autoimmune diseases as a whole.

      (2) Contradiction with epidemiology of T1D which is male-biased

      T1D and CeD have opposite sex biases in European populations. While CeD is more frequent in females (~60%; doi:10.1016/j.cgh.2018.11.013), T1D is more frequent in males (male:female = 1.11 in France; doi:10.1111/dom.70124).

      Importantly, T1D constitutes a substantial fraction of the autoimmune-associated dataset (42 out of 48 epitopes; 83 out of 185 TRB sequences). Therefore, the observed signal is strongly influenced by a disease that does not follow the female bias the study aims to explain.

      The authors argue that T1D sex bias varies globally, including female-biased incidence in East Asia and Africa. However, this argument does not resolve the issue, as the cohort analyzed in this study was derived from France, where T1D shows a male-biased incidence. Thus, the interpretation remains inconsistent with the population context of the dataset.

      (3) Lack of disease-level and donor-level resolution

      The authors combine T1D and CeD into a single "autoimmune" category and do not provide per-disease, per-donor or per-epitope distributions, despite the reviewer's explicit requests.

      This prevents evaluation of whether the observed signal is driven by:
      - a specific disease (T1D or CeD), or
      - a small number of donors.

      Without this analysis, the conclusions cannot be properly interpreted.

      (4) Use of "polyspecificity" concept is not supported by experimental evidence

      The authors extensively use the concept of "polyspecific TCRs," defined as single-chain CDR3 sequences annotated across databases as recognizing distinct and unrelated antigenic categories. This concept is not supported by experimental evidence (except for a single TCR in Quiniou et al., as acknowledged by the authors).

      In the absence of robust validation, a more parsimonious explanation for such ambiguously annotated TCR chains is the presence of false-positive annotations in public databases (see, e.g., Ton Schumacher's preprint https://www.biorxiv.org/content/10.1101/2025.04.28.651095.abstract) or alternatively, distinct TRA pairing for identical TRB sequences resulting in different specificities.

      The observation that these TCRs have high generation probability is expected, as TCRs found in independent studies are likely to have high generation probability. The interpretation of these sequences as biologically meaningful entities (e.g., a "first line of defense") is therefore speculative and not supported by the data.

      The authors also refer to in silico-generated polyspecific TCRs (ref. to Nature Machine Intelligence). However, such sequences are generated ex vivo and do not undergo thymic selection. A TCR capable of recognizing multiple unrelated foreign antigens would likely also recognize self-antigens and be eliminated during negative selection. Therefore, this argument does not support the biological relevance and in vivo existence of the proposed polyspecific TCR class.

      (5) Insufficient statistical analysis of diversity

      The absence of statistically significant differences in repertoire diversity between sexes (Figure 3), despite an apparent visual trend, may reflect limited sample size and insufficient statistical power rather than a true absence of differences. A more appropriate statistical approach, such as mixed-effects modeling, was requested in the previous review but was not performed.

    3. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The goal of this paper was to determine whether the T cell receptor (TCR) repertoire differs between male and female humans. To address this, the group sequenced TCRs from double-positive and single-positive thymocytes from male and female humans of various ages. Such an analysis of sorted thymocyte subsets has not been performed before. The only comparable dataset is a pediatric thymocyte dataset in which total thymocytes were sorted.

      They report participant ages and sexes, but not ethnicity or race, nor do they provide HLA typing of individuals. The experiments are heroic, yet represent a relatively small sampling of diverse humans. They observed no differences between males and females in TCRbeta or TCRalpha usage, combinatorial diversity, CDR3 length, or amino acid usage in the CDR3aa region. Though they observed some TCRbeta CDR3aa sequence motifs that differed between males and females, these findings could not be replicated in an external dataset and therefore cannot be generalized to the human population.

      They also compared TCRbeta sequences against those in existing databases annotated, through computational approaches, as recognizing cancer, bacterial, viral, or autoimmune antigens. They found little overlap between their sequences and these annotated sequences (ranging from 0.82% to 3.58% of sequences, depending on the individual). Among the overlapping sequences, certain sequences annotated against autoimmune or bacterial antigens were significantly over-represented in female versus male CD8 SP cells. Since no other comparable dataset is available, they could not conclude whether this finding generalizes to the human population.

      Strengths:

      It is a novel dataset that attempts to understand sex differences in the T cell repertoire in humans. Overall, the methodologies are sound and are the current state-of-the-art. There was an attempt to replicate their findings in cases where an appropriate dataset was available. I agree that there are no gross differences in TCR diversity between males and females. This is an important negative result.

      Weaknesses:

      Overall, the sample size is small given that it is an outbred population. This reviewer recognizes the difficulty in obtaining samples for this experiment (which were from deceased donors), and this limitation was appropriately discussed. Their analysis was limited by the current availability of other TCR sequences. These weaknesses were appropriately discussed and considered.

      We thank this reviewer for their appreciation of our work.

      Reviewer #2 (Public review):

      Summary:

      This study addresses the hypothesis that the strikingly higher prevalence of autoimmune diseases in women could be the result of biased thymic generation or selection of TCR repertoires. The biological question is important and the hypothesis is valuable. Although the topic is conceptually interesting and the dataset is rich, the study has a number of major issues. In particular, the majority of "autoimmunity-related TCRs" considered in this study are in fact specific to type 1 diabetes (T1D). Notably, T1D incidence is higher in males, which directly contradicts the stated objective of the study - to explain the higher prevalence of autoimmune diseases in women. Given this conceptual inconsistency, the evidence presented does not support the authors' conclusions.

      We disagree with the reviewer’s assertion that our findings create a conceptual inconsistency.

      Autoimmune diseases are multifactorial conditions in which multiple biological layers, including thymic selection, peripheral immune regulation, hormonal effects, environmental exposures, and tissue-specific vulnerability, contribute to disease incidence. These layers may influence sex ratios in different directions. Therefore, observing a higher frequency of TCRs annotated as T1D-associated in females does not imply that T1D incidence must also be higher in females.

      Moreover, T1D incidence itself is not uniformly male-biased worldwide. Epidemiological analyses (reviewed in Qu and Hakonarson, Diabetes Obes Metab 2025) show that male predominance is mainly observed in high-incidence Northern European populations, whereas in several lower-incidence regions, including parts of East Asia and Africa, the sex ratio is balanced or even female-biased. Furthermore, another recent study highlights that T1D incidence and prevalence in women and men vary depending on the study period (PMC12544016).

      This heterogeneity indicates that disease incidence reflects context-dependent interactions between genetic load, environmental exposures, and sex-specific biological modifiers. Moreover, biological sex acts as a dynamic modifier of genetic risk and immune function in T1D, influencing central tolerance, peripheral immune activation, and β-cell intrinsic resilience (reviewed in Qu and Hakonarson, 2025). Experimental models further demonstrate estrogen-mediated protection of pancreatic β-cells (Kim et al., Biochem Biophys Res Commun 2025), indicating that disease incidence reflects the integration of immune, hormonal, and tissue-specific layers rather than central autoreactive TCR release alone. Sex hormones may exert distinct and sometimes opposing effects on thymic selection and on target-organ vulnerability, while environmental factors such as vitamin D status, infections, and microbiota composition further shape disease expression.

      Importantly, our study does not claim causality, nor does it aim to predict the epidemiology of any specific autoimmune disease. Our conclusions are limited to the observation that sex-dependent differences exist in thymic TCR selection.

      Strengths:

      The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here, although the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses. Importantly, this dataset represents a valuable community resource and should be openly deposited rather than being "available upon request."

      We agree with the reviewer’s comment. As already stated in the previous revision and the "Data Availability" section of the manuscript, all raw sequencing data have been deposited and are publicly available on NCBI (BioProject PRJNA1379632): https://www.ncbi.nlm.nih.gov/sra/PRJNA1379632.

      Weaknesses:

      I thank the authors for their detailed responses to my previous comments. Several concerns were addressed satisfactorily; however, important issues remain unresolved, and a new major concern has emerged from the revised manuscript.

      Major concerns:

      (1) Autoimmune specificity is dominated by T1D, contradicting the study's premise. Newly added Supplementary Table 3 shows that the authors considered only 14 autoimmune-related epitopes, of which 12 are associated with type 1 diabetes (T1D) and 2 with celiac disease (CeD). (I guess this is because identification of particular peptide autoantigens is an extremely difficult task and was only successful in T1D and CeD.) Thus the conclusions of this work mostly relate to T1D. However, the incidence of T1D is higher in males than in females (e.g. doi:10.1111/j.1365-2796.2007.01896.x; doi:10.25646/11439.2). This directly contradicts the stated objective of the study: to explain the higher prevalence of autoimmune diseases in women. As a result, the authors' conclusions (a) cannot be generalized to autoimmune disease as a whole, as the authors only considered T1D and CeD antigens, and (b) are internally inconsistent with the stated objective of the study.

      (2) By contrast, CeD does show a female bias (~60/40 female/male; doi: 10.1016/j.cgh.2018.11.013). However, the manuscript does not allow evaluation of how much the reported "autoimmune TCR enrichment" derives from T1D versus CeD. Despite my previous request, the authors did not provide per-donor and per-epitope distributions of autoimmune-specific TCR matches. I therefore explicitly request a table in which: each row corresponds to a specific autoimmune antigen; each column corresponds to a donor (with metadata available including sex); each cell reports the number of unique TCRs specific to that antigen in that donor. Without such data, the conclusions cannot be evaluated.

      (3) It is scientifically inappropriate to generalize findings to "autoimmune diseases" when only T1D and CeD were analyzed. Moreover, given that T1D and CeD show opposite directions of sex bias, combining them into a single "AID" category is misleading. All analyses presented in Figure 8 and Supplementary Figure 16 should be repeated and shown separately for T1D and CeD, rather than combined.

      We acknowledge that currently available antigen-annotated TCR databases remain limited. This reflects the considerable experimental difficulty of defining TCRs’ antigen specificities and is a widely recognized limitation in the field.

      In the curated database used here, the autoimmune-associated entries correspond primarily to type 1 diabetes (T1D) and celiac disease (CeD), two autoimmune contexts for which antigen-specific TCRs have been experimentally characterized. However, focusing on the number of antigens alone does not accurately reflect the breadth of the dataset.

      Specifically, our analysis is based on 48 epitopes and nearly 200 annotated TRB sequences, providing substantially broader antigenic representation than suggested by antigen count alone.

      Author response table 1.

      Importantly, our analytical framework does not attempt to interpret each epitope specificity individually. Instead, we examine whether TCRs annotated as autoimmune-associated are differentially represented between sexes at the level of thymic selection.

      In our dataset we observe a stronger CD8⁺ thymic selection of TCRs annotated as autoimmune-associated in females. We interpret this as evidence that central tolerance mechanisms may contribute to sex-dependent differences in autoreactive repertoire composition, rather than as a determinant of any specific autoimmune disease pathophysiology.

      (4) The McPAS database contains TCRs associated with other autoimmune diseases (e.g., multiple sclerosis, rheumatoid arthritis), although the exact autoantigens in these contexts are unknown. Why didn't the authors perform the search for such TCRs? I believe disease association even without particular known antigen could still be insightful.

      For multiple sclerosis, the only antigen present in the database is myelin basic protein (MBP). In our thymic repertoire dataset, we could not detect any CDR3 sequence matching MBP-annotated CDR3s from the database.

      For rheumatoid arthritis, the database contains only a small number of TRA sequences without corresponding TRB chains. Because our specificity analysis is based on TRBs, these entries could not be used in our analyses.

      (5) Misuse of the concept of polyspecificity. I appreciate the authors' reference to Don Mason's work; however, the concept of polyspecificity discussed there is fundamentally different from the authors' usage. Mason, Sewell (doi:10.1074/jbc.M111.289488), Garcia (doi:10.1016/j.cell.2014.03.047), and others demonstrated that individual TCRs can recognize multiple peptides, possibly around 1 million. Importantly, however, these peptides are not random but share a sequence motif. This is a general feature of TCRs, i.e. 100% of TCRs are polyspecific in this sense.

      In contrast, the authors define polyspecificity as TRB sequences annotated as specific to unrelated epitopes in TCR databases such as VDJdb. These databases are well known to contain substantial numbers of false-positive annotations (see, e.g., Ton Schumacher's preprint https://www.biorxiv.org/content/10.1101/2025.04.28.651095.abstract). The authors acknowledge that, under their definition, polyspecificity has been experimentally validated for only one (!) TCR (Quiniou et al.). In the absence of robust experimental validation, use of the term "polyspecificity" in this context is misleading. I strongly recommend removing all analyses and conclusions related to polyspecificity from the manuscript unless supported by independent functional validation.

      We agree with the reviewer that the concept of TCR polyspecificity is complex, controversial and not uniformly defined in the literature.

      For some, polyspecificity refers to the ability of individual TCRs to recognize multiple related peptides sharing structural motifs, as described by Mason, Sewell, Garcia, and others. Under this definition, we agree that many, if not most, TCRs exhibit some degree of cross-reactivity and would thus be defined as polyspecific.

      In contrast, our definition of polyspecificity arose from large-scale repertoire analyses showing that certain CDR3 sequences are repeatedly annotated across databases as recognizing distinct and unrelated antigenic categories. In our previous study (Quiniou et al.), we showed that these sequences display specific biochemical and repertoire features and may represent a particular class of TCRs involved in early or heterologous immune responses. Classic cross-reactivity based on shared structural motifs could not explain these results.

      We believe that the existence of such TCRs, rather than classic cross-reactive TCRs, could better explain why patients with extremely reduced TCR repertoires (only around 3000 TCRs) can respond well to various infectious challenges (https://doi.org/10.1073/pnas.97.1.274), or why there are T cells with memory phenotypes against viruses not previously encountered (https://pmc.ncbi.nlm.nih.gov/articles/PMC3626102/). We acknowledge that direct experimental validation of the function of such TCRs is currently limited; further work will help clarify the notion of polyspecificity and, hopefully, improve our understanding of the overlooked phenomenon of “heterologous immunity”.

      Of note, a recent paper in Nature Machine Intelligence (https://doi.org/10.1038/s42256-025-01096-6) described the in silico generation of antigen-specific TCRs. Using our definition of polyspecificity (TCRs with higher generation probabilities, specific V/J gene preferences, shared CDR3s across individuals, and reactivity to multiple unrelated peptides), they showed that “multitask models preferentially sample polyspecific CDR3β sequences”. Therefore, we consider the debate on polyspecificity to be ongoing, and our discussion of polyspecificity in this paper to be part of this debate.

      (6) I agree that comparing specificity enrichment between sexes is meaningful. However, enrichment relative to the database composition itself is not biologically interpretable, as acknowledged by the authors in their response. I therefore recommend removing Supplementary Figure 15, which is potentially misleading.

      In the original manuscript, the comparison to the pooled database was intended as a descriptive assessment rather than as a biological enrichment analysis. Differences between an experimental thymic repertoire and a curated reference database are expected, given the structure and annotation biases inherent to the reference resource.

      The purpose of Supplementary Figures 15B and 15C was therefore twofold: (i) to provide a descriptive overview of how specificity categories are distributed in our thymic dataset relative to the curated database, and (ii) to evaluate whether deviations from database proportions were of similar magnitude in males and females, ensuring that database composition did not differentially bias one sex over the other. In addition, the donor-resolved representations demonstrate that these patterns are consistent across individuals and are not driven by a single donor.

      To avoid any potential misinterpretation, we have revised the manuscript to remove references to “enrichment” relative to database composition and eliminated quantitative comparisons to baseline database frequencies. The corresponding text and figure legends have been clarified to indicate that these analyses are descriptive and methodological in nature, while all biological interpretations rely exclusively on direct sex-specific comparisons within the thymic dataset.

      (7) In contrast, Supplementary Figure 16 represents the most convincing result of the study (keeping in mind that the AID group should be split into T1D and CeD, given that these two diseases have opposing directions of sex bias) and should be shown as a main figure, replacing Figure 8A-B, which is less convincing as it does not show per-donor distributions.

      (8) The authors argue that applying mixed-effects modeling to Rényi entropy would require assuming a common sex effect across subsets. I do not find this assumption unreasonable. For example, if sex effects are mediated through AIRE-dependent negative selection, one would indeed expect a consistent direction of effect across subsets. The lack of statistical significance in Figure 3 may reflect limited sample size rather than true absence of the difference. Moreover, the title's phrasing "comparable TCR repertoire diversity" is vague: what is the statistical definition of "comparable"?

      The use of “comparable” in comparing TCR repertoire diversity is indeed “soft”, and was intended to indicate that there are no obvious dissimilarities.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) Available HLA typing data for selected donors should be included as a table in the manuscript.

      The available low-resolution HLA typing data for the donors included in this study have been compiled and added as Supplementary Table 1 in the revised manuscript.

      (2) The authors' explanation for why external validation of gene usage biases was not possible should be concisely incorporated into the Discussion.

      We have incorporated a concise explanation in the Discussion clarifying why independent validation of the TRBV6-5 bias in external thymic datasets is currently not feasible, due to the absence of publicly available cohorts combining sorted thymic subsets, balanced sex representation, and sufficient sequencing depth.

      (3) The clarification that considered sex-specific motifs are public should be included explicitly in the main text, not only figure legend and methods.

      We now explicitly state in the main Results section that only public motifs, defined as motifs containing CDR3 sequences shared by at least two individuals, were retained in the analysis.

      (4) The statement "Thymocytes expressing TCRs with insufficient or excessive avidity are eliminated (negative selection)" is strictly speaking incorrect. Thymocytes with insufficient avidity are eliminated by death by neglect during positive selection.

      We thank the reviewer for pointing out this imprecision. The statement has been corrected.

      (5) Figure 8C is unclear - what does "80% of unique polyspecific TCRs" mean? In any case, I strongly recommend removal of all polyspecificity-related analyses.

      We apologize for the lack of clarity in the axis label of Figure 8C. To clarify, this analysis represents the proportion of polyspecific CDR3aa sequences among all sequences with an assigned specificity within an individual’s repertoire. Specifically, it measures how many unique TCR sequences, previously identified as having a known specificity in reference databases, are also categorized as polyspecific.

      To address the reviewer’s concern, we have updated the Y-axis label of Figure 8C to: "Proportion of polyspecific CDR3aa among antigen-specific sequences (%)".

      (6) "However, no significant sex-based differences were found in the usage of hydrophobic, hydrophilic, or neutral aa at the critical p109 and p110 positions in TRB" - this Discussion statement is inconsistent with the new analysis on Fig. 4C.

      We regret that the Discussion still contained wording from a previous version of the analysis. The text has now been corrected to reflect the updated results showing a significant increase in hydrophobic amino acid usage at positions p109/p110.

      (7) In the Discussion the authors write: "the absence of age-related clustering in repertoire features (data not shown)". What is the reasoning for not showing the data?

      We understand the reviewer's point. This exploratory clustering analysis was performed on the data presented in the heatmaps (Figure 2B and Supplemental Figures 10-13). However, as it revealed no distinct patterns or clustering based on the donors' age (with samples from different age groups being interspersed throughout the clusters), we chose not to add an extra layer of annotation to Figure 2B to maintain clarity.

    1. eLife Assessment

      Combining state-of-the-art in situ cell-surface proteomics, functional genetic screening, and single-nucleus RNA sequencing, this fundamental work substantially advances our understanding of glial contributions to organismal lifespan. The evidence supporting the conclusions is compelling. The work will be of broad interest to researchers studying aging biology, glia-neuron communication, and in vivo proteomic profiling.

    2. Reviewer #1 (Public review):

      Summary:

      Age-related synaptic dysfunction can have detrimental effects on cognitive and locomotor function. Additionally, aging makes the nervous system vulnerable to late-onset neurodegenerative diseases. This manuscript by Marques et al. seeks to profile the cell surface proteomes of glia to uncover signaling pathways that are implicated in age-related neurodegeneration. They compared the glial cell-surface proteomes in the central brain of young (day 5) and old (day 50) flies and identified the most up- and down-regulated proteins during the aging process. 48 genes were selected for analysis in a lifespan screen, and, interestingly, most showed sex-specific phenotypes. Among these, adult-specific pan-glial DIP-β overexpression (OE) significantly increased the lifespan of both males and females and improved their motor control ability. To investigate the effect of DIP-β in the aging brain, Marques et al. performed snRNA-seq on 50-day-old Drosophila brains with or without DIP-β OE in glia. Cortex and ensheathing glia showed the most differentially expressed genes. Computational analysis revealed that glial DIP-β OE increased cell-cell communication, particularly with neurons and fat cells.

      Strengths:

      (1) State-of-the-art methodology to reveal the cell surface proteomes of glia in young and old flies.

      (2) Rigorous analyses to identify differentially expressed proteins.

      (3) Examination of up- and down-regulated candidates and identification of glial-expressed mediators that impact fly lifespan.

      (4) Intriguing sex-specific glial genes that regulate life span.

      (5) Follow-up RNA-seq analysis to examine cellular transcriptomes upon overexpression of an identified candidate (DIP-β).

      (6) A compelling dataset for the community that should generate extensive interest and spawn many projects.

      Weaknesses:

      (1) DIP-β OE using flySAM:

      a) These flies showed a larger increase in lifespan compared to using UAS-DIP-β (Figure 2C, D). Do the authors think that flySAM is a more efficient way of OE than UAS? Also, the UAS construct would be specific to one DIP-β isoform, while flySAM would likely express all isoforms. Could this also contribute to the phenotypes observed?

      b) The Glial-GS>DIP-β flySAM flies without RU-486 have significantly shorter lifespans (Figure 2C) than their UAS-DIP-β counterparts. flySAM is lethal when expressed under the control of tubulin-GAL4 (Jia et al. 2018), likely due to the toxicity of such high levels of overexpression. Is it possible that the larger increase in lifespan is due to the already reduced viability of these flies?

      c) Statistics: It is stated in the Methods that "statistical methods used are described in the figure legend of each relevant panel." However, there is no description of the statistics or sample sizes used in Figure 2.

      (2) Figure 3: The authors use a glial GeneSwitch (GS) to knock down and overexpress candidate genes. In Figure 3A, they look at glial-GS>UAS-GFP with and without RU. Without RU, there is no GFP expression, as expected. With RU, there is GFP expression. All cell body GFP signal would be expected to colocalize with a glial nuclear marker (Repo). However, some signal does not appear to be glial. Also, many glia do not express GFP, suggesting that the glial GS driver does not label all glia. This could affect which glia are being targeted in several experiments.

      (3) It is interesting that sex-specific lifespan effects were observed in the candidate screen.

      a) The authors should provide a discussion about these sex-specific differences and their thoughts about why these were observed.

      b) The authors should also provide information regarding the sex of the flies used in the glial cell surface proteome study.

      c) Also, beyond the scope of this study, examining sex-specific glial proteomes could reveal additional insights into age-related pathways affecting males and females differentially.

      (4) The behavioral assay used in this study (climbing) tests locomotion driven by motor neurons. The proteomic analysis was performed with the central adult brain, which does not include the nerve cord where motor neurons reside. While likely beyond the scope of this study, it would be informative to test other behaviors including learning, circadian rhythms, etc.

      (5) It is surprising that overexpressing a CAM in glia has such a broad impact on the transcriptomes of so many different cell types. Could this be due to DIP-β OE maintaining the brain in a "younger" state and thereby indirectly influencing the transcriptomes, rather than DIP-β OE in glia directly influencing cell-cell interactions? Can the authors comment on this?

      Comments on revisions:

      The authors have conducted additional experiments, updated text/figures, and included discussions to address the concerns raised by the reviewers. I commend the authors on a thorough, rigorous study that will undoubtedly impact the field and spawn many projects for years to come.

      One minor comment: In Figure S2, the figure legend states "A-C"; however, the figure itself only has an A and B.

    3. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Age-related synaptic dysfunction can have detrimental effects on cognitive and locomotor function. Additionally, aging makes the nervous system vulnerable to late-onset neurodegenerative diseases. This manuscript by Marques et al. seeks to profile the cell surface proteomes of glia to uncover signaling pathways that are implicated in age-related neurodegeneration. They compared the glial cell-surface proteomes in the central brain of young (day 5) and old (day 50) flies and identified the most up- and down-regulated proteins during the aging process. 48 genes were selected for analysis in a lifespan screen, and, interestingly, most showed sex-specific phenotypes. Among these, adult-specific pan-glial DIP-β overexpression (OE) significantly increased the lifespan of both males and females and improved their motor control ability. To investigate the effect of DIP-β in the aging brain, Marques et al. performed snRNA-seq on 50-day-old Drosophila brains with or without DIP-β OE in glia. Cortex and ensheathing glia showed the most differentially expressed genes. Computational analysis revealed that glial DIP-β OE increased cell-cell communication, particularly with neurons and fat cells.

      Strengths:

      (1) State-of-the-art methodology to reveal the cell surface proteomes of glia in young and old flies.

      (2) Rigorous analyses to identify differentially expressed proteins.

      (3) Examination of up- and down-regulated candidates and identification of glial-expressed mediators that impact fly lifespan.

      (4) Intriguing sex-specific glial genes that regulate life span.

      (5) Follow-up RNA-seq analysis to examine cellular transcriptomes upon overexpression of an identified candidate (DIP-β).

      (6) A compelling dataset for the community that should generate extensive interest and spawn many projects.

      Weaknesses:

      (1) DIP-β OE using flySAM:

      (a) These flies showed a larger increase in lifespan compared to using UAS-DIP-β (Figure 2 C, D). Do the authors think that flySAM is a more efficient way of OE than UAS? Also, the UAS construct would be specific to one DIP-β isoform, while flySAM would likely express all isoforms. Could this also contribute to the phenotypes observed?

      We agree with the reviewer that both factors can contribute to the difference in lifespan effects. In the original paper presenting flySAM1.0 and flySAM2.0 (Jia et al., 2018), the authors first tested how flySAM1.0 overexpression (OE) phenotypes compare to those of several VPR (CRISPRa) and UAS:cDNA OE lines. They found that flySAM1.0 outperforms VPR in most cases (i.e., produces stronger OE phenotypes) and produces OE phenotypes that are comparable (i.e., generally equivalent) to those of UAS:cDNA (Jia et al., 2018). The authors next tested whether flySAM2.0 also outperforms VPR and found that, like flySAM1.0, it does so in most cases (Jia et al., 2018). In general, these data suggest that our flySAM2.0 and UAS:cDNA lines should produce comparable overexpression phenotypes.

      We chose to proceed with the DIP-β flySAM line for the climbing assays and snRNA-seq because it gave a stronger lifespan effect and we considered it likely to be the more robust OE line. While our glial cell-surface proteomics initially identified DIP-β isoform C as the candidate, other DIP-β isoforms may also have been present (such as isoform F, which is identical in polypeptide sequence to isoform C) (FlyBase). Ultimately, we believe that the larger increases in lifespan observed for DIP-β flySAM are likely because flySAM targets all isoforms, whereas UAS:cDNA lines target only one. Importantly, our UAS-DIP-β line was specific to DIP-β isoform C, the same isoform identified by our proteomics.

      We have made clarifications in the manuscript to address these comments.

      (b) The Glial-GS>DIP-β flySAM flies without RU-486 have significantly shorter lifespans (Figure 2C) than their UAS-DIP-β counterparts. flySAM is lethal when expressed under the control of tubulin-GAL4 (Jia et al. 2018), likely due to the toxicity of such high levels of overexpression. Is it possible that a larger increase in lifespan is due to the already reduced viability of these flies?

      This is a good point. The flySAM lines do exhibit a shorter baseline lifespan compared to the traditional UAS lines. This is likely due to the specific genetic background of the flySAM transgenic insertions, or a low level of "leaky" expression, as previously noted in the literature (Jia et al., 2018).

      However, we believe that the lifespan extension we observed for DIP-β flySAM reflects a robust biological effect rather than an artifact of reduced viability, for three reasons. First, by utilizing the GeneSwitch (GS) system, we can compare the lifespan of flies with the exact same genetic background (+/- RU-486). This ensures that the extension we report is specifically due to the induction of the transgene, rather than a comparison between disparate lines with different basal fitness levels. Second, if the lifespan extensions merely represented a recovery from lower baseline viability, we would expect to see similar improvements across other flySAM lines in our screen. However, DIP-β was the only candidate across our screen that significantly increased lifespan in both sexes (Extended Data Figs. 7 & 8). Third, the lifespan-extending effect of DIP-β was independently confirmed using a traditional UAS-cDNA line, which importantly does not share the same baseline viability issues as the flySAM lines.

      (c) Statistics: It is stated in the Methods that "statistical methods used are described in the figure legend of each relevant panel." However, there is no description of the statistics or sample sizes used in Figure 2.

      We have updated the figure legends for Figure 2 to include the missing statistical details and sample sizes.

      Specifically, for Fig. 2A: The reviewer is correct that with only two replicates of each time point (5d vs. 50d) in the initial proteomic screen, traditional p-value calculations lack the necessary power for meaningful interpretation. We have revised the legend to clarify that this panel represents a discovery-based screen. Candidates were selected based on biological relevance and specific enrichment thresholds to narrow the 872 proteins down to the 48 top candidates for screening (we were initially aiming to identify approximately 50 candidate genes for screening). For Fig. 2B: We have updated the legend to detail the parameters used for the Gene Ontology (GO) enrichment analysis.

      (2) Figure 3: The authors use a glial GeneSwitch (GS) to knock down and overexpress candidate genes. In Figure 3A, they look at glial-GS>UAS-GFP with and without RU. Without RU, there is no GFP expression, as expected. With RU, there is GFP expression. It is expected that all cell body GFP signal should colocalize with a glial nuclear marker (Repo). However, there is some signal that does not appear to be glia. Also, many glia do not express GFP, suggesting the glial GS driver does not label all glia. This could impact which glia are being targeted in several experiments.

      We thank the reviewer for this careful observation regarding the expression pattern of the GSG3285-1 line and acknowledge that the overlap between this driver and the Repo-positive cells is not absolute.

      Our selection of this specific GeneSwitch line was based on several critical experimental considerations: 1) Minimizing background toxicity. We initially tested multiple Repo-GeneSwitch lines; however, they exhibited significant, genotype-dependent lifespan reductions upon RU486 administration, even in control crosses. This baseline toxicity confounded the interpretation of any potential lifespan effects. We chose GSG3285-1 for this study because it provided a robust control baseline and did not show lifespan effects with RU486 treatment in multiple control lines, which is essential for lifespan studies. 2) Driver breadth and specificity. As noted in its original characterization (Nicholson et al., 2008) and a later study (Catterson et al., 2023), GSG3285-1 is characterized as a pan-glial driver, though it may include a small population of sensory neurons. Furthermore, while Repo is a standard glial marker, the anti-Repo antibody does not label all glial subtypes with equal intensity. The "non-overlapping" signal observed in Figure 3A may reflect this staining bias. 3) Expression mosaicism. The fact that some glial cells do not show GFP expression suggests a degree of mosaicism, which is common to many GeneSwitch lines (Osterwalder et al., 2001). While we acknowledge that our manipulations therefore target a subset of glia rather than every glial cell, the fact that we still observed significant lifespan effects across two independent platforms (UAS and CRISPRa) suggests that the targeted population is sufficient to mediate these systemic effects.

      We have added a clarifying statement to contextualize the choice of the GSG3285-1 driver and its relationship to the Repo population.

      (3) It is interesting that sex-specific lifespan effects were observed in the candidate screen.

      (a) The authors should provide a discussion about these sex-specific differences and their thoughts about why these were observed.

      We agree that the sex-specific effects observed in our lifespan screen are one interesting aspect of this study. We have added a dedicated section to the Discussion exploring these differences from both a technical and biological perspective.

      On the technical side, the GeneSwitch inducer, RU486, can have sex-specific effects on metabolism and lifespan, depending on the nutritional environment (Dos Santos & Cocheme, 2024). Specifically, RU486 has been shown to counteract the lifespan-shortening effects of mating in females, an effect that is less pronounced in males (Landis et al., 2015; Tower et al., 2017). While we optimized our media and used the GSG3285-1 line to minimize these baseline effects, it remains possible that certain genotypes exhibited a sex-specific sensitivity to the inducer itself. Beyond the technical considerations, sex differences in aging are well-documented in Drosophila and other organisms (Regan et al., 2016; Austad & Fischer, 2016). Male and female flies exhibit distinct transcriptional trajectories and metabolic shifts as they age. Furthermore, recent studies have highlighted that glial function and the neuroinflammatory landscape can differ significantly between sexes, which may dictate how a specific genetic manipulation impacts the aging process in a sex-dependent manner (PMID: 40951920). While our screen identifies DIP-β as a rare candidate that extends lifespan in both sexes, the prevalence of female-specific hits in our data suggests that the female "aging program" may be more plastic or responsive to the specific glial pathways we targeted. These observations provide a valuable foundation for future studies into the mechanisms of sex-specific neuroprotection.

      (b) The authors should also provide information regarding the sex of the flies used in the glial cell surface proteome study.

      We used a mixture of half male and half female flies. This information has been added to the main text, Fig. 1, and the Methods section.

      (c) Also, beyond the scope of this study, examining sex-specific glial proteomes could reveal additional insights into age-related pathways affecting males and females differentially.

      Agreed, this would be a great idea for future studies.

      (4) The behavioral assay used in this study (climbing) tests locomotion driven by motor neurons. The proteomic analysis was performed with the adult brain, which does not include the nerve cord, where motor neurons reside. While likely beyond the scope of this study, it would be informative to test other behaviors, including learning, circadian rhythms, etc.

      We thank the reviewer for this insightful point. While our initial proteomic screen focused on the adult central brain, our behavioral validation used a pan-glial driver, which targets glia throughout the entire nervous system, including the ventral nerve cord (VNC). We have addressed the reviewer's comment as below:

      Additional behavioral data: As suggested, we performed Drosophila Activity Monitoring (DAM) assays to evaluate circadian locomotor rhythms in 50-day-old DIP-β overexpression flies compared to negative controls. Interestingly, we did not detect significant changes in circadian activity at this time point.

      The difference between our climbing and circadian results highlights the complexity of age-related decline. In Drosophila, locomotor performance (i.e., climbing) and circadian coordination often decouple. For example, specific isoforms of human Tau (hTau) can induce severe cognitive and neurodegenerative deficits without affecting lifespan or motor coordination in the same manner (Sealey et al., 2017). Furthermore, motor-specific defects can emerge independently of systemic lifespan changes, as seen in certain SOD1 models of ALS (Hirth, 2010). It is possible that the 50-day time point represents a specific window in which motor coordination is improved by DIP-β, while circadian circuits, which are governed by distinct glial-neuronal interactions, remain largely unaffected or require a different temporal window for observation.

      We agree that identifying the specific glial populations (central brain vs. VNC) responsible for the improved climbing would be highly informative. While the current study establishes the pro-longevity effect of DIP-β, future work applying in-situ proteomics to the fully intact CNS (including the VNC), or to the VNC specifically, will be essential to map the stereotyped progression of these effects across the peripheral and central nervous systems.

      (5) It is surprising that overexpressing a CAM in glia has such a broad impact on the transcriptomes of so many different cell types. Could this be because DIP-β OE maintains the brain in a "younger" state and thereby indirectly influences the transcriptomes, rather than because DIP-β OE in glia directly influences cell-cell interactions? Can the authors comment on this?

      We agree that the observed changes likely represent a combination of direct cell-cell interactions and a broader, more indirect maintenance of a "younger" physiological state.

      Direct: Among the DIP family, DIP-β exhibits some of the strongest and most promiscuous binding affinities, interacting with a wide array of partners, including Dpr6, 8, 9, 15, and 21 (Cosmanescu et al., 2018; Sergeeva et al., 2020). This biochemical flexibility allows DIP-β to potentially interface with a much broader range of neuronal subtypes than other DIP family members, such as DIP-δ, which binds exclusively to Dpr12 and did not extend lifespan in our screen. By overexpressing DIP-β, we may be partially compensating for the global downregulation of CAMs that typically occurs during aging, thereby preserving the integrity of essential glial-neuronal communication.

      Indirect: By maintaining these primary glial functions and communication activities, DIP-β overexpression likely delays the overall "aging" of the brain. This preservation of neural health can have downstream effects on systemic physiology, such as the improved glia-fat body communication we observed in 50-day-old flies. In this model, the broad transcriptomic shifts are not necessarily all direct targets of DIP-β, but rather a signature of a brain that has successfully avoided the catastrophic breakdown of homeostasis typically seen in aged wild-type flies.

      We have expanded the Discussion to clarify this distinction, adding that DIP-β likely acts as a “scaffold” or “bridge” for maintaining a younger brain state, which in turn preserves multi-organ communication.

      Reviewer #2 (Public review):

      This manuscript presents an ambitious and technically innovative study that combines in situ cell-surface proteomics, functional genetic screening, and single-nucleus RNA sequencing to uncover glial factors that influence aging in Drosophila. The authors identify DIP-β as a glial protein whose overexpression extends lifespan and report intriguing sex-specific differences in lifespan outcomes. Overall, the study is conceptually compelling and offers a valuable dataset that will be of considerable interest to researchers studying glia-neuron communication, aging biology, and proteomic profiling in vivo.

      The in-situ proteomic labeling approach represents a notable methodological advance. If validated more extensively, it has the potential to become a widely used resource for probing glial aging mechanisms. The use of an inducible glial GeneSwitch driver is another strength, enabling the authors to carefully separate aging-relevant effects from developmental confounds. These technical choices meaningfully elevate the rigor of the study and support its central conclusions. The discovery of new candidate genes from the proteomics pipeline, including DIP-β, is intriguing and opens new avenues for understanding glial contributions to organismal lifespan. The observation of sex-specific lifespan effects is particularly interesting and warrants further exploration; the study sets the stage for future work in this direction.

      At the same time, several areas would benefit from clarification or additional analysis to fully support the manuscript's claims:

      (1) The manuscript frequently refers to "improved" or "increased" cell-cell communication following DIP-β overexpression, but the meaning of this term remains somewhat vague. Because the current analysis relies largely on transcriptomic predictions, it would be helpful to define precisely what metric is being used, e.g., increased numbers of predicted ligand-receptor interactions, enrichment of specific signaling pathways, or altered expression of communication-related components. Strengthening the mechanistic link between DIP-β, cell-cell communication, and lifespan extension, potentially through targeted validation of specific glial interactions, would substantially reinforce the interpretation.

      We agree that a more precise description of “improved” or “increased” cell-cell communication is necessary.

      Our conclusion that DIP-β overexpression is associated with “increased” cell-cell communication is based on quantification of cell-cell communication (CCC) scores, which we performed using FlyPhoneDB2, a computational tool that estimates cell-cell signaling from single-cell RNA-sequencing data (Liu et al., 2021; Qadiri et al., 2025). To infer cell-cell signaling, FlyPhoneDB2 and its predecessor, FlyPhoneDB, calculate “interaction scores” by comparing the expression levels of a curated list of ligand-receptor pairs between cell types (Liu et al., 2021; Qadiri et al., 2025). For example, if we detect a ligand in cell type A and its receptor in cell type B in DIP-β overexpression flies, but do not detect both the ligand and the receptor in control flies, the CCC score increases by 1. FlyPhoneDB2 additionally enables users to estimate signaling activity by also taking into account the expression of downstream reporter genes (Qadiri et al., 2025).

      “Improved cell-cell communication” is our interpretation based on the CCC analysis. It is important to note that the metric being used here (increased CCCs) is the number of predicted ligand-receptor interactions, and that our CCC analysis was based entirely on inferences from snRNA-seq data. We have added further clarification to our manuscript, which now further expands on the results of our CCC analysis (i.e., the increased expression for 61% and decreased expression for 39% of ligand-receptor pairs we observed in our DIP-β overexpression group, compared to our negative control), which ultimately led us to conclude that DIP-β overexpression is associated with improved cell-cell communication.
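      The counting rule described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not FlyPhoneDB2's implementation: detection is reduced to a set of detected genes per cell type, the functions `detected_interactions` and `ccc_gain` and the gene names are hypothetical, and FlyPhoneDB2's actual interaction scores (based on expression levels and, in FlyPhoneDB2, downstream reporter genes) are not modeled.

```python
from itertools import product

# Hypothetical input format: each condition maps a cell type to the set of
# genes detected in that cell type (expression reduced to present/absent).
Expr = dict[str, set[str]]


def detected_interactions(expr: Expr, lr_pairs: list[tuple[str, str]]) -> set[tuple[str, str, str, str]]:
    """Return (sender, receiver, ligand, receptor) tuples in which the ligand is
    detected in the sender cell type and the receptor in the receiver cell type."""
    hits = set()
    for sender, receiver in product(expr, repeat=2):
        for ligand, receptor in lr_pairs:
            if ligand in expr[sender] and receptor in expr[receiver]:
                hits.add((sender, receiver, ligand, receptor))
    return hits


def ccc_gain(expr_oe: Expr, expr_ctrl: Expr, lr_pairs: list[tuple[str, str]]) -> int:
    """Add 1 for every interaction detected in the overexpression condition but
    not in control. Interactions lost relative to control are not subtracted here."""
    gained = detected_interactions(expr_oe, lr_pairs) - detected_interactions(expr_ctrl, lr_pairs)
    return len(gained)


if __name__ == "__main__":
    pairs = [("ligand_a", "receptor_b")]            # hypothetical ligand-receptor pair
    oe = {"glia": {"ligand_a"}, "neuron": {"receptor_b"}}
    ctrl = {"glia": {"ligand_a"}, "neuron": set()}  # receptor not detected in control
    print(ccc_gain(oe, ctrl, pairs))                # prints 1
```

      In this toy case the glia-to-neuron pair is detectable only in the overexpression condition, so the score increases by 1, matching the example rule in the text.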

      (2) The lifespan screen is central to the paper, and clearer visualization and contextualization of these results would significantly improve the manuscript's impact. For example, Figure 3D is challenging to interpret in its current form. More explicit presentation of which manipulations extend lifespan in each sex, along with effect sizes and significance values, would provide clarity. Including positive controls for lifespan extension would also help contextualize the magnitude of the observed effects. The reported effects of DIP-β, while promising, are modest relative to baseline effects of RU feeding, and a discussion of this would help appropriately calibrate the conclusions.

      We appreciate the reviewer’s suggestion to improve the clarity of the lifespan screen results. We have significantly revised Figures 3D, 3E, and 3F to provide a more intuitive summary of the candidate gene manipulations. Figures 3D and 3E now explicitly include the effect sizes and p-values for each candidate gene, broken down by sex. We also added a new Figure 3G with a visual layout that has been streamlined to allow for quick identification of manipulations that successfully extended lifespan.

      The reviewer raises an important point regarding the use of positive controls to calibrate the magnitude of lifespan extension. We carefully considered adding a standard control (such as Rapamycin treatment); however, we opted against it for several methodological reasons:

      As noted in the literature, the magnitude of lifespan extension from standard controls can vary drastically depending on genetic background and lab environment. For instance, Rapamycin-induced extension ranges from ~10% (Schinaman et al., 2019) to over 80% (Landis et al., 2024). We felt that adding a single positive control might provide a false sense of "calibration" rather than a true universal benchmark.

      To ensure the robustness of our findings, we instead employed a dual-validation strategy. We confirmed the lifespan-extending effects of our candidates using both traditional UAS:cDNA and CRISPR-based overexpression. The fact that two independent genetic systems yielded consistent results provides strong internal evidence for the reported effects.

      We acknowledge that the effects of DIP-β are modest when compared to the baseline impact of RU486 feeding. We have added a section to the Discussion addressing this. While the effects are subtle, their reproducibility across different overexpression platforms suggests they are biologically relevant, even if they do not reach the dramatic shifts seen in some caloric restriction or drug-based models.

      We have further addressed this in the results section.

      (3) Several figures would benefit from improved labeling or more detailed legends. For instance, the meaning of "N" and "C" in Figure 1D is unclear; Figure 3A should clarify that Repo is a glial marker; and Figure 5C appears to have truncated labels. Reordering certain panels (e.g., moving control data in Figure 4A-B) may also improve narrative flow. These refinements would greatly aid reader comprehension.

      We have modified and improved the labeling of these figures to increase the clarity. For Fig. 1D, we added the explanation to the Figure legends. In brief, in the Tandem Mass Tag (TMT) isobaric labeling system, 128N is one of many channels (126, 127N, 127C, 128N, 128C, etc.) used to index and compare up to 18 samples simultaneously, improving throughput and reducing missing values.

      Fig. 3A has been updated to clarify that Repo is the glial marker. Fig. 4A-D have been reordered so that the DIP-β lifespan results are presented before the control lifespan results, which we hope improves the narrative flow of this figure. The Fig. 4 references in the manuscript have also been updated to match these changes. Additionally, Fig. 5C has been updated to include the truncated x-axis and y-axis labels.

      (4) A few claims would be strengthened by more specific references or acknowledgment of alternative interpretations. Examples include the phenoxy-radical labeling radius, the impact of H₂O₂ exposure, and the specificity of neutravidin. Additionally, downregulation of synapse-related GO terms may reflect age-related transcriptional changes rather than impaired glia-neuron communication per se, and this possibility should be recognized. The term "unbiased" to describe the screen may also be reconsidered, given the preselection of candidate genes.

      These are good suggestions. We have added references for the phenoxy-radical labeling radius (Durojaye, 2021), the impact of H₂O₂ exposure (J. Li et al., 2021), and the binding specificity of neutravidin (J. Li et al., 2021). We have also removed the term “unbiased” from our manuscript.

      Regarding the request to further address the downregulation of synapse-related GO terms, we believe this reflects a lack of clarity on our part. We did not intend to suggest that our GO analyses, which were based on our proteomics data, necessarily indicated impaired neuron-glia communication. Our conclusions regarding altered neuron-glia communication came from our later snRNA-seq data and analyses. Prompted by this comment, we agree that our differential gene expression results may reflect transcriptional changes rather than impaired glia-neuron communication, and we have added this alternative interpretation to the manuscript.

      (5) Clarifying the rationale for focusing on central brain glia over optic-lobe glia would be useful. 

      Agreed! Because the intended focus of this study was the more general changes occurring during normal brain aging, we chose to focus our glial cell-surface proteomics on the central brain, which is responsible for most of the brain’s higher-order functions, including learning and memory, signal integration, and behavior. Because the optic lobes account for approximately half of all neurons in the adult Drosophila brain and are specialized to process visual stimuli (Robinson et al., 2025), we were concerned that including them could strongly bias our findings towards age-related changes in visual function rather than the more general changes we intended to study. This clarification has been added to the Results section (Quantitative comparison of young and old proteomes).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 62: Can the authors expand on "several changes"?

      We have added a sentence expanding upon this in the manuscript draft.

      (2) Line 137: Can the authors provide a reference for the phenoxyl radical half-life?

      Thanks for catching this. We’ve added our reference for the phenoxyl radical half-life.

      (3) Figure 1B: The authors state that neutravidin stained glia; however, there is no glial marker (e.g., anti-Repo) in this panel.

      We acknowledge the reviewer’s point. The lack of anti-Repo staining in Figure 1B is due to the requirements of the Neutravidin-Alexa 647 detection method. Because this procedure bypasses traditional primary and secondary antibody incubation to preserve the biotin signal, co-staining with Repo was not technically feasible. Nevertheless, we utilized the Repo-GAL4 driver to express UAS-CD2-HRP; since this driver is well-documented and specific to glial cells, the Neutravidin signal serves as a functional readout of the targeted glial population.

      (4) Line 254: There is no Figure 2D.

      We’ve corrected this to Fig. 2C.

      (5) Lines 390-396: No reference to the respective figures.

      We’ve made corrections so that all the respective figures are referenced.

      (6) Figure 5C: The X-axis is cut off.

      This has been corrected.

      Reviewer #2 (Recommendations for the authors):

      Minor inconsistencies (e.g., figure references-line 254 references "Figure 2D" where none exists) should be corrected.

      We’ve corrected this to Fig. 2C.

    1. eLife Assessment

      This study presents a valuable finding on how the locus coeruleus modulates the involvement of medial prefrontal cortex in set shifting using calcium imaging in mice. The evidence supporting the claims was viewed as solid in revealing the dynamics and potential mechanisms supporting extradimensional shifts. The work is of broad interest to those studying flexible cognition.

    2. Reviewer #3 (Public review):

      Summary:

      Nigro et al. examine how the locus coeruleus (LC) influences the medial prefrontal cortex (mPFC) during attentional shifts required for behavioral flexibility. Specifically, they propose that LC-mPFC inputs enable mice to shift attention effectively from texture to odor cues to optimize behavior. The LC and its noradrenergic projections to the mPFC have previously been implicated in this behavior. The authors further establish this by using chemogenetics to inhibit LC terminals in mPFC and show a selective deficit in extradimensional set shifting behavior. But the study's primary innovation is the simultaneous inhibition of LC while recording multineuron patterns of activity in mPFC. Analysis at the single neuron and population levels revealed broadened tuning properties, less distinct population dynamics, and disrupted predictive encoding when LC is inhibited. These findings add to our understanding of how neuromodulatory inputs shape attentional encoding in mPFC and are an important advance. There are some methodological limitations and/or caveats that should be considered when interpreting the findings, and these are described below.

      Strengths:

      The naturalistic set-shifting task in freely-moving animals is a major strength, and the inclusion of localized suppression of LC-mPFC terminals builds confidence in the specificity of the behavioral effect. Combining chemogenetic inhibition of LC while simultaneously recording neural activity in mPFC with miniscopes is state-of-the-art. The authors apply analyses to population dynamics, in particular, that can advance our understanding of how the LC modifies patterns of mPFC neural activity. The authors show that neural encoding at both the single cell level and the population level are disrupted when LC is inhibited. They also show that activity is less able to predict key aspects of the behavior when the influence of LC is disrupted. This is quite interesting and adds to a growing understanding of how neuromodulatory systems sharpen tuning of mPFC activity.

      Weaknesses:

      Weaknesses are mostly minor, but there are some caveats that should be considered. First, the authors use a DBH-Cre mouse line and provide histological confirmation of overlap between hM4Di expression and TH immunostaining. While this strongly suggests modulation of noradrenergic circuit activity, the results should be interpreted conservatively as there is no independent confirmation that norepinephrine (NE) release is suppressed and these neurons are known to release other neurotransmitters and signaling peptides. In the absence of additional control experiments, it is important to recognize that effects on mPFC activity may or may not be directly due to LC-mPFC NE.

      Another caveat is that the imaging analyses are entirely from the extradimensional shift session. Without analyzing activity data from the intradimensional shift (IDS) session, one cannot be certain that the observed changes are to some feature of activity that is specific to extradimensional shifts. Future experiments should examine animals with LC suppression during the IDS as well, which would show whether the observed effects are specific to an extradimensional shift and might explain behavioral effects.

      Comments on revisions:

      The authors overall do a nice job of addressing reviewer comments, and I believe the manuscript is significantly improved.

    3. Author Response:

      The following is the authors’ response to the previous reviews

      We thank the reviewers and editors for the second round of peer review. Following the editorial assessment and specific review comments, we now present new results comparing EDS and IDS behavior and use the conventional standard for reporting statistics. We also request that the manuscript title be simplified to ‘Locus coeruleus modulation of prefrontal dynamics during attentional switching in mice’.

      Public Reviews:

      Reviewer #1 (Public review):

      In their response to reviewers, the authors say "We report p values using 2 decimal points and standard language as suggested by this reviewer". However, no changes were made in the manuscript: for example, "P = 4.2e-3" rather than "p = 0.004".

      We apologize for this misunderstanding. We initially interpreted this comment as reporting two non-zero digits in p values. We now have corrected this in the revision. We also follow the editorial recommendation and use a standard convention to report statistics (e.g., p = 0.03, t(7) = -2.8).
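      As an illustration of the reporting convention adopted here, the following Python sketch formats p values and t statistics. The specific rounding rule ("p < 0.001" below 0.001, three decimals below 0.01, two decimals otherwise, one decimal for t) is an assumption chosen to reproduce the examples quoted in this exchange (p = 0.004, p = 0.03, t(7) = -2.8); style guides differ in the details, and `format_p` and `format_t` are hypothetical helpers.

```python
def format_p(p: float) -> str:
    """Format a p value for reporting (assumed rounding rule, see lead-in)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p must lie in [0, 1], got {p}")
    if p < 0.001:
        return "p < 0.001"
    if p < 0.01:
        return f"p = {p:.3f}"   # e.g. 4.2e-3 -> "p = 0.004"
    return f"p = {p:.2f}"       # e.g. 0.03 -> "p = 0.03"


def format_t(t: float, df: int) -> str:
    """Format a t statistic with its degrees of freedom, e.g. t(7) = -2.8."""
    return f"t({df}) = {t:.1f}"


if __name__ == "__main__":
    print(format_p(4.2e-3), format_p(0.03), format_t(-2.8, 7))
    # prints: p = 0.004 p = 0.03 t(7) = -2.8
```

      With two decimals, values just below 0.05 (e.g. 0.049) round to "p = 0.05"; reporting three decimals throughout avoids that ambiguity.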

      In their response to the reviewers, they wrote: "Upon closer examination of the behavioral data, we exclude several sessions where more trials were taken in IDS than in EDS." If those sessions in which EDSIDS. Most problematic is the fact that the manuscript now reads "Importantly, control mice (pooled from Fig. 1e, 1h, Supp. Fig. 1a, 1b) took more trials to complete EDS than IDS (Trials to criterion: IDS vs. EDS, 10 ± 1 trials vs. 16 ± 1 trials, P < 1e-3, Supp. Fig. 1c), further supporting the validity of attentional switching (as in Fig. 1c)" without mentioning that data has been excluded.

      The editor raised a similar concern. We apologize for this oversight, which was due to miscommunication within the lab. We have now revised the manuscript to include all data points, without any exclusion, in Fig. 1e, 1h, and Supp. Fig. 1a-c. With all data pooled, control mice still took more trials to complete EDS than IDS, supporting the validity of attentional switching (Trials to criterion: IDS vs. EDS, 11 ± 1 trials vs. 15 ± 1 trials, p = 0.006, Supp. Fig. 1c).

      The exclusion we initially meant to perform was to exclude sessions in which IDS trials to criterion exceeded the 95% threshold inferred from the naïve control group (15 trials, Fig. 1c). Exclusions are now explicitly described. Of note, including or excluding these sessions does not change any of the conclusions presented in our manuscript. We have added this analysis to Supp. Fig. 1d, and the results remain robust. This panel could be removed if the reviewers deem it unnecessary.
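      To make a pre-specified exclusion rule of this kind concrete and reproducible, a minimal sketch is shown below. It assumes that the threshold is the 95th percentile of trials to criterion in the naïve reference group and that a session is excluded when its IDS trials to criterion exceed that value; the function names and all numbers are hypothetical and are not taken from the manuscript.

```python
import numpy as np


def exclusion_threshold(reference_trials, percentile=95.0):
    """Percentile of trials to criterion in a reference (e.g. naive control) group."""
    return float(np.percentile(np.asarray(reference_trials, dtype=float), percentile))


def split_sessions(ids_trials, threshold):
    """Indices of sessions kept (IDS trials <= threshold) and excluded (IDS trials > threshold)."""
    ids_trials = np.asarray(ids_trials, dtype=float)
    kept = np.flatnonzero(ids_trials <= threshold)
    excluded = np.flatnonzero(ids_trials > threshold)
    return kept, excluded


# Hypothetical numbers, for illustration only.
naive_ids_trials = [8, 9, 10, 10, 11, 12, 13, 14, 15]
thr = exclusion_threshold(naive_ids_trials)
kept, excluded = split_sessions([10, 16, 14, 20], thr)
print(f"threshold = {thr:.1f} trials; kept {kept.tolist()}, excluded {excluded.tolist()}")
```

      Writing the rule down before looking at EDS outcomes, and reporting results with and without the excluded sessions, avoids the circularity the reviewer raised.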

      Reviewer #3 (Public review):

      The authors overall do a nice job of addressing reviewer comments, and I believe the manuscript is significantly improved. Congratulations!

      We thank you for this positive assessment.

      Weaknesses are mostly minor, but there are some caveats that should be considered. First, the authors use a DBH-Cre mouse line and provide histological confirmation of overlap between HM4Di expression and TH immunostaining. While this strongly suggests modulation of noradrenergic circuit activity, the results should be interpreted conservatively as there is no independent confirmation that norepinephrine (NE) release is suppressed and these neurons are known to release other neurotransmitters and signaling peptides. In the absence of additional control experiments, it is important to recognize that effects on mPFC activity may or may not be directly due to LC-mPFC NE.

      We agree with this comment and now further discuss this limitation in the Discussion (lines 255–259):

      “However, it is important to note that LC-NE neurons can co-release other neurotransmitters, such as dopamine and neuropeptides[73,75,76]. In the absence of further control experiments to confirm the suppression of NE release, the observed effects on mPFC may or may not be directly due to NE. Future studies are needed to better delineate the involvement of specific neurotransmitters, cell types and receptors in flexible decision making.”

      Another caveat is that the imaging analyses are entirely from the extradimensional shift session. Without analyzing activity data from the intradimensional shift (IDS) session, one cannot be certain that the observed changes reflect a feature of activity that is specific to extradimensional shifts. Future experiments should also examine animals with LC suppression during the IDS, which would show whether the observed effects are specific to an extradimensional shift and might explain the behavioral effects.

      We also agree with this comment and have considered it carefully. Technically, IDS has few trials, especially few incorrect trials, which limits the power of statistical comparisons. Conceptually, because EDS is always the last stage in our paradigm, comparing neural signals in EDS with those in previous stages may be confounded by the order of learning; that is, it would be unclear whether observed differences in mPFC activity reflect mPFC responses to different rules or changes in mPFC responses over time and learning. We now discuss this point in the Discussion (lines 291–295):

      “Another limitation in the current study is that neurophysiological analyses were entirely from EDS. Without comparing with other task stages (e.g., REV, IDS), it is uncertain to what extent the observed neuronal changes are specific to EDS. Future experiments should examine the behavioral and neurophysiological effects with LC inhibition to determine the specificity of LC-NE modulation of the mPFC during attentional switching.”

      We are also actively collecting additional data to address this point, which requires considerable effort, and we hope to report our findings in a follow-up study.

    1. eLife Assessment

      The development of Neuroplex, a pipeline that links projection-defined neuronal identity to in vivo calcium activity within the same animal, is an important contribution to neuroscience and beyond. The strength of evidence is convincing.

    2. Reviewer #1 (Public review):

      Genetically encoded fluorescent proteins expressed in specific cell types allow those cells to be recognised in vivo and, if the protein is a functional indicator, as in the case of genetically encoded calcium indicators (GECIs), allow activity to be recorded from the same cellular ensemble. Ideally, if the proteins (fluorophores) have perfectly distinct spectral properties, signals can be distinguished from as many cell types as there are fluorophores. In practice, fluorescent proteins have non-negligible crosstalk in both absorption and emission bands. In addition, the fluorescence contribution of each fluorophore normally varies from cell to cell, so the spectral properties of cells expressing two or more proteins differ. The work of Phillips et al. addresses this challenge. The authors present an approach called "Neuroplex", which allows identification of up to nine cell types with the same number of fluorophores. The fingerprint of each cell is then associated with functional fluorescence from the GECI GCaMP, allowing calcium activity to be recorded from that specific cell. The method is implemented in vivo using head-mounted miniscopes.

      The authors used a mouse line expressing GCaMP in cortical pyramidal neurons and developed an experimental pipeline. First, they injected nine AAV viruses, each driving expression of a fluorophore, into different brain areas. The idea was not to image those areas but a non-injected section of medial prefrontal cortex (mPFC), where neurons could be infected via axons projecting to an injected area and thus identified by their target region(s). A GRIN lens allowing spectral analysis was implanted in the mPFC, and GCaMP fluorescence was recorded during behavioural tasks and analysed to identify regions of interest (ROIs) corresponding to neuronal somata. After functional imaging, the mouse was head-fixed, spectral imaging was performed, and, after the necessary correction for chromatic distortions, the fluorophore contribution was determined for each ROI (neuron) from which GCaMP signals were recorded. Notably, the procedures for estimating and correcting chromatic aberration and light transmission (described in Figure 2) represent a major technical achievement. The selection of the nine fluorophores was another big effort, combining computer simulations with direct measurement of spectra from individual proteins expressed in HEK293 cells. Importantly, the authors simulated arbitrary combinations of two or more fluorophores and evaluated the ability of their algorithm to detect the correct proteins, quantifying false negatives (failure to detect an expressed protein) and false positives (detection of a non-expressed protein). Not surprisingly, this ability decreases as the level of GCaMP expression increases. The authors underline that most errors were false negatives, which have a milder impact on result interpretation, but the false-positive rate was nevertheless non-negligible when detecting a second fluorophore in a cell expressing only one protein.

      The experimental profiles of fluorophores depended both on the specific fluorescent protein and on the projection target, and the distribution of double-labelled neurons did not match anatomical evidence. This result should be taken as a limitation of the present pioneering experiments, which are presented as a proof of principle of the approach; Neuroplex may provide much improved precision under different experimental conditions.

      In my view, the work of Phillips et al. represents a significant advance in the state of the art. The rigorous analysis of the limitations of Neuroplex should be considered an important guideline for future uses of this approach.

      Comments on revision:

      The authors have adequately addressed my comments.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript introduces Neuroplex, a pipeline that integrates miniscope Ca²⁺ imaging in freely moving mice with multiplexed confocal and spectral imaging to infer projection identities of recorded neurons. This technical approach is promising and could broaden access to projection-resolved population imaging. However, the core quantitative analyses apply a winner-take-all single-label assignment per neuron even when multiple fluorophores exceed threshold, with additional labels treated descriptively as "secondary hits." While the authors acknowledge and simulate dual labeling, the extent to which this single-label decision rule affects subtype fractions and behavioural comparisons remains uncertain without a multi-label (or probabilistic) sensitivity analysis and propagation of classification uncertainty.

      Strengths:

      (1) Conceptual advance and practicality: Decoupling acquisition from identity readout constitutes an innovative approach that is, in principle, applicable in laboratories currently using single-color miniscopes.

      (2) Engineering thoroughness: The manuscript offers detailed consideration of GRIN optics, spectral libraries, registration procedures, and simulations that address signal-to-noise ratio, background, and class imbalances.

      (3) Immediate community value: If demonstrated to be robust, the pipeline could enable projection-resolved analyses without reliance on specialized multicolor miniscopes.

      Comments on revision:

      The authors have addressed my comments, and I have no further remarks.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Genetically encoded fluorescent proteins expressed in specific cell types allow those cells to be recognised in vivo and, if the protein is a functional indicator, as in the case of genetically encoded calcium indicators (GECIs), allow activity to be recorded from the same cellular ensemble. Ideally, if the proteins (fluorophores) have perfectly distinct spectral properties, signals can be distinguished from as many cell types as there are fluorophores. In practice, fluorescent proteins have non-negligible crosstalk in both absorption and emission bands. In addition, the fluorescence contribution of each fluorophore normally varies from cell to cell, so the spectral properties of cells expressing two or more proteins differ. The work of Phillips et al. addresses this challenge. The authors present an approach called "Neuroplex", which allows identification of up to nine cell types with the same number of fluorophores. The fingerprint of each cell is then associated with functional fluorescence from the GECI GCaMP, allowing calcium activity to be recorded from that specific cell. The method is implemented in vivo using head-mounted miniscopes.

      The authors used a mouse line expressing GCaMP in cortical pyramidal neurons and developed an experimental pipeline. First, they injected nine AAV viruses, each driving expression of a fluorophore, into different brain areas. The idea was not to image those areas but a non-injected section of medial prefrontal cortex (mPFC), where neurons could be infected via axons projecting to an injected area and thus identified by their target region(s). A GRIN lens allowing spectral analysis was implanted in the mPFC, and GCaMP fluorescence was recorded during behavioural tasks and analysed to identify regions of interest (ROIs) corresponding to neuronal somata. After functional imaging, the mouse was head-fixed, spectral imaging was performed, and, after the necessary correction for chromatic distortions, the fluorophore contribution was determined for each ROI (neuron) from which GCaMP signals were recorded. Notably, the procedures for estimating and correcting chromatic aberration and light transmission (described in Figure 2) represent a major technical achievement. The selection of the nine fluorophores was another big effort, combining computer simulations with direct measurement of spectra from individual proteins expressed in HEK293 cells. Importantly, the authors simulated arbitrary combinations of two or more fluorophores and evaluated the ability of their algorithm to detect the correct proteins, quantifying false negatives (failure to detect an expressed protein) and false positives (detection of a non-expressed protein). Not surprisingly, this ability decreases as the level of GCaMP expression increases. The authors underline that most errors were false negatives, which have a milder impact on result interpretation, but the false-positive rate was nevertheless non-negligible when detecting a second fluorophore in a cell expressing only one protein.

      The experimental profiles of fluorophores depended both on the specific fluorescent protein and on the projection target, and the distribution of double-labelled neurons did not match anatomical evidence. This result should be taken as a limitation of the present pioneering experiments, which are presented as a proof of principle of the approach; Neuroplex may provide much improved precision under different experimental conditions.

      In my view, the work of Phillips et al. represents a significant advance in the state of the art. The rigorous analysis of the limitations of Neuroplex should be considered an important guideline for future uses of this approach.

      We appreciate the reviewer’s positive evaluation and thoughtful comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript introduces Neuroplex, a pipeline that integrates miniscope Ca²⁺ imaging in freely moving mice with multiplexed confocal and spectral imaging to infer projection identities of recorded neurons. This technical approach is promising and could broaden access to projection-resolved population imaging. However, the core quantitative analyses apply a winner-take-all single-label assignment per neuron even when multiple fluorophores exceed threshold, with additional labels treated descriptively as "secondary hits." While the authors acknowledge and simulate dual labeling, the extent to which this single-label decision rule affects subtype fractions and behavioural comparisons remains uncertain without a multi-label (or probabilistic) sensitivity analysis and propagation of classification uncertainty.

      We thank Reviewer #2 for the careful statistical perspective and focus on assignment strategy and uncertainty. Importantly, we emphasize that Neuroplex is presented as a methodological proof-of-principle, not as a definitive quantification of projection convergence.

      Strengths:

      (1) Conceptual advance and practicality: Decoupling acquisition from identity readout constitutes an innovative approach that is, in principle, applicable in laboratories currently using single-color miniscopes.

      (2) Engineering thoroughness: The manuscript offers detailed consideration of GRIN optics, spectral libraries, registration procedures, and simulations that address signal-to-noise ratio, background, and class imbalances.

      (3) Immediate community value: If demonstrated to be robust, the pipeline could enable projection-resolved analyses without reliance on specialized multicolor miniscopes.

      Weaknesses:

      (1) Single-label assignment in the main analyses: When multiple fluorophores exceed threshold for a neuron/ROI, the workflow applies a winner-take-all rule and assigns a single label (the fluorophore with the largest standardized beta), while additional above-threshold fluorophores are retained only as "secondary hits." This is a reasonable specificity-first choice, but because cortical excitatory neurons can collateralize, collapsing dual-threshold ROIs to one identity may under-represent dual-projecting cells and could bias estimated subtype fractions and behavioural comparisons.

      We thank the reviewer for raising this important conceptual point.

      We agree that cortical excitatory neurons frequently collateralize and therefore may legitimately express more than one retrograde fluorophore. Our use of a winner-take-all (WTA) rule in the primary analyses was an intentionally conservative methodological choice designed to prioritize specificity over sensitivity in this proof-of-principle study.

      As demonstrated in our simulations (Supp. Fig. 5–6), under realistic background and noise conditions, secondary assignments are more susceptible to false-positive errors than primary assignments. For this reason, we chose to assign a single primary identity for quantitative behavioral stratification while retaining additional above-threshold fluorophores as “secondary hits” and reporting their distribution separately (Supp. Fig. 7).

      We did not intend to imply that projections are exclusive. Rather, the WTA strategy provides a conservative lower-bound estimate of subtype proportions and avoids inflation of dual-label rates under conditions where spectral separability is imperfect.

      We agree that this rationale should be stated more explicitly in the manuscript, and that the potential impact of assignment strategy on subtype fractions and behavioral comparisons should be acknowledged clearly as a methodological trade-off rather than a biological claim.

      Importantly, the biological analyses presented in this manuscript are illustrative demonstrations of functional stratification capability and do not depend on exclusivity of projection identity. We have revised the manuscript to clarify this framing as follows:

      “If multiple fluorophores exceeded the threshold for an ROI, the fluorophore with the largest z-scored beta value was assigned as the primary identity (winner-take-all rule). This conservative approach was chosen to prioritize specificity under realistic noise and background conditions. Additional above-threshold fluorophores were retained as ‘secondary hits’ but were not incorporated into primary subtype stratification analyses.” (Methods, Single Pass Algorithm)

      “For quantitative behavioral comparisons, each ROI was assigned a single primary fluorophore identity using a winner-take-all rule. We emphasize that this assignment strategy does not imply projection exclusivity. Rather, it provides a conservative lower-bound estimate of subtype proportions, as ROIs exceeding threshold for multiple fluorophores were classified according to their strongest spectral contribution.” (Results, Fluorophore distribution in behaviorally relevant ROIs)

      “These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Neuronal Cell Type and Behavior)

      “Cortical pyramidal neurons frequently collateralize to multiple downstream targets, and accordingly some ROIs exceeded threshold for more than one fluorophore. In this proof-of-principle implementation, we adopted a specificity-first winner-take-all assignment rule for primary analyses to minimize false-positive multi-label calls under realistic noise conditions. This strategy likely underestimates the true prevalence of dual-projecting neurons and should therefore be interpreted as a conservative stratification approach rather than a statement of projection exclusivity.” (Discussion)
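      For readers less familiar with this kind of decision rule, a minimal sketch of linear unmixing followed by a winner-take-all assignment with retained secondary hits is shown below. It is not the authors' Neuroplex code: it assumes ordinary least-squares unmixing against a matrix of reference spectra, standardization of each beta against a baseline mean and standard deviation (for example, estimated from unlabeled ROIs), and a hypothetical threshold of z > 3. The spectra and values are invented for illustration.

```python
import numpy as np


def unmix(fingerprint, basis):
    """Ordinary least-squares unmixing: fingerprint ≈ basis @ beta.

    basis has shape (n_channels, n_fluorophores), one reference spectrum per column.
    """
    beta, *_ = np.linalg.lstsq(basis, fingerprint, rcond=None)
    return beta


def assign_identity(beta, baseline_mean, baseline_sd, z_thresh=3.0):
    """Winner-take-all assignment on baseline-standardized betas.

    Returns (primary, secondary): primary is the index of the above-threshold
    fluorophore with the largest z-score (None if none passes), and secondary
    lists any other above-threshold fluorophores, strongest first.
    """
    z = (np.asarray(beta, dtype=float) - baseline_mean) / baseline_sd
    above = np.flatnonzero(z > z_thresh)
    if above.size == 0:
        return None, []
    ranked = above[np.argsort(z[above])[::-1]]  # strongest first
    return int(ranked[0]), [int(i) for i in ranked[1:]]


# Hypothetical 4-channel, 3-fluorophore example with partial spectral overlap.
basis = np.array([[1.0, 0.0, 0.0],
                  [0.2, 1.0, 0.0],
                  [0.0, 0.3, 1.0],
                  [0.0, 0.0, 0.2]])
fingerprint = basis @ np.array([2.0, 0.0, 1.0])  # noiseless mixture of fluorophores 0 and 2
beta = unmix(fingerprint, basis)
primary, secondary = assign_identity(beta, baseline_mean=0.0, baseline_sd=0.2)
print(primary, secondary)
```

      Here the ROI is labeled with fluorophore 0 for primary analyses, while fluorophore 2 is kept only as a secondary hit; a multi-label sensitivity analysis would instead count the ROI toward both identities.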

      (2) Dual-label detection is acknowledged but remains descriptive in vivo: the manuscript explicitly discusses the possibility of dual projection, evaluates dual-fluorophore detection in simulations (including performance under realistic noise/background), and reports in vivo rates of secondary hits. However, these dual-threshold events are not incorporated as co-identities in the main statistical analyses, making it difficult to judge how robust the principal biological conclusions are to the single-label decision rule.

      We thank the reviewer for this important clarification request.

      We agree that dual-projection neurons are biologically plausible and that dual-threshold ROIs were detected in vivo. In this manuscript, however, our primary goal was to establish the feasibility of high-dimensional spectral assignment and projection-resolved stratification, rather than to provide a definitive quantification of projection convergence.

      For this proof-of-principle study, we chose a conservative winner-take-all (WTA) framework for primary behavioral analyses in order to minimize false-positive multi-label assignments under realistic noise and background conditions, as demonstrated in our simulations (Supp. Fig. 5–6). Secondary hits were retained and reported descriptively (Supp. Fig. 7), but not incorporated into the primary statistical comparisons to avoid overinterpretation of potentially ambiguous dual-label calls.

      Importantly, the principal biological conclusions presented in the manuscript are qualitative demonstrations that projection-defined stratification is feasible within a single animal. These conclusions do not rely on projection exclusivity or on precise quantification of dual-projecting fractions.

      We agree that this distinction should be made clearer in the manuscript, and we have revised the text as follows:

      “Although dual-threshold ROIs were detected in vivo, these secondary assignments were not incorporated as co-identities in the primary behavioral analyses. This decision reflects a conservative specificity-first framework designed to minimize false-positive multi-label calls under realistic noise conditions. Accordingly, dual-label rates reported here should be interpreted descriptively. The present study focuses on demonstrating the feasibility of projection-resolved stratification, rather than providing definitive quantification of projection convergence.” (Results, Fluorophore distribution in behaviorally relevant ROIs)

      “We then stratified these neurons by projection target and examined behaviorally selective activity across cell types. These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Behavioral Analysis)

      (3) Uncertainty is not propagated: False-positive/false-negative rates from simulations and uncertainty from registration/segmentation are not carried forward into quantitative confidence bounds on subtype proportions or behaviour-by-subtype effects.

      We agree that formal propagation of classification and registration uncertainty into subtype proportions and behavioral comparisons would be appropriate in a study focused primarily on precise anatomical quantification. However, the central goal of the present manuscript is methodological: to demonstrate that high-dimensional spectral identity can be reliably linked to miniscope-recorded functional activity within a single animal.

      Our simulations under realistic noise, background, and class-imbalance conditions (Supp. Fig. 5–6) show that errors are predominantly false negatives rather than false positives. Nevertheless, the behavioral analyses are presented as qualitative demonstrations of the feasibility of projection-resolved stratification rather than as definitive quantitative anatomical measurements.

      In the revised manuscript, we clarified that 1) subtype proportions and behavioral effects are assignment-dependent estimates, 2) simulation-derived error rates provide guidance for experimental design rather than formal confidence intervals, and 3) future studies centered on precise quantification of projection fractions would benefit from formal uncertainty modeling, as follows:

      “These simulation-derived accuracy estimates characterize expected performance under defined noise and background conditions but were not formally propagated into confidence bounds on subtype proportions or behavioral comparisons. In this proof-of-principle study, subtype fractions are presented as assignment-dependent estimates rather than definitive anatomical measurements.” (Results, Assessment of spectral unmixing approach)

      “Because classification uncertainty was not formally propagated into these analyses, behavior-by-subtype comparisons should be interpreted as qualitative demonstrations of functional stratification rather than precise quantitative estimates.” (Results, Neuronal cell types and behavior)

      “The modeling framework was designed to characterize expected classification behavior across a range of experimental regimes, including background fluorescence, class imbalance, and reduced signal-to-noise ratio. These simulations provide practical performance guidance but were not used to compute formal error bars or propagate uncertainty into downstream biological analyses.” (Methods, Modeling of experimental variables to assess accuracy of algorithms)

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)
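      The probabilistic alternative mentioned above can be illustrated with a short Monte Carlo sketch. It assumes a classifier that outputs a posterior probability vector over fluorophore identities for each ROI (the posteriors below are hypothetical). Subtype-fraction intervals are then obtained by repeatedly resampling ROIs with replacement and drawing one label per ROI from its posterior, so that both sampling and classification uncertainty enter the interval. This is one generic approach, not a method used in the manuscript.

```python
import numpy as np


def fraction_intervals(posteriors, n_draws=2000, ci=95.0, seed=0):
    """Monte Carlo intervals on subtype fractions from per-ROI identity posteriors.

    posteriors: (n_rois, n_types) array whose rows are probability vectors.
    Each draw resamples ROIs with replacement (sampling uncertainty) and then
    samples one label per resampled ROI from its posterior (classification
    uncertainty). Returns the mean fraction and lower/upper percentile bounds.
    """
    rng = np.random.default_rng(seed)
    post = np.asarray(posteriors, dtype=float)
    n_rois, n_types = post.shape
    cum = np.cumsum(post, axis=1)
    fractions = np.empty((n_draws, n_types))
    for d in range(n_draws):
        rows = rng.integers(0, n_rois, size=n_rois)
        u = rng.random(n_rois)
        # Inverse-CDF sampling: label = number of cumulative probabilities below u.
        labels = (u[:, None] > cum[rows]).sum(axis=1)
        labels = np.minimum(labels, n_types - 1)  # guard against rounding in cum[:, -1]
        fractions[d] = np.bincount(labels, minlength=n_types) / n_rois
    tail = (100.0 - ci) / 2.0
    lo, hi = np.percentile(fractions, [tail, 100.0 - tail], axis=0)
    return fractions.mean(axis=0), lo, hi


# Hypothetical posteriors for five ROIs over three projection identities.
posteriors = np.array([[0.90, 0.05, 0.05],
                       [0.60, 0.40, 0.00],
                       [0.10, 0.80, 0.10],
                       [0.05, 0.05, 0.90],
                       [0.50, 0.50, 0.00]])
mean, lo, hi = fraction_intervals(posteriors)
for k in range(posteriors.shape[1]):
    print(f"type {k}: {mean[k]:.2f} [{lo[k]:.2f}, {hi[k]:.2f}]")
```

      The same resampled labels could be carried into behavior-by-subtype comparisons, so that each reported effect inherits the identity uncertainty of the ROIs it is computed from.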

      Reviewer #3 (Public review):

      This manuscript presents Neuroplex, a technically rigorous and carefully validated pipeline that links miniscope calcium imaging in freely behaving animals with high-dimensional fluorophore-based cell-type identification using in vivo multiplexed spectral confocal imaging through the same implanted GRIN lens. The work overcomes a major practical limitation of head-mounted microscopy by enabling the identification of up to nine projection-defined neuronal populations within the same animal, without post-fixation histology. The approach is well motivated and supported by extensive calibration and simulation. While the biological results are primarily illustrative, the methodological contribution is clear and likely to be broadly useful.

      Major comments

      (1) The approach relies on the assumption that fluorophore identity assigned during anesthetized confocal imaging accurately reflects the identity of neurons recorded during prior behavioural sessions. While the use of the same GRIN lens and in vivo co-registration mitigates many concerns, the manuscript would benefit from a more explicit discussion, or empirical demonstration, if available, of the stability of fluorophore assignments across time. Even limited repeat spectral imaging in a subset of animals would strengthen confidence in longitudinal applicability.

      We thank the reviewer for highlighting this important conceptual assumption.

      Fluorophore identity in Neuroplex is genetically encoded via AAVretro delivery and therefore does not depend on transient physiological state. Spectral imaging is performed in vivo through the same GRIN lens and field of view used during behavioral imaging, and co-registration relies on anatomical landmarks. While repeat spectral imaging was not formally performed as a longitudinal experiment, the underlying fluorescent protein expression is stable over weeks, and there is no biological mechanism in this paradigm that would alter fluorophore identity across sessions.

      We revised the manuscript to explicitly state this assumption and clarify why identity stability is expected as follows:

      “…fluorophore signals and reduce unmixing fidelity, leading to an increased false positive rate. Fluorophore identity in this framework is genetically encoded via retrograde AAV delivery and is therefore expected to remain stable across behavioral and spectral imaging sessions. Because both functional and spectral data are acquired in vivo through the same GRIN lens and co-registered using anatomical landmarks, assignment stability is not expected to vary across time unless expression levels change substantially. While repeat spectral imaging was not performed as a formal longitudinal experiment in this study, the stability of fluorescent protein expression supports the assumption that fluorophore identity reflects a persistent cellular attribute.” (Discussion)

      (2) Fluorophore identity is determined using thresholding of linear unmixing coefficients relative to an empirically defined baseline, followed by a second adaptive pass for over-represented fluorophores. While this heuristic is extensively validated via simulations, it remains ad hoc from a statistical perspective. The authors should more explicitly justify this choice and discuss its limitations relative to probabilistic or likelihood-based classifiers, particularly with respect to uncertainty estimation at the single-ROI level.

      We agree that the dual-pass thresholding approach is heuristic rather than fully probabilistic. More formal probabilistic classifiers are possible but would introduce additional modeling assumptions and training requirements beyond the scope of this proof-of-principle study.

      We revised our manuscript to clarify this as follows:

      “The current classification framework relies on linear unmixing followed by empirically defined thresholding rather than full probabilistic inference. This approach provides transparency and practical robustness under realistic noise and background conditions but does not generate single-ROI posterior uncertainty estimates.” (Discussion)

      (3) Identifiability of fluorophores is demonstrated empirically, but the manuscript does not explicitly quantify spectral separability (e.g., similarity metrics between basis spectra or conditioning of the unmixing matrix). A brief analysis of spectral independence or sensitivity of beta estimates to noise would provide mathematical reassurance, especially given the reliance on linear regression in a high-dimensional feature space.

      We agree that spectral separability is conceptually important. In this manuscript, separability is demonstrated empirically through (1) in vitro fingerprint acquisition under identical optical conditions, (2) simulation under background and noise, and (3) successful in vivo classification across regimes. We did not compute formal matrix-conditioning metrics, but we agree that the separability rationale should be described more explicitly. We revised our manuscript as follows:

      “While formal conditioning metrics were not explicitly computed, empirical fingerprint acquisition and simulation-based perturbation analyses demonstrate sufficient spectral independence for reliable linear unmixing under the tested regimes.” (Discussion)
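      As an illustration of the formal separability check the reviewer requested (and which the response notes was not computed), a minimal Python sketch follows. It assumes the reference fingerprints are stacked as columns of a basis matrix and uses two standard diagnostics: pairwise cosine similarity between spectra and the condition number of the column-normalized basis, plus a Monte Carlo estimate of how much least-squares betas scatter under additive Gaussian noise. The Gaussian-shaped spectra, the 32 channels, the noise level, and all function names are hypothetical stand-ins, not the Neuroplex fingerprints or pipeline.

```python
import numpy as np


def spectral_separability(basis):
    """Separability diagnostics for a (n_channels, n_fluorophores) basis matrix.

    Returns the pairwise cosine-similarity matrix of the reference spectra and
    the 2-norm condition number of the column-normalized basis.
    """
    # Scale each reference spectrum to unit length so brightness differences
    # do not affect the shape comparison.
    unit = basis / np.linalg.norm(basis, axis=0, keepdims=True)
    cosine = unit.T @ unit        # 1.0 on the diagonal; near 1.0 = near-identical shapes
    cond = np.linalg.cond(unit)   # large values = betas sensitive to small noise
    return cosine, cond


def beta_noise_sensitivity(basis, true_beta, noise_sd, n_trials=500, seed=0):
    """Monte Carlo SD of least-squares betas under additive Gaussian noise."""
    rng = np.random.default_rng(seed)
    clean = basis @ true_beta
    estimates = np.empty((n_trials, basis.shape[1]))
    for i in range(n_trials):
        noisy = clean + rng.normal(0.0, noise_sd, size=clean.shape)
        estimates[i] = np.linalg.lstsq(basis, noisy, rcond=None)[0]
    return estimates.std(axis=0)


def gaussian_spectrum(channels, peak, width):
    """Hypothetical reference spectrum: a Gaussian over detection channels."""
    return np.exp(-0.5 * ((channels - peak) / width) ** 2)


channels = np.arange(32, dtype=float)  # hypothetical 32 detection channels
basis = np.column_stack([gaussian_spectrum(channels, p, 3.0) for p in (6, 14, 22)])
cosine, cond = spectral_separability(basis)
beta_sd = beta_noise_sensitivity(basis, np.ones(3), noise_sd=0.01)
```

      Larger off-diagonal cosine values and a larger condition number flag fluorophore sets whose betas will be noise-sensitive; in this toy basis, moving two peaks closer together raises the condition number.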

      (4) The spectral unmixing treats CNMF-derived ROIs as fixed supports. I wonder whether ROI boundaries, neuropil contamination, and partial overlap can introduce structured uncertainty that could bias spectral estimates. If so, the authors should acknowledge this dependency more explicitly and discuss how ROI quality or overlap might influence false negatives or false positives, particularly in densely labelled regions.

      We agree that ROI definition influences spectral extraction. Spectral fingerprints are derived by averaging all pixels within the ROI mask, and therefore neuropil contamination, partial ROI overlap, and dense labeling could influence beta estimates. In the revised manuscript, we have acknowledged these dependencies more explicitly as follows:

      “Spectral unmixing operates on CNMF-derived ROI masks treated as fixed supports. Accordingly, segmentation quality, neuropil contamination, and partial overlap between neighboring cells can influence extracted spectral fingerprints and may contribute to false negatives or secondary assignments, particularly in densely labeled regions. These structured sources of uncertainty are expected to have the greatest impact under regimes of extreme class imbalance, low fluorophore brightness, strong neuropil signal, or pairing of spectrally overlapping reporters. Use of refined segmentation strategies or nuclear-localized reporters could reduce such structured uncertainty in future implementations.” (Discussion)

      (5) The manuscript reports meaningful rates of secondary fluorophore detection, but also nontrivial false-positive rates for secondary labels under realistic conditions. The authors appropriately caution against over-interpretation, but the Discussion should more clearly delineate when dual-label assignments are likely to be biologically interpretable versus methodologically ambiguous, and how experimental design (e.g., fluorophore pairing) should be optimized accordingly.

      We agree and have delineated interpretability boundaries explicitly as follows:

      “Dual-label assignments are most reliable when fluorophores are spectrally well separated and when signal-to-noise ratios are high. In contrast, spectrally adjacent fluorophore pairs or densely labeled regimes increase ambiguity and false-positive risk. Experimental design should therefore prioritize pairing spectrally distant fluorophores when projection convergence is of primary interest.” (Discussion)

      (6) I suspect that Neuroplex will be most effective in certain regimes (moderate convergence, bright and spectrally distinct fluorophores) and less reliable in others. A more explicit discussion of best practices, anticipated failure modes, and experimental scenarios where the method may be inappropriate would increase the practical value of the paper for adopters.

      “More broadly, Neuroplex is expected to perform most robustly in regimes characterized by moderate projection convergence, balanced fluorophore representation, bright and spectrally distinct reporters, and adequate signal-to-noise ratio. Imaging directly within a projection target that has received dense retrograde labeling may introduce substantial class imbalance, which simulations predict will reduce detection sensitivity for the dominant fluorophore. In such cases, conservative assignment strategies, reduced spectral complexity, or refinement of ROI definition may improve interpretability. Careful fluorophore selection and pilot validation under intended imaging conditions are therefore recommended prior to large-scale application. Future implementations incorporating nuclear-localized reporters may further reduce segmentation-dependent ambiguity by constraining spectral signals to somatic compartments.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors should address a few points that are not clear.

      (1) At the end of the Results, the authors assess their approach using only four fluorophores and conclude that Neuroplex works "even" under reduced complexity. There is something I am missing. In my mind, lower complexity should be easier and should work better. As a researcher, I would first assess a four-fluorophore scenario and then step up in complexity, but the authors did the opposite. Also, I think that the present Supplementary Figure 9 should be in the main text; I don't understand why the authors decided to relegate a clear result to the bottom of everything. The authors should give some explanations.

      We agree that reduced spectral complexity should, in principle, improve separability and classification performance. Our original presentation order was intended to first demonstrate feasibility under the most challenging condition (nine fluorophores plus GCaMP), thereby establishing maximal multiplexing capacity. The reduced-complexity experiment was included to demonstrate scalability and generalizability under more typical experimental regimes. However, we agree that this rationale was not sufficiently clear and that the reduced-complexity results merit presentation in the main text.

      Accordingly:

      We have moved former Supplementary Figure 9 into the main Results (Fig. 6).

      We have clarified explicitly why the nine-fluorophore condition was presented first as follows:

      “To evaluate the performance of Neuroplex under more typical experimental regimes with reduced complexity, we applied the pipeline to two GCaMP transgenic animals injected with a subset of four fluorophores.”

      (2) The question of relative expression is crucial. Among the infected regions, there is the contralateral mPFC, and I imagine that if they image there, the contribution of the expressed protein might dominate all other components, preventing detection of other fluorophores, including GCaMP. But is that the case, or would it be possible to detect projecting neurons in that region? I would be surprised if the authors never tried it; this test would simply require mounting the GRIN lens on the other hemisphere.

      This is an important conceptual point.

      Our simulations (Supp. Fig. 5) explicitly model over-representation of a single fluorophore. These results show that heavy class imbalance primarily increases false negatives (due to baseline normalization) rather than false positives.

      In the revised manuscript, we discuss this limitation more explicitly as follows:

      “Relative fluorophore representation within the imaged field of view influences classification robustness. As demonstrated in our simulations of class imbalance (Supp. Fig. 5g–h), extreme over-representation of a single fluorophore primarily increases false-negative rates due to baseline normalization effects. In the present study, we intentionally avoided imaging directly within heavily infected projection targets (e.g., contralateral mPFC) in order to maintain moderate fluorophore representation across ROIs. Imaging in a densely labeled region would represent a more challenging regime, and we would expect reduced sensitivity for the dominant fluorophore under such conditions.” (Discussion)
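      The class-imbalance effect described above can be illustrated with a toy model. The sketch below assumes, purely for illustration, that the per-fluorophore baseline is the median and scaled median absolute deviation (MAD) of betas across all ROIs in the field of view; the manuscript's "empirically defined baseline" may be computed differently. Under this assumption, when most ROIs carry the same fluorophore, the baseline shifts toward the labeled population and true positives fall below threshold (false negatives) rather than producing false positives. The beta values, noise level, threshold, and function names are hypothetical.

```python
import numpy as np


def robust_z(values):
    """z-scores against a median/MAD baseline estimated from all ROIs (assumption)."""
    med = np.median(values)
    mad = 1.4826 * np.median(np.abs(values - med))  # MAD scaled to approximate an SD
    return (values - med) / mad


def sensitivity_under_imbalance(frac_positive, n_rois=100, z_threshold=3.0,
                                noise_sd=0.05, seed=0):
    """Fraction of truly labeled ROIs that cross threshold for one fluorophore."""
    rng = np.random.default_rng(seed)
    n_pos = int(round(frac_positive * n_rois))
    truth = np.zeros(n_rois, dtype=bool)
    truth[:n_pos] = True
    # Hypothetical betas: 1.0 for labeled ROIs, 0.0 otherwise, plus Gaussian noise
    betas = truth.astype(float) + rng.normal(0.0, noise_sd, n_rois)
    called = robust_z(betas) > z_threshold
    return float(called[truth].mean()) if n_pos else float("nan")


for frac in (0.2, 0.7):
    print(frac, sensitivity_under_imbalance(frac))
```

      With these hypothetical settings, nearly all labeled ROIs are detected at 20% representation and essentially none at 70%, which is the qualitative direction the simulations in the response describe.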

      (3) The possibility to utilise Neuroplex goes beyond the type of experiment presented as proof-of-concept in this technical paper. In the Discussion, the authors mention genetically defined subtypes and activity-tagged neurons. But, if one changes the pipeline, can it be used by expressing GECIs with different spectra, or GECIs and genetically-encoded voltage indicators (GEVIs)? I would be very interested in knowing what the authors think about this putative "shortcut".

      We thank the reviewer for this forward-looking and insightful question.

      In principle, the Neuroplex framework could be extended to incorporate spectrally distinct genetically encoded functional indicators, including multi-color GECIs or combinations of GECIs and GEVIs. However, it is important to distinguish this from the identity-assignment strategy implemented in the present study.

      Simultaneous multi-color functional imaging under a head-mounted miniscope is optically more demanding than assigning cell identity from single-color functional recordings followed by high-dimensional spectral readout. Multi-color GECI or GEVI imaging requires real-time excitation and emission separation during dynamic recording, increases optical complexity, and is particularly sensitive to chromatic aberration, photon efficiency, and signal-to-noise constraints imposed by GRIN lenses.

      In contrast, Neuroplex decouples functional acquisition from spectral identity determination. Functional activity is recorded using a single optimized channel, while spectral separation is performed separately under controlled confocal conditions with multiplexed excitation and emission sampling. This design substantially reduces optical burden during behavioral imaging.

      While integration of multiple functional reporters is conceptually feasible within this framework, successful implementation would require careful validation of brightness, spectral separability, and temporal stability for each reporter combination.

      Reviewer #2 (Recommendations for the authors):

      (1) Implement a principled multi-label calling mode for cells with >1 above-threshold fluorophore (e.g., per-fluorophore FDR control or Bayesian posteriors). Report cell-wise weights and re-run key results three ways: single-label, hard multi-label, and soft (probabilistic) assignments; state explicitly how conclusions change.

      We appreciate this suggestion and agree that multi-label or probabilistic calling frameworks are well motivated, particularly for studies in which projection convergence is the central biological question. In the current manuscript, however, our goal is to establish a practically deployable proof-of-principle pipeline for linking miniscope functional recordings to a high-dimensional spectral-identity readout. Consistent with this scope, we used a conservative winner-take-all (WTA) strategy for primary analyses to prioritize specificity under realistic noise and background conditions, and we treated multi-hit events descriptively. Importantly, the qualitative conclusions regarding projection-resolved functional stratification are unchanged when secondary-hit distributions are examined.

      In the revised manuscript, we explicitly stated that: (i) single-label assignment is a conservative analysis choice rather than a biological claim of exclusivity, and (ii) multi-label or probabilistic calling is a natural extension for future work, as follows:

      “If multiple fluorophores exceeded the threshold for an ROI, the fluorophore with the largest z-scored beta value was assigned as the primary identity (winner-take-all rule). This conservative approach was chosen to prioritize specificity under realistic noise and background conditions. Additional above-threshold fluorophores were retained as ‘secondary hits’ but were not incorporated into primary subtype stratification analyses.” (Methods, Single Pass Algorithm)
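      The single-pass rule quoted above can be sketched as follows, assuming ordinary least-squares unmixing of each ROI's mean spectrum against a matrix of reference fingerprints, z-scoring of betas against a per-fluorophore baseline mean and SD, and a fixed z threshold. The threshold of 3.0, the baseline values, the Gaussian fingerprints, and the function names are hypothetical; the second adaptive pass for over-represented fluorophores is not modeled.

```python
import numpy as np


def unmix(spectra, basis):
    """Ordinary least-squares betas for each ROI.

    spectra: (n_rois, n_channels) mean spectrum per ROI mask.
    basis:   (n_channels, n_fluorophores) reference fingerprints.
    Returns: (n_rois, n_fluorophores) beta matrix.
    """
    betas = np.linalg.lstsq(basis, spectra.T, rcond=None)[0]
    return betas.T


def assign_identities(betas, baseline_mean, baseline_sd, z_threshold=3.0):
    """Single-pass thresholding with a winner-take-all primary identity.

    Returns one (primary, secondary_hits) tuple per ROI; primary is None when
    no fluorophore exceeds the threshold.
    """
    z = (betas - baseline_mean) / baseline_sd
    results = []
    for row in z:
        above = np.flatnonzero(row > z_threshold)
        if above.size == 0:
            results.append((None, []))
            continue
        primary = int(above[np.argmax(row[above])])  # largest z-scored beta wins
        secondary = [int(k) for k in above if k != primary]
        results.append((primary, secondary))
    return results


# Hypothetical example: 3 Gaussian fingerprints over 32 channels, noiseless ROIs.
channels = np.arange(32, dtype=float)
basis = np.column_stack(
    [np.exp(-0.5 * ((channels - p) / 3.0) ** 2) for p in (6, 14, 22)]
)
true_betas = np.array([
    [1.0, 0.0, 0.0],   # single fluorophore
    [0.0, 0.6, 1.0],   # two fluorophores above threshold
    [0.0, 0.0, 0.0],   # unlabeled
])
spectra = true_betas @ basis.T
calls = assign_identities(unmix(spectra, basis), baseline_mean=0.0, baseline_sd=0.1)
# calls -> [(0, []), (2, [1]), (None, [])]
```

      In this toy example, the second ROI exceeds threshold for two fluorophores; the larger z-scored beta sets the primary identity and the other is retained only as a secondary hit, mirroring the conservative winner-take-all choice.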

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)
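      For comparison with the winner-take-all rule, the per-fluorophore FDR control suggested by Reviewer #2 could take the form below. This is a hypothetical sketch, not part of the manuscript's pipeline: it assumes z-scored betas are approximately standard normal under the null, converts them to one-sided p-values, and applies the Benjamini-Hochberg procedure separately for each fluorophore across ROIs, so a single ROI can be called positive for more than one fluorophore.

```python
import math


def one_sided_p(z):
    """Upper-tail p-value of a standard normal, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(pvals, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k such that the k-th smallest p-value is <= k * q / m
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


def multi_label_calls(z_matrix, q=0.05):
    """z_matrix: one list of per-fluorophore z-scores per ROI.

    Applies BH separately to each fluorophore (column) and returns, for each
    ROI, the list of fluorophore indices called positive.
    """
    n_rois, n_fluor = len(z_matrix), len(z_matrix[0])
    calls = [[] for _ in range(n_rois)]
    for f in range(n_fluor):
        pvals = [one_sided_p(z_matrix[r][f]) for r in range(n_rois)]
        for r, hit in enumerate(benjamini_hochberg(pvals, q)):
            if hit:
                calls[r].append(f)
    return calls


calls = multi_label_calls([[5.0, 0.0], [0.0, 5.0], [4.5, 4.5], [0.1, -0.2]])
# calls -> [[0], [1], [0, 1], []]
```

      Unlike winner-take-all, the third ROI in this example receives both labels. Whether the normal-null assumption holds for real unmixing betas would need to be checked, for example against unlabeled ROIs.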

      (2) Add ground truth for dual projectors in a subset (paired orthogonal tracers or staged injections) and provide a confusion matrix including dual-positives; use this to calibrate thresholds/priors.

      We agree that ground truth validation of dual projectors using orthogonal tracers or staged injections would be valuable, particularly for calibrating priors and enabling confusion-matrix-based evaluation. However, these experiments require additional cohorts and experimental design beyond the scope of the current proof-of-principle technical manuscript. Our goal here is to demonstrate the feasibility of multiplexed identification and projection-resolved stratification within a single animal, not to provide definitive anatomical quantification of collateralization.

      We have revised the manuscript to clearly state that dual-label in vivo observations are descriptive and that studies aimed at quantitative convergence mapping should incorporate orthogonal ground truth validation.

      “Accurate quantification of projection convergence would benefit from orthogonal ground-truth validation (e.g., paired tracers or staged injections) to establish confusion matrices for dual positives and to calibrate thresholds or priors.”

      (3) Propagate uncertainty from simulations and registration/segmentation to subtype fractions and behavior effects (error bars or sensitivity analyses).

      We agree that formal uncertainty propagation is appropriate for studies focused on precisely quantifying subtype proportions or effect sizes. In this manuscript, subtype fractions and behavioral comparisons are presented primarily as demonstrations of the feasibility of projection-resolved functional stratification, rather than definitive anatomical measurements. Simulation analyses are included to characterize expected performance under defined noise and background regimes, but we did not propagate these uncertainties into downstream confidence bounds in this proof-of-principle work.

      We have revised the manuscript to clarify this explicitly as follows:

      “These simulation-derived accuracy estimates characterize expected performance under defined noise and background conditions but were not formally propagated into confidence bounds on subtype proportions or behavioral comparisons. In this proof-of-principle study, subtype fractions are presented as assignment-dependent estimates rather than definitive anatomical measurements.” (Results, Assessment of spectral unmixing approach)

      “These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Neuronal cell types and behavior)

      “The modeling framework was designed to characterize expected classification behavior across a range of experimental regimes, including background fluorescence, class imbalance, and reduced signal-to-noise ratio. These simulations provide practical performance guidance but were not used to compute formal error bars or propagate uncertainty into downstream biological analyses.” (Methods, Modeling of experimental variables to assess accuracy of algorithms)

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)

      (4) Mitigate sources of spurious multi-hits (neuropil handling, ROI mask erosion, nuclear-localized reporters, spectral basis choices) and quantify their impact on dual-label recovery.

      We agree that neuropil contamination, ROI boundary choices, and spectral basis selection can influence multi-hit rates. In the current manuscript, we already implement background subtraction and evaluate multi-hit behavior through simulations under realistic background and noise regimes. Quantitative evaluation of additional mitigation strategies (e.g., ROI erosion comparisons) would require new analyses beyond the current scope.

      We have revised the Discussion to include concrete best-practice recommendations (e.g., fluorophore pairing, conservative interpretation of multi-hits, and potential use of nuclear-localized reporters).

      “Multi-hit events can reflect true biological collateralization but may also arise from structured sources of ambiguity such as neuropil contamination, partial ROI overlap, or imperfect ROI boundaries. These factors may bias spectral estimates and contribute to secondary assignments, particularly in densely labeled regions. Practical mitigation strategies include conservative assignment rules, improved segmentation, and use of nuclear-localized reporters to reduce neuropil contribution.”

      (5) Clarify claims in the main text/figures wherever exclusivity is implied; label which panels use single-label vs multi-label/soft assignments.

      We agree and thank the reviewer for emphasizing clarity. We did not intend to imply projection exclusivity. We have revised the manuscript text and figure legends to explicitly state where single-label (winner-take-all) assignment is used, and to avoid language that could be read as claiming exclusive projection identity as follows:

      “For quantitative behavioral comparisons, each ROI was assigned a single primary fluorophore identity using a conservative winner-take-all rule. This assignment reflects the strongest spectral contribution and does not imply projection exclusivity. Rather, it provides a conservative lower-bound estimate of subtype proportions, as ROIs exceeding threshold for multiple fluorophores were classified according to their strongest spectral contribution.”

    1. eLife Assessment

      This valuable study addresses a critical question regarding the role of a subpopulation of cortical interneurons (Chrna2-expressing Martinotti cells) in motor learning and cortical dynamics. However, despite the inclusion of interesting behavioral and imaging data, significant limitations remain, even after revision, in the design of the motor learning task and the associated data analyses. As a result, the presented data are incomplete to support the central conclusions.

    2. Reviewer #1 (Public review):

      In this study, the authors investigated a specific subtype of SST-INs (layer 5 Chrna2-expressing Martinotti cells) and examined its functional role in motor learning.

      Most of the issues remain unaddressed. The findings across experiments are inconsistent, and it is unclear how the authors performed their analyses or why specific time points and comparisons were chosen. The study will require major re-analyzing and additional experiments to substantiate its conclusions.

      After reading the authors' responses, my major concerns about the manuscript remain unresolved, particularly regarding the arbitrarily defined stages of learning in the motor learning task and how the calcium imaging data align with the animal's movements.

      - In line 331, the authors refer to session 5 as "training," describing it as the final spoon session, and session 6 as "re-training," because it is the first session in which the pellet is presented on the plate rather than on the spoon. However, in Fig. 1F-H, even in the Ctrl group, it is clear that the performance drops significantly in session 5, which is supposed to be the easiest session before switching to the more difficult plate condition.

      - In the classic pellet-reaching task, the spoon sessions would typically be considered "shaping," while the plate sessions would represent the actual training phase. However, in this manuscript, the authors still insist on referring to session 2 as "learning" and session 5 as "training." I don't understand the difference between session 2 and session 5, especially when session 5's performance is lower than that of session 2 (even in Fig. 1H when comparing the success ratio).

      - Since session 6 (on the plate) is considered "retraining," why don't the authors present the behavioral results beyond session 6? As a result, it remains unclear whether the animals improved their performance during the retraining phase.

      - Lastly, in Fig. 4B the authors present only the success ratio and claim that performance improves with CLZ application. However, when comparing sessions 8-10 between the Ctrl and Cre⁺ groups, there already appears to be a baseline difference. CLZ treatment in Cre⁺ mice seems to bring performance only to the WT level rather than producing a clear improvement beyond baseline.

      - Regarding the alignment between imaging and behavior, the authors report ~100 prehensions per minute. However, the calcium imaging traces show fewer than 20-30 spikes over 150 seconds (~2.5 min; Fig. 1E). This discrepancy raises concerns about whether the authors can truly isolate calcium signals corresponding to individual prehension events (either successful ones or multiple combined events for unsuccessful attempts). The manuscript still does not present behavioral data that directly aligns prehension events with calcium imaging activity. Although the authors performed analyses suggesting that prehension-related activity does not systematically alter non-prehension epochs, this claim is difficult to evaluate without seeing the underlying traces. It is therefore unclear how the authors selected the example calcium traces aligned to prehension onset, given that there are more than 100 prehension events per minute.

      - In Fig. 1I, the authors also did not address why neural activity during successful trials is already lower one second before movement onset. The longer traces provided do not help to explain this observation or clarify the origin of this pre-movement reduction in activity. It actually further suggests that there may be some artifacts in the imaging that could affect the analysis.

      - Overall, because it remains difficult to understand exactly what the authors are analyzing (and because the definitions of the motor learning stages appear arbitrary), it is difficult to agree with the authors' conclusion that Mα2 cells reduce PyrN cell assembly plasticity during learning, thereby possibly facilitating already acquired motor skills.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Malfatti et al. study the role of Chrna2 Martinotti cells (Mα2 cells), a subset of SST interneurons, for motor learning and motor cortex activity. The authors trained mice on a forelimb prehension task while recording neuronal activity of pyramidal cells using calcium imaging with a head mounted miniscope. While chemogenetically increasing Mα2 cell activity did not affect motor learning, it changed pyramidal cell activity such that activity peaks become sharper and differently timed than in control mice. Moreover, co-active neuronal assemblies become more stable with a smaller spatial distribution. Increasing Mα2 cell activity in previously trained mice did increase performance on the prehension task and led to increased theta and gamma band activity in the motor cortex. On the other hand, genetic ablation of Mα2 cells affected fine motor movements on a pasta handling task while not affecting the prehension task. While overall this study addresses an important and timely question, limitations in the design of the motor learning task and data analysis significantly weaken the conclusions drawn in this manuscript.

      Strengths:

      The proposed question of how Chrna2-expressing SST interneurons affect motor learning and motor cortex activity is important and timely. The study employs sophisticated approaches to record neuronal activity and manipulate the activity of a specific neuronal population in behaving mice over the course of motor learning. The authors analyze a variety of neuronal activity parameters, comparing different behavior trials, stages of learning, and the effects of Mα2 cell activation. The analysis of neuronal assembly activity and stability over the course of learning by tracking individual neurons throughout the imaging sessions is notable, since this is technically challenging, and yielded the interesting result that neuronal assemblies are more stable when Mα2 cells are activated.

      Overall, the study provides compelling evidence that Mα2 cells regulate certain aspects of motor behaviors, likely by shaping circuit activity in the motor cortex.

      Weaknesses:

      While the authors addressed some of the concerns raised by the reviewers, several major limitations still exist in the revised manuscript.

      (1) I appreciate the authors now showing more measures of the prehension task (total reaches, successful reaches/min, and success ratio) and providing more details on the task design. However, it is unclear why the authors chose a task design that differs from the commonly used approach. Here they increase the distance of the food pellet each session, thus making the task increasingly harder, whereas the target distance is commonly kept constant (see, e.g., 10.1038/nature08389). As a result, important readouts of learning (e.g., success rate) remain stable, making it impossible to judge whether learning has occurred without a control group of untrained mice. This also makes it impossible to judge whether task performance is affected by increased Mα2 cell excitability, since there is no reference for how these measurements are expected to change in a mouse that does or does not learn the task.

      (2) Regarding the analysis of the calcium imaging data, it is still unclear why the authors cannot report a commonly used dF/F0 or z-score value, as recommended by both reviewers. The authors state that the 1 s time window prior to prehension cannot be used as a baseline (F0), as it might contain preparatory motor activity. In that case, an even earlier window (such as -2 to -1 s) or z-scores should be used. Relabeling the background-subtracted fluorescence signal as dF/F0, as in the current version, is misleading. Relatedly, it is unclear why the authors consider the 1 s window before prehension unsuitable as a baseline, yet use the difference in calcium activity before and after prehension onset as a cut-off criterion for defining cells as prehension-modulated and including them in the analysis.
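      The reviewer's suggestion can be sketched as peri-event normalization against an earlier baseline window. In the minimal Python example below, each event-aligned segment is expressed both as dF/F0 = (F - F0)/F0, with F0 the mean fluorescence from -2 to -1 s before onset, and as a z-score using the SD of that same window. The frame rate, window lengths, and function name are assumptions for illustration, not the authors' analysis code; events whose window would extend past either end of the recording are skipped.

```python
import numpy as np


def peri_event_traces(trace, event_frames, fs, pre_s=2.0, post_s=2.0,
                      baseline_window_s=(-2.0, -1.0)):
    """Cut windows around event onsets and normalize each to a pre-event baseline.

    trace: 1-D background-subtracted fluorescence (F).
    event_frames: frame indices of event onsets.
    fs: frame rate in Hz.
    Returns (time_s, dff, z), where dff and z have shape (n_kept_events, n_samples).
    """
    pre = int(round(pre_s * fs))
    post = int(round(post_s * fs))
    time_s = np.arange(-pre, post) / fs
    b0, b1 = baseline_window_s
    base_mask = (time_s >= b0) & (time_s < b1)
    dff, z = [], []
    for e in event_frames:
        if e - pre < 0 or e + post > trace.size:
            continue  # window would run off the recording
        seg = trace[e - pre:e + post].astype(float)
        base = seg[base_mask]
        f0 = base.mean()
        dff.append((seg - f0) / f0)               # dF/F0 vs. the -2 to -1 s window
        z.append((seg - f0) / base.std(ddof=1))   # z-score vs. the same window
    return time_s, np.array(dff), np.array(z)
```

      A z-score is undefined when the baseline window is flat (SD of zero), so in practice the baseline window must contain enough frames to estimate its variability.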

      (3) While the authors have improved their statistical reporting, key information is still missing in several places. For example, no N-numbers are listed in the legends for Figure 3, and the number of mice analyzed in Figures 2 and 3 is not stated. For clarity, the authors should also state in the figure legends the statistical test performed for any p-values shown in the figure.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study addresses a critical and timely question regarding the role of a subpopulation of cortical interneurons (Chrna2-expressing Martinotti cells) in motor learning and cortical dynamics. However, while some of the behavior and imaging data are impressive, the small sample sizes and incomplete behavioral and activity analyses make interpretation difficult; therefore, they are insufficient to support the central conclusions. The study may be of interest to neuroscientists studying cortical neural circuits, motor learning, and motor control.

      We thank the reviewers and the editors for the insightful comments. We are pleased to report that the issues raised can be addressed by improving the clarity of specific sections and by providing additional analysis. Specifically, it was not clear in the manuscript text that although we show illustrative data with a lower number of animals, our conclusions are supported by data with a larger and sufficient sample size. Also, the description of our control experiments has been improved to clarify that appropriate treatment controls were used. We therefore clarify below that our study presents compelling and sufficient evidence to support our conclusions. We have responded to all the comments, explaining how each concern has been addressed. All line and figure numbers mentioned here refer to the numbering of the reviewed manuscript version. All references are cited as DOIs.

      Reviewer #1 (Public review):

      There are many major issues with the study. The findings across experiments are inconsistent, and it is unclear how the authors performed their analyses or why specific time points and comparisons were chosen. The study requires major re-analysis and additional experiments to substantiate its conclusions.

      The main limitation of the study lies in its small sample sizes and the absence of key control experiments, which substantially weaken the strength of the conclusions.

      (1a) Behavior task - the pellet-reaching task is a well-established paradigm in the motor learning field. Why did the authors choose to quantify performance using "success pellets per minute" instead of the more conventional "success rate" (see PMID 19946267, 31901303, 34437845, 24805237)? It is also confusing that the authors describe sessions 1-5 as being performed on a spoon, while from session 6 onward, the pellets are presented on a plate. However, in lines 710-713, the authors define session 1 as "naive," session 2 as "learning," session 5 as "training," and "retraining" as a condition in which a more challenging pellet presentation was introduced. Does "naive session 1" refer to the first spoon session or to session 6 (when the food is presented on a plate)? The same ambiguity applies to "learning session 2," "training session 5," and so on. Furthermore, what criteria did the authors use to designate specific sessions as "learning" versus "training"? Are these definitions based on behavioral performance thresholds or some biological mechanisms? Clarifying these distinctions is essential for interpreting the behavioral results.

      We agree that success rate is a more conventional measure than the number of successful prehensions per minute. We have changed all behavior quantifications to success rate. Note that all behavioral conclusions drawn before are still valid under the new quantification (see Figures 1, 4, and 5). Importantly, the terms “learning,” “training,” and “retraining” were defined based on task structure and prior literature on motor learning stages rather than predetermined behavioral performance thresholds. These labels reflect progression through the task design (initial acquisition, continued practice under stable conditions, and adaptation to altered task demands), not biologically distinct or threshold-defined phases. We have revised the Methods section to make these definitions and transitions explicit to avoid ambiguity in interpreting the behavioral results.

      (1b) Judging from Figures 1F and 4B, even in WT mice, it is not convincing that the animals have actually learned the task. In all figures, the mice generally achieve 10-20 pellets per minute across sessions. The only sessions showing slightly higher performance are session 5 in Figure 1F ("train") and sessions 12 and 13 in Figure 4B ("CLZ"). In the classical pellet-reaching task, animals are typically trained for 10-12 sessions (approximately 60 trials per session, one session per day), and a clear performance improvement is observed over time. The authors should therefore present performance data for each individual session to determine whether there is any consistent improvement across days. As currently shown, performance appears largely unchanged across sessions, raising doubts about whether motor learning actually occurred.

      As described in the Methods section “Single pellet prehension task,” the elevated plate slot for pellet delivery in our setup box is at a challenging position, outside the slit and 2 cm to the right, forcing the mice to use the left paw. Mice therefore need to be trained at gradually harder positions, with the pellet delivered on a spoon rather than placed directly in the plate slot. Because task difficulty increases gradually, the success rate curve remains flat, while the total number of attempts and the number of successful prehensions per minute increase (Figure 1F-H). We therefore argue that motor learning did occur, with the mice maintaining a relatively constant success rate while performing a progressively harder task. Further, the success rate and number of successful prehensions of our mice are within the levels previously reported for trained mice (10.3791/51238). We added the precise plate slot position to the Methods section to clarify why a delivery method of gradually increasing difficulty is needed.
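      To make the distinction between the two behavioral measures concrete, the sketch below computes both from per-session counts. The `Session` record and its field names are hypothetical, and the counts in the example are invented for illustration (they are not data from this study). They show how a mouse that makes more attempts per minute at a constant success rate produces a rising successes-per-minute curve but a flat success-rate curve.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Counts for one prehension session (hypothetical structure)."""
    attempts: int        # all reach attempts, successful or not
    successes: int       # attempts in which the pellet was retrieved
    duration_min: float  # session length in minutes


def success_rate(s: Session) -> float:
    """Fraction of attempts that were successful (0.0 if there were no attempts)."""
    return s.successes / s.attempts if s.attempts else 0.0


def successes_per_minute(s: Session) -> float:
    """Successful prehensions per minute of session time."""
    return s.successes / s.duration_min


# Hypothetical counts: the attempt rate doubles while 40% of attempts succeed.
early = Session(attempts=20, successes=8, duration_min=10.0)
late = Session(attempts=40, successes=16, duration_min=10.0)
# success_rate is 0.4 for both; successes_per_minute rises from 0.8 to 1.6.
```

      Reporting the ratio rather than the per-minute count separates accuracy from how often the animal attempts the task.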

      (1c) The authors also appear to neglect existing literature on the role of SST-INs in motor learning and local circuit plasticity (e.g., PMID 26098758, 36099920). Although the current study focuses on a specific subpopulation of SST-INs, the results reported here are entirely opposite to those of previous studies. The authors should, at a minimum, acknowledge these discrepancies and discuss potential reasons for the differing outcomes in the Discussion section.

      We thank the reviewer for pointing this out. This was by no means an oversight, but a deliberate choice to discuss only previous literature that can be fairly compared with our findings. Mounting evidence from modern transcriptomic and connectomic studies makes it increasingly clear that the canonical “three-cardinal” interneuron populations (SST⁺, PV⁺, VIP⁺) are oversimplified groupings that mask considerable heterogeneity. For example, a comprehensive single-cell RNA-sequencing (scRNA-seq) study covering ~1.3 million cells from mouse cortex and hippocampus identified dozens of discrete GABAergic subtypes beyond the classical marker-defined classes, revealing continuous and graded variation in molecular identity across cortical and hippocampal regions (10.1016/j.cell.2021.04.021). Moreover, a recent study focusing on SST-expressing interneurons demonstrated that even within the SST class there are multiple subtypes with distinct laminar distributions, axonal projection patterns, and circuit connectivity, for instance two different Martinotti subtypes and a non-Martinotti SST subtype that target different pyramidal neuron types and dendritic compartments (10.1016/j.neuron.2023.05.032). Finally, developmental single-cell transcriptomics shows that interneuron diversity is already apparent at early postmitotic stages, indicating that these subtypes are pre-specified rather than mere activity-dependent states (10.1038/s41467-018-07458-1). These findings argue strongly that the traditional SST⁺/PV⁺/VIP⁺ classification, while useful as a coarse heuristic, fails to capture the rich molecular, morphological, and functional diversity that likely underlies distinct roles in circuit computation and behavior.

      Consequently, studies using any of these three markers must be interpreted cautiously, since several quite different neuronal populations are in reality studied at once, especially if no effort was made to tease out which of the populations within the “cardinal” class contribute to the observed effects. Most likely, the reported results of such studies are based on a mixed population and, in the worst case, on populations with opposite effects. Nevertheless, we have now included the role of SST-INs in motor learning and M1 circuitry in the Discussion section. We also respectfully disagree that our findings are the opposite of those of previous SST-IN studies. We show that increasing Ma2 excitability improved execution of an already learned movement, whereas 10.1038/nn.4049 showed that both activating (which is different from increasing excitability) and inhibiting SST-INs impaired learning of a stereotyped movement. Similarly, 10.1016/j.neuron.2022.08.018 showed that increasing SST-IN excitability impairs motor learning, not execution of a previously learned movement. Although we found that increasing the excitability of Ma2 cells did not affect motor learning, note that Ma2 cells are a subset of Martinotti cells with homogeneous electrophysiological and morphological properties (10.1371/journal.pbio.2001392), and Martinotti cells are themselves a subset of SST+ cells (10.1016/j.neuron.2023.05.032). The Discussion has been updated to include this reasoning.

      (2a) Calcium imaging - The methodology for quantifying fluorescence changes is confusing and insufficiently described. The use of absolute dF values ("detrended by baseline subtraction," lines 565-567) for analyses that compare activity across cells and animals (e.g., Figure 1H) is highly unconventional and problematic. Calcium imaging is typically reported as dF/F0 or z-scores to account for large variations in baseline fluorescence (F0) due to differences in GCaMP expression, cell size, and imaging quality. Absolute dF values are uninterpretable without reference to baseline intensity - for example, a dF of 5 corresponds to a 100% change in a dim cell (F0 = 5) but only a 1% change in a bright cell (F0 = 500). This issue could confound all subsequent population-level analyses (e.g., mean or median activity) and across-group comparisons. Moreover, while some figures indicate that normalization was performed, the Methods section lacks any detailed description of how this normalization was implemented. The critical parameters used to define the baseline are also omitted. The authors should reprocess the imaging data using a standardized dF/F0 or z-score approach, explicitly define the baseline calculation procedure, and revise all related figures and statistical analyses accordingly.

      The calcium imaging used here is 1-photon microendoscopic video data. To our knowledge, it is not possible to extract the true baseline of a cell over time from 1-photon data, since the background component includes signals from multiple sources and usually fluctuates more than the neural signal itself. We agree that absolute dF values cannot be compared across cells, and that is not what we report here. The CNMF-E algorithm outputs the temporal activity of each neuron with the background component already removed (10.7554/eLife.28728), and the baseline subtraction used in our study is therefore already standardized (10.7554/eLife.38173). Although it is common in the literature to record 1-photon data, perform similar preprocessing (some form of baseline subtraction and/or normalization by the noise standard deviation), and refer to the resulting trace as dF/F, this is not entirely correct, since true F0 extraction is not possible. In the original submission, we therefore referred to the preprocessed traces as what they actually are: detrended dF (the raw trace with the estimated background components removed). However, we agree that a better description of the process would be helpful, and that this nomenclature might confuse readers. We therefore expanded the Methods section to explain that we now refer to the background component as F0 (and to our resulting traces as dF/F), and how it was determined. We also updated the example traces in Figure 1E to show the raw traces, the estimated background components, and the detrended traces.
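      A minimal sketch of the normalization described above follows. It assumes that F0 at each frame is simply the background component that a CNMF-E-style decomposition estimates for the cell's footprint, so that dF/F is computed frame by frame as (F - F0) / F0. The function name and inputs are hypothetical, and this is not necessarily the exact procedure in the revised Methods. The example reuses the reviewer's numbers: the same dF of 5 gives 100% in a dim cell and 1% in a bright one.

```python
from typing import List, Sequence


def df_over_f(raw: Sequence[float], background: Sequence[float]) -> List[float]:
    """Frame-by-frame (F - F0) / F0, with F0 taken as the estimated background.

    Assumes `raw` and `background` are aligned per-frame traces for one cell's
    footprint and that the background estimate is strictly positive.
    """
    if len(raw) != len(background):
        raise ValueError("raw and background traces must have the same length")
    result = []
    for f, f0 in zip(raw, background):
        if f0 <= 0:
            raise ValueError("background estimate must be positive")
        result.append((f - f0) / f0)
    return result


# Reviewer's example: the same dF of 5 in a dim and in a bright cell.
dim = df_over_f([10.0], [5.0])         # (10 - 5) / 5 = 1.0, i.e. 100%
bright = df_over_f([505.0], [500.0])   # (505 - 500) / 500 = 0.01, i.e. 1%
```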

      (2b) Figure 1G - It is unclear why neural activity during successful trials is already lower one second before movement onset. Full traces with longer duration before and after movement onset should also be shown. Additionally, only data from "session 2 (learning)" and a single neuron are presented. The authors should present data across all sessions and multiple neurons to determine whether this observation is consistent and whether it depends on the stage of learning.

      We agree that it would be beneficial to show longer traces as an example of prehension-related activity, so we expanded Figure 1I to show a longer trace for a single neuron. We also added plots to Supplemental Figure 2 showing longer traces from all sessions, including all neurons, for both genotypes.

      (2c) Figure 1H - The authors report that chemogenetic activation of Chrna2 cells induces differential changes in PyrN activity between successful and failed trials. However, one would expect that activating all Chrna2 cells would strongly suppress PyrN activity rather than amplifying the activity differences between trials. The authors should clarify the mechanism by which Chrna2 cell activation could exaggerate the divergence in PyrN responses between successful and failed trials. Perhaps, performing calcium imaging of Chrna2 cells themselves during successful versus failed trials would provide insight into their endogenous activity patterns and help interpret how their activation influences PyrN activity during successful and failed trials.

      The reviewer is correct to assume that increasing the excitability of Ma2 cells would suppress PC activity. As shown in Supplemental Figure 2I, that is exactly what we observe when considering only non-prehension-related activity. It is therefore very interesting that the opposite effect is seen for prehension-related activity. This finding also aligns perfectly with our assembly analysis, which shows that assembly activity is decreased within the prehension window compared to outside it. Unfortunately, imaging Ma2 cells would only add information on their influence on PCs if both populations were imaged simultaneously, which requires equipment and reagents we do not currently have. Fortunately, however, the endogenous activity patterns of Ma2 cells and the direct connectivity between Ma2 and pyramidal cells were previously investigated in detail (10.1371/journal.pbio.2001392). We therefore expanded the Discussion to explain that the differential changes in PC activity when Ma2 excitability is increased could be due to increased PC synchronization: a single Ma2 cell connects to several PCs, and upon release from inhibition all connected PCs fire synchronously.

      (2d) Figure 1H - Also, in general, the Cre+ (red) data points appear consistently higher in activity than the Cre- (black) points. This is counterintuitive, as activating Chrna2 cells should enhance inhibition and thereby reduce PyrN activity. The authors should clarify how Cre+ animals exhibit higher overall PyrN activity under a manipulation expected to suppress it. This discrepancy raises concerns about the interpretation of the chemogenetic activation effects and the underlying circuit logic.

      As explained above, increasing Ma2 excitability indeed decreased non-prehension-related PC activity, and the proposed mechanism has been added to the Discussion section. We also made it clearer in the Results section that we are referring to prehension-related PC activity, and we emphasize that overall non-prehension-related PC activity is decreased.

      (3) The statistical comparisons throughout the manuscript are confusing. In many cases, the authors appear to perform multiple comparisons only among the N, L, T, and R conditions within the WT group. However, the central goal of this study should be to assess differences between the WT and hM3D groups. In fact, it is unclear why the authors only provide p-values for some comparisons but not for the majority of the groups.

      We agree that a clearer description of the statistical analysis is warranted. We expanded the statistical analysis section of the Methods to clarify, among other things, that all possible pairwise comparisons were performed and appropriately corrected for multiple comparisons, and that only significant p-values are reported in the figures. The absence of a p-value for a comparison therefore means that it is not significant.
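      The reporting rule described above (all pairwise comparisons tested and corrected, only significant ones shown) can be sketched as follows. The response does not name the correction method here, so Holm-Bonferroni is used purely as an illustrative stand-in; the group labels and p-values in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def holm_adjust(pvals: List[float]) -> List[float]:
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The p-value at 0-based rank k is multiplied by (m - k); the running
        # maximum keeps adjusted values monotone, and each is capped at 1.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def significant_pairs(raw: Dict[Tuple[str, str], float],
                      alpha: float = 0.05) -> Dict[Tuple[str, str], float]:
    """Adjust every pairwise p-value, then keep only those below alpha.

    Pairs missing from the result were tested but are not significant.
    """
    pairs = list(raw)
    adjusted = holm_adjust([raw[p] for p in pairs])
    return {p: a for p, a in zip(pairs, adjusted) if a < alpha}


# Hypothetical uncorrected p-values for three session comparisons.
raw = {("N", "L"): 0.01, ("N", "T"): 0.04, ("L", "T"): 0.03}
reported = significant_pairs(raw)  # only ("N", "L") survives correction
```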

      (4a) Figure 4 - It is hard to understand why the authors introduce LFP experiments here, and the results are difficult to interpret in isolation. The authors should consider combining LFP recordings with calcium imaging (as in Figure 1) or, alternatively, repeating calcium imaging throughout the entire re-training period. This would provide a clearer link between circuit activity and behavior and strengthen the conclusions regarding Chrna2 cell function during re-training.

      Unfortunately, it is not possible in our setup to record calcium imaging and LFP simultaneously, since the implants needed for the miniscope occupy the entire space above the animal’s cranium. Recording calcium imaging during the execution of learned movements is also impractical. If the animals were implanted before the training phase, the signal would likely be too degraded for recordings after the training sessions, since miniscope signal quality decreases over time and over successive miniscope attachments. If the animals were implanted between the training and retraining phases (as in the LFP group), the gap between training and retraining would be even longer, at least 28 days (as opposed to 16 days for the LFP group), which would affect performance in the task. LFP recordings therefore provide insight into the higher-level changes in neural activity when Ma2 cell excitability is increased during the execution of learned movements. We respectfully disagree that the results from the LFP group cannot be interpreted in isolation, since we found that mice with increased Ma2 cell excitability display increased low theta and gamma power during the prehension movement. As discussed in the manuscript, the increased high gamma band power when Ma2 cells are overexcitable, particularly for successful trials in the planning phase, suggests that Ma2 cells may have a role in influencing theta and gamma oscillations during motor performance (lines 1348-1355).

      (4b) It is unclear why CLZ has no apparent effect in session 11, yet induces a large performance increase in sessions 12 and 13. Even then, the performance in sessions 12 and 13 (30 successful pellets) is roughly comparable to Session 5 in Figure 1F. Given this, it is questionable whether the authors can conclude that Chrna2 cell activation truly facilitates previously acquired motor skills.

      We understand that a source of confusion in the behavioral data for the LFP group was the absence of data from sessions 1-7, together with the missing explanation of the task changing from spoon to plate (see our answers to comments 1a and 1b). Since the animals retrieve pellets from the spoon in session 5 (easier) and from the plate in later sessions (harder), the fact that they achieved the same performance on the plate as in the last spoon session indicates that they relearned the movement. To further clarify how training progressed, we added the full set of sessions (1-13) to Supplemental Figure 7, indicating the spoon-to-plate switch after session 5 and the 16-day gap between sessions 7 and 8 (due to viral injection and electrode implantation surgeries).

      (5) Figure 5 - The authors report decreased performance in the pasta-handling task (presumably representing a newly learned skill) but observe no difference in the pellet-reaching task (presumably an already acquired skill). This appears to contradict the authors’ main claim that Chrna2 cell activation facilitates previously acquired motor skills.

      We respectfully disagree that the pasta handling results conflict with the finding that increasing Ma2 excitability facilitates previously acquired movements. The pasta handling task specifically measures forepaw dexterity (as outlined in lines 442-444), and therefore assesses forelimb function unrelated to learning. Mice perform a set of stereotyped movements to manipulate the pasta, so no learning is required (note that animals were habituated to the arena and then tested in a single session, with no training sessions). We specifically mention in the Results section that "we used the pasta handling task to assess forepaw dexterity that does not require learning" (lines 1137-1139). Our findings support our reported conclusion that "Ma2 cells may have a role in orchestrating precise forelimb movements that do not require previous specific training" (lines 1154-1156).

      (6) Supplementary Figure 1 - The c-Fos staining appears unusually clean. Previous studies have shown that even in home-cage mice, there are substantial numbers of c-Fos+ cells in M1 under basal conditions (PMID 31901303, 31901303). Additionally, the authors should present Chrna2 cell labeling and c-Fos staining in separate channels. As currently shown, it is difficult to determine whether the c-Fos+ cells are truly Chrna2+ cells.

      Our c-Fos staining works well, as we have refined this protocol over several of our projects. Unfortunately, we could not check the references mentioned in the comment, since the PMID points to a study that does not mention c-Fos (possibly an incorrect PMID?). However, our images show c-Fos levels in controls similar to those of other studies (for example, 10.3389/fnana.2014.00013 Figure 1A and 10.1109/TBME.2024.3401136 Supplemental Figure 2C). We do find background c-Fos activity in both Cre+ and control mice, but the c-Fos staining appears clean because of the strong up-regulation and fluorescent signal in exogenously activated hM3Dq+ cells. We also noticed that the manuscript was missing a Methods section for the c-Fos experiments, so we added a section detailing the hM3Dq activation validation (lines 487-498). Further, the figure now displays separate channels for hM3Dq+ cells (magenta) and c-Fos (cyan) for better clarity.

      (7) Overall, the authors selectively report statistical comparisons only for findings that support their claims, while most other potentially informative comparisons are omitted. Complete and transparent reporting is necessary for proper interpretation of the data.

      As explained above (comment 3), we expanded the statistical description in the methods to explain that all possible pairwise comparisons were performed and appropriately corrected for multiple comparisons, and that omitted comparisons are non-significant.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure legends - The authors should provide more detailed information in the figure legends, such as N values. It is also not explained what the bold bars, as well as the highest and lowest bars, represent. Clear labeling is essential for proper interpretation of the data.

      We revised all figure legends to add n-numbers for all quantification plots, and expanded the Statistical analysis methods section to explain the labeling of all quantifications.

      (2) Presentation of plots - The authors need to improve the clarity and completeness of their figure presentations. For example:

      (a) In Figure 1F, it is unclear whether the results were obtained under chemogenetic activation, as this information is missing from both the figure and the legend. Currently, it could be a comparison of Cre+ mice with Cre- mice without any manipulations.

      (b) In Figure 1H, p-values are reported, but it is not specified which groups are being compared. As mentioned above, why are p-values only given to some comparisons? Does that mean the others are not significant?

      (c) In Figure 1D, a scale bar should be provided.

      (d) In Figure 1E, the y-axis (fluorescence) scale should be clearly indicated.

      We thank the reviewer for their attention to the figure details. We added the missing scale bars to Figures 1D-E. We also clarified in the Results section that all miniscope recordings were performed under clozapine treatment. As answered above (comments 3 and 7), we expanded the Methods section to state that although all comparisons were made and appropriately corrected for multiple comparisons, only significant comparisons are reported. As for the groups being compared, each significance bar clearly connects the two groups being compared. We also expanded the Statistical Analysis section to state that “Significance bars without ticks represent pairwise comparisons, while significance bars with downward ticks represent an effect.”

      Reviewer #2 (Public review):

      The main limitation of the study lies in its small sample sizes and the absence of key control experiments, which substantially weaken the strength of the conclusions. Core findings of this paper, such as the lack of effect of Ma2 cell activation on motor learning, as well as the altered neuronal activity, rely on a sample size of n=3 mice per condition, which is likely underpowered to detect differences in behavior and contributes to the somewhat disconnected results on calcium activity, activity timing, and neuronal assembly activity.

      We understand that the confusion stems from the difference between the number of mice used for calcium imaging and the number used to assess the effect of increased Ma2 excitability on motor learning. The core finding that increased Ma2 excitability did not alter motor learning is supported by the data previously shown in Supplemental Figure 5 (now Figure 1F-H), with n=6 Cre+ and n=7 control mice, which has enough statistical power to detect the effect of training session (F(3,33) = 9.254, power = 0.997) and should have enough power to detect an effect of group (estimated power of 0.835 for F(1,11)). The behavioral performance of the miniscope-recorded mice was shown in the previous version for transparency; however, no conclusions were drawn from those data. To improve clarity, we now present the data from the previous Supplemental Figure 5 as Figures 1F-H. This dataset clearly demonstrates that increased excitability of Ma2 cells did not affect motor learning. In addition, all quantifications and conclusions about neuronal activity are based on robust sample sizes: 1070 cells for controls and 403 for Chrna2-Cre+, or 70 assemblies for controls and 48 for Chrna2-Cre+. These sample sizes ensure sufficient statistical power, as demonstrated by the multiple significant effects and pairwise differences reported in our study. We reiterate that no underpowered tests were conducted in this study, and that no conclusions on behavioral outcomes were drawn from n = 3 control and n = 3 Chrna2-Cre+ mice.

      More comprehensive analyses and data presentation are also needed to substantiate the results. For example, examining calcium activity and behavioral performance on a trial-by-trial basis could clarify whether closely spaced reaching attempts influence baseline signals and skew interpretation.

      We agree, and we performed a trial-by-trial analysis to verify the effect of adjacent prehensions on the trial signal. We found that only 17.7% of adjacent trials were affected by a previous trial. In addition, we selected only trials not preceded by another trial for at least 6 s, and evaluated whether activity immediately before the trial (-3 to -1 s) differed from activity long before the trial (-5 to -3 s). The rationale is that if a preceding trial affected the baseline, activity immediately before the trial would differ from activity long before it. In this analysis, we found no genotype- or session-related differences in baseline amplitude between epochs. Together, these results confirm that prehension-related activity does not systematically alter non-prehension epochs. The results are shown in Supplemental Figure 3.
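      The isolation criterion and the two baseline epochs described above can be sketched as follows, assuming trial onsets in seconds and an activity trace sampled at a fixed frame rate. The function names, the frame-index rounding, and the skipping of trials too close to the start of the recording are assumptions for illustration, not the authors' code. The statistical comparison of near and far epochs across genotypes and sessions is not shown.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def isolated_trials(onsets: Sequence[float], min_gap_s: float = 6.0) -> List[float]:
    """Onsets (s) not preceded by any other onset within `min_gap_s` seconds."""
    ordered = sorted(onsets)
    return [t for i, t in enumerate(ordered)
            if i == 0 or t - ordered[i - 1] >= min_gap_s]


def epoch_mean(trace: Sequence[float], fps: float, onset_s: float,
               start_s: float, end_s: float) -> float:
    """Mean of `trace` over [onset + start_s, onset + end_s) seconds."""
    i0 = round((onset_s + start_s) * fps)
    i1 = round((onset_s + end_s) * fps)
    if i0 < 0 or i1 > len(trace) or i1 <= i0:
        raise IndexError("epoch falls outside the recording")
    return mean(trace[i0:i1])


def baseline_pairs(trace: Sequence[float], fps: float, onsets: Sequence[float],
                   min_gap_s: float = 6.0) -> List[Tuple[float, float]]:
    """(near, far) baseline means for each isolated trial.

    near = -3 to -1 s and far = -5 to -3 s relative to onset. Trials whose
    epochs fall outside the recording are skipped.
    """
    pairs = []
    for t in isolated_trials(onsets, min_gap_s):
        try:
            near = epoch_mean(trace, fps, t, -3.0, -1.0)
            far = epoch_mean(trace, fps, t, -5.0, -3.0)
        except IndexError:
            continue
        pairs.append((near, far))
    return pairs
```

      If preceding trials contaminated the baseline, the near and far means would differ systematically across isolated trials; the paired values can then be compared per genotype and session with whatever test the Methods specify.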

      The study uses cre-negative mice as controls for hM3Dq-mediated activation, which does not account for potential effects of Cre-dependent viral expression that occur only in Cre-positive mice. This important control would be necessary to substantiate the conclusion that it is increased Ma2 cell activity that drives the observed changes in behavior and cortical activity.

      A control group of Cre+ mice injected with a Cre-dependent control vector carrying, for example, only a fluorescent reporter would add one more layer of certainty that the effects observed here are due to CLZ-induced hM3Dq activation. We do not agree, however, that such a control is necessary to confirm our findings. Previous studies have extensively demonstrated that Cre-dependent expression alone has no effect, by comparing a DREADD activator with vehicle treatment (for example, 10.7554/eLife.38052, 10.1523/JNEUROSCI.0537-18.2018, 10.7554/eLife.67822). We also showed this for our LFP group (Figure 4), further confirming that Cre-dependent hM3Dq expression alone has no effect.

      An unspecific effect of clozapine, where the treatment affects animals without the hM3Dq receptor, would be much more likely. We controlled for this by giving the same treatment to Cre+ and Cre- mice. Moreover, since we used a low dose of clozapine, a lack of hM3Dq activation would be more likely, which we also controlled for with the c-Fos experiment, as explained in our answer to Minor point 1. Nevertheless, we added to the Discussion that, although we find it highly unlikely that the effects found here are due to Cre-dependent viral expression, we have not recorded Cre+ animals expressing control vectors instead of hM3Dq (lines 1360-1375).

      Reviewer #2 (Recommendations for the authors):

      Major points

      (1) One of the main findings in this paper is that Chrna2-Cre cell activation did not affect learning of the prehension task; however, the presented data do not convincingly support this claim. Looking at Fig.1F, Cre+ mice appear to have an overall lower number of successful prehensions compared to control mice. If this is not statistically significant, it is likely because n=3 mice for each group is underpowered. To better judge the behavior of these mice, it would be necessary to plot success rate and overall number of prehensions over the entire course of training, in addition to successes per minute. Given that n=3, plotting all individual data points would make more sense than showing a violin plot. Relatedly, in Supplemental Figure 5, there appears to be a clear effect on reduced success rates in Cre+ mice, which is stated in the figure legends, whereas the result section states: we found no effect of genotype on prehension success rates (lines 895-896). The authors should ensure that these behavior experiments are sufficiently powered to detect potential differences in learning between groups and present the complete data and statistical analysis.

      As explained in our response to Comment 1, the finding that increased Ma2 excitability did not alter motor learning is not based on the data in the previous Figure 1F (n=3 Cre+ and n=3 controls, shown for transparency). Instead, it is supported by the data in the previous Supplemental Figure 5, now Figures 1F-H, with n=6 Cre+ and n=7 controls, for which we found only an overall effect of training session, no effect of genotype, and no significant post-hoc pairwise comparisons. We agree that plotting the success rate, total number of prehensions, and successful prehensions per minute for all 6 sessions allows a better evaluation of the mice's behavior. We moved Supplemental Figure 5 into Figure 1, plotting the three measures for the full set of sessions with individual data points within the violin plots, and expanded the description of the statistical results in the main text. We reiterate that no underpowered tests were conducted in this study, and no conclusions were drawn from n = 3 control and n = 3 Chrna2-Cre+ mice.

      (2) The authors mention that a significant fraction of prehension trials overlapped with a preceding prehension attempt. Were those attempts excluded from the analysis? The stark differences in calcium signals at baseline before prehension onset in some sessions (Figure 1G, Supplementary Figure 2D) suggest that trials preceding closely in time might play a role and could skew the analysis and interpretation.

      Overlapping trials were not excluded from the previous analysis. As summarized in our response to Comment 2 and expanded in the Results section (lines 876-894), we found that only 17.7% of adjacent trials were affected by a previous trial, and that when selecting only trials not preceded by another trial for at least 6 s, prehension-related activity had no effect on the baseline preceding the trials.

      (3) Relatedly, to test the differences in calcium activity before and after prehension onset, it would be clearer to use a delta F/F measure where the 1 second before onset is used as baseline.

      Since a large proportion of neurons are more active before movement onset (during the movement planning phase, Figure 2C), the activity 1 s before movement onset cannot be considered F0. Dividing the activity during the movement by the activity during the planning phase would instead generate a different measure, a form of execution/planning ratio. We performed this analysis as an additional measure and found a three-way interaction of genotype, session, and prehension accuracy, driven by genotype effects in early sessions, indicating that Ma2 activity might be involved in the balance between planning and execution activity. These results are now described in the Results section and shown in Supplemental Figure 4.
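      A minimal sketch of such an execution/planning ratio for a single trial follows. The window bounds (planning from -1 to 0 s and execution from 0 to 1 s relative to onset) are assumptions chosen for illustration, as are the function names. The ratio is only defined when planning-phase activity is positive, which is not guaranteed for background-subtracted traces, so the sketch rejects that case explicitly.

```python
from statistics import mean
from typing import Sequence, Tuple


def window_mean(trace: Sequence[float], fps: float, onset_s: float,
                start_s: float, end_s: float) -> float:
    """Mean activity over [onset + start_s, onset + end_s) seconds."""
    i0 = round((onset_s + start_s) * fps)
    i1 = round((onset_s + end_s) * fps)
    if i0 < 0 or i1 > len(trace) or i1 <= i0:
        raise IndexError("window falls outside the recording")
    return mean(trace[i0:i1])


def execution_planning_ratio(trace: Sequence[float], fps: float, onset_s: float,
                             planning: Tuple[float, float] = (-1.0, 0.0),
                             execution: Tuple[float, float] = (0.0, 1.0)) -> float:
    """Mean execution-window activity divided by mean planning-window activity."""
    plan = window_mean(trace, fps, onset_s, *planning)
    if plan <= 0:
        raise ValueError("planning-phase activity must be positive to form a ratio")
    return window_mean(trace, fps, onset_s, *execution) / plan
```

      Unlike a dF/F that treats the pre-onset second as F0, this ratio is explicitly a comparison between two behaviorally defined phases, which is why it is reported as a separate measure.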

      (4) For the experiments in which mice were trained prior to Ma2 cell activation (Fig.4), the behavior in sessions 8-10 does not seem to have reached a plateau yet, and the increase in successful prehensions in sessions 11-13 of Cre+ mice could just be a continuation of training. It would be more convincing to show the original training curve of those mice in sessions 1-7. Additionally, the authors should perform a two-way ANOVA test for the interaction of drug and genotype, rather than two separate one-way ANOVAs.

      We agree, and we now show the curve for sessions 1-7 in Supplemental Figure 7, showing that the success ratio for sessions 8-10 is similar to session 7. Also, a 2-way ANOVA was already performed, although the full report was missing from the manuscript. We switched from successful prehensions per minute to success ratio (see Reviewer #1 comment 1a) and now include the full report, in which we found an overall effect of session, and when grouping by genotype, we found an effect for Cre+ but not control mice (lines 1065-1072).

      Minor points

      (1) The validation experiment for the efficacy of hM3Dq is somewhat confusing. It is surprising that the few hM3Dq-mCherry expressing cells in the cre-negative mice did not show increased c-Fos staining since non-specific leaky hM3Dq expression would presumably still lead to a functional DREADD. The better control for validating the efficacy of hM3Dq-mediated Chrna2-Cre cell activation would be to show c-Fos staining in Cre+ mice with or without clozapine injection. This would control for non-specific c-Fos expression and neuronal activation purely by expression of the DREADD. In cre-negative control mice, the comparison should also be between mice with and without clozapine injection to control for non-specific neuronal activation regardless of hM3Dq expression.

      We thank the reviewer for raising this point and agree that validation of hM3Dq efficacy and specificity requires careful interpretation. In principle, any hM3Dq-expressing cell, including the few hM3Dq-mCherry+ cells observed in Cre– mice, could respond to clozapine. In practice, however, effective DREADD activation depends on sufficient receptor expression levels and on the pharmacodynamics of clozapine in the brain (Gomez et al., 2017, Science, 10.1126/science.aan2475). In our dataset, even in Chrna2-Cre+ mice, only ~76% of hM3Dq+ cells showed c-Fos induction after clozapine, indicating that receptor expression and/or ligand access is not uniform across cells. Consistent with this, the very sparse and weak hM3Dq expression observed in Cre– mice resulted in only 0.8% of hM3Dq+ cells showing c-Fos induction, in line with previous reports demonstrating that low-level “leaky” expression is insufficient to drive neuronal activation (e.g. 10.1038/s41467-019-12236-z; 10.1523/JNEUROSCI.0537-18.2018; 10.1523/ENEURO.0363-21.2021).

      The reviewer also suggests that an ideal validation would compare Cre+ mice with and without clozapine to control for any c-Fos induction driven purely by DREADD expression. We agree that such a comparison is informative, and note that in our experiments the c-Fos assay was designed specifically to test whether the low clozapine dose used (0.01 mg/kg) is sufficient to activate hM3Dq in Ma2 cells, rather than to assay baseline effects of viral expression.

      Importantly, non-specific effects of clozapine itself were controlled for throughout the study by administering the same clozapine dose to both Chrna2-Cre+ and Cre– mice in all behavioral and physiological experiments. Thus, any clozapine-driven neuronal activation independent of hM3Dq would be expected to appear in both groups.

      Together, these results indicate that (i) the clozapine dose used is sufficient to robustly activate hM3Dq-expressing Ma2 cells, (ii) sparse leaky expression in Cre– mice is not sufficient to drive measurable activation, and (iii) the effects reported in the manuscript are unlikely to be explained by non-specific clozapine actions or by viral expression alone.

      (2) The authors state in the methods section that "only neurons that displayed a significant change comparing the before onset and after onset phases" were included in the analysis. This appears to bias the data towards neurons that change their activity with the prehension movement. If this is the intention, the authors should clearly state this and their rationale in the results section and show what proportion of recorded neurons fall into this category.

      Thank you for pointing this out; the explanation for this exclusion criterion was missing. We expanded the Methods section “Neural activity around prehensions” to explain that, because we are evaluating the role of Ma2 cells in the prehension-related activity of pyramidal cells, we excluded neurons with no prehension-related activity. The expanded text also states that 15.97% of recorded neurons were excluded on this basis.
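      The response does not state which test defined a "significant change" between the before-onset and after-onset phases. As a purely hypothetical illustration, the criterion could be applied per neuron with an exact two-sided sign test on paired per-trial values. The test itself, the `alpha` threshold, and the input format are all assumptions.

```python
from math import comb


def sign_test_p(before, after):
    """Exact two-sided sign test on paired per-trial values (ties are dropped)."""
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(d > 0 for d in diffs)
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(p, 1.0)


def prehension_related(neurons, alpha=0.05):
    """Ids of neurons whose before/after activity differs (hypothetical criterion).

    `neurons` maps a neuron id to a (before, after) pair of per-trial mean activities.
    """
    return [nid for nid, (before, after) in neurons.items()
            if sign_test_p(before, after) < alpha]
```

      The excluded fraction then follows as `1 - len(included) / len(neurons)`.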

      (3) I don’t understand the peak PC activity latency shown in Figure 2D. How is it possible that there are negative peak latencies during the prehension phase, which is defined as >0sec, (upper right panel), and positive peak latencies in the before prehension phase, which is defined as <0sec, (lower right panel)?

      As stated in lines 939-941 and in the Figure 2C legend, neurons were sorted into "before prehension" or "during prehension" neurons according to their activity during successful prehensions. One of our main findings is that the temporal patterns of pyramidal cells were strongly affected by prehension accuracy (lines 941-944). This means that a significant number of neurons shifted prehension phases when the prehension failed: as illustrated in Figure 2C, the temporal pattern is not preserved from successful to failed prehensions. For this reason, failed prehensions show negative latencies for neurons classified as "during prehension" and positive latencies for neurons classified as "before prehension", since both labels were assigned on successful trials. We expanded the explanation of the sorting in the Results section (lines 944-950) to better highlight the latency change between prehension accuracies.

      (4) Please specify how baseline subtraction (detrending) was performed for the calcium image analysis.

      We expanded the Methods section “Neural signal extraction” to explain how F0, which we now refer to as the background component, was determined. We also now refer to the resulting traces as dF/F (lines 614-619).
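      For reference, the conventional normalisation implied by the dF/F label is shown below. It assumes that F0 is the background component estimated as described in the Methods. How F0 itself was obtained is not reproduced here.

```latex
\frac{\Delta F}{F}(t) = \frac{F(t) - F_0(t)}{F_0(t)}
```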

      (5) The authors state that they found a "dissociation between changes in neural activity and performance outcomes". Since they only analyzed motor performance by quantifying successful prehensions, this statement should be caveated with the notion that other aspects of the behavior (e.g., trajectories/speed) could be affected but were not measured.

      We agree, and we expanded the Discussion to acknowledge that our behavioral analysis focused on success ratio and that other measures not investigated here could also be affected (lines ????-????).

      (6) Are the differences in theta and gamma power specific to the prehension trials, or does Ma2 cell activation generally increase LFP activity in those bands?

      We thank the reviewer for this question, as we had not analyzed general LFP activity in the previous version. We have now performed the same analysis using only LFP epochs outside the prehension windows across full sessions. We found that Mα2 cell activation actually reduces LFP power across all bands specifically in Session 13, when no prehension is being performed. These findings are now included as Supplemental Figure 7.

      (7) Please define terms that might not be familiar to a typical reader in the field, such as "assemblies", when first introducing them in the text.

      We revised the introduction where we now define assemblies (lines 85-88).

      (8) Please specify the n-numbers for each figure throughout the manuscript. For example, in some figures, the number of trials or the number of neurons is used; however, it is not clear what this number is.

      We agree that although the n-numbers are stated in the text, it would be clearer to add them also to the figure legends. All figure legends now contain n-numbers for panels showing quantifications.

      (9) Relatedly, while the inclusion of supplemental tables with expanded statistical results is commendable, several statistical test details are missing, such as for Figure 5.

      We have fully revised the text to add any missing statistical details for the statements in the Supplemental Tables.

    1. eLife Assessment

      This important study provides insights into the role of the cerebellum in fear conditioning, addressing a key gap in the literature. The evidence presented in support of the conclusions is solid. This work will be of interest to both the extinction learning and cerebellar research communities.

    2. Reviewer #1 (Public review):

      Nio and colleagues address an important question about how the cerebellum and ventral tegmental area (VTA) contribute to extinction learning of conditioned fear associations. This work tackles a critical gap in the existing literature and provides new insights into this question in humans through the use of high-field neuroimaging with robust methodology. The presented results are novel and will broadly interest both the extinction learning and cerebellar research communities. As such, this is a very timely and important contribution.

      Strengths:

      The core finding, the coupling of the cerebellum and VTA in processing reward-like prediction errors during fear extinction, is novel and addresses a genuine gap in the literature. The multi-session paradigm, the well-powered sample, 7T imaging, and the complementary analytical approaches used to address the question are also commendable.

      Weaknesses:

      The authors have satisfactorily addressed the concerns raised in the previous version of the manuscript. Several results, as well as conclusions drawn from them, still rest on trend-level evidence, although the revised presentation of the results now provides a more balanced interpretation of these findings.

    3. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Nio and colleagues address an important question about how the cerebellum and ventral tegmental area (VTA) contribute to the extinction learning of conditioned fear associations. This work tackles a critical gap in the existing literature and provides new insights into this question in humans through the use of high-field neuroimaging with robust methodology. The presented results are novel and will broadly interest both the extinction learning and cerebellar research communities. As such, this is a very timely and impactful manuscript. However, there are several points that could be addressed during the review process to strengthen the claims and enhance their value for readers and the broader scientific community.

      (1) Reward Interpretation and Skin Conductance Responses (SCR)

      A central premise of the manuscript is that 'unexpected omissions of expected aversive events' are rewarding, which plays a critical role in extinction learning. The authors also suggest that the cerebellum is involved in reward processing. However, it is unclear how this conclusion can be directly drawn from their task, which does not explicitly model 'reward.' Instead, the interpretation relies on SCR, which seems more indicative of association or prediction than of reward per se. Is SCR a valid metric of reward experienced during the extinction of feared associations? Or could these findings reflect processes tied more closely to predictive learning? Please discuss.

      We thank the reviewer for raising this important point. We agree that skin conductance responses (SCRs) do not directly index reward. More generally, SCRs reflect autonomic arousal in response to salient or motivationally significant stimuli and are closely linked to expectancy and contingency awareness. In our study, SCRs served as a read-out of the participants’ expectation of a US, and were used to fit the hyperparameters of a reinforcement-learning-based deep learning model, which then provided per-trial estimates of prediction and prediction error values. These estimates capture predictive learning about the occurrence of the aversive US, rather than reward per se. The interpretation of unexpected US omissions as “reward-like” prediction errors relies on prior literature, particularly rodent studies showing that dopaminergic neurons in the VTA respond to omitted aversive stimuli and drive extinction learning via projections to the nucleus accumbens (Kalisch et al., 2019; Salinas-Hernández et al., 2018, 2023). We therefore interpret our cerebellar activations during unexpected omissions as being compatible with the processing of reward-like prediction errors, while acknowledging that this inference is indirect.

      To clarify this reasoning, we made revisions to the Introduction and Discussion to (i) state explicitly that SCRs do not directly measure reward but were incorporated into the reinforcement learning model as an index of autonomic arousal related to US expectancy and predictive learning, and (ii) consistently replace the term “reward prediction error” with “reward-like prediction error” throughout.

      (2) Reinforcement Agent and SCR Modeling

      The modeling approach with the deep reinforcement agent treats SCR as a personalized expectation of shock for a given trial. However, this interpretation seems misaligned with participants' actual experience - they are aware of the shock but exhibit evolving responses to it over time. Why is this operationalization useful or valid? It would benefit the manuscript to provide a clearer justification for this approach.

      This point is well taken. We did not collect trial-by-trial expectancy ratings, as frequent button-box responses would have induced cerebellar activations unrelated to fear (extinction) learning. Subjective expectancy was assessed only at the end of each experimental phase. As frequently done in the human fear conditioning literature, we used trial-by-trial SCR data (Lonsdorf et al., 2017). Although SCRs show correspondence with US expectancy ratings, they are inherently noisy and show substantial variability across trials and participants (Constantinou et al., 2021). Therefore, individual trial-by-trial responses cannot be used to directly infer US predictions. Accordingly, we used group-averaged SCR data to fit model hyperparameters in a grid search across parameter settings. The best-fitting hyperparameters were then applied to 100 randomly initialized agents, and their outputs were averaged to generate trial-wise estimates of predictions and prediction errors. These averaged values were used as parametric modulators in the fMRI analyses. We have revised the Introduction and Methods to make this procedure clearer.
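      The fitting procedure described above can be sketched as follows. This is a minimal illustration under explicit assumptions, and all function names are hypothetical:

      - A simple Rescorla-Wagner learner stands in for the deep reinforcement-learning agent used in the study.
      - The learning rate `alpha` is the only hyperparameter searched.
      - Group-averaged SCRs are assumed to be rescaled to the 0-1 range of US expectancy.
      - "Random initialisation" is modelled as a small random initial expectation.

```python
import random
from statistics import fmean


def rw_run(outcomes, alpha, v0):
    """Rescorla-Wagner stand-in: per-trial predictions and prediction errors."""
    v, preds, pes = v0, [], []
    for r in outcomes:          # r = 1 if the US occurred, 0 if omitted
        preds.append(v)
        pe = r - v              # prediction error: outcome minus expectation
        pes.append(pe)
        v += alpha * pe
    return preds, pes


def fit_alpha(outcomes, group_scr, grid):
    """Grid search: choose the alpha whose predictions best match group-averaged SCR (MSE)."""
    def mse(alpha):
        preds, _ = rw_run(outcomes, alpha, 0.0)
        return fmean((p - s) ** 2 for p, s in zip(preds, group_scr))
    return min(grid, key=mse)


def averaged_agents(outcomes, alpha, n_agents=100, seed=0):
    """Average trial-wise predictions and PEs over randomly initialised agents."""
    rng = random.Random(seed)
    runs = [rw_run(outcomes, alpha, rng.uniform(0.0, 0.1)) for _ in range(n_agents)]
    preds = [fmean(run[0][t] for run in runs) for t in range(len(outcomes))]
    pes = [fmean(run[1][t] for run in runs) for t in range(len(outcomes))]
    return preds, pes
```

      In this sketch, the averaged predictions and prediction errors play the role of the trial-wise parametric modulators. Averaging over agents mainly smooths out the influence of initial conditions.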

      (3) Clarity and Visualization of Results

      The results section is challenging to follow, and the visualization and quantification of findings could be significantly improved. Terms like 'trending' appear frequently - what does this mean, and is it worth reporting? Adding clear statistical quantifications alongside additional visualizations (e.g., bar or violin plots of group means within specific subregions within the cerebellum, or grouped mean activity in VTA and DCN) would enhance clarity and allow readers to better assess the distribution and systematicity of effects. Furthermore, the figures are overly complex and difficult to read due to the heavy use of abbreviations. Consider splitting figures by either phase of the experiment or regions, and move some details to the supplemental material for improved readability.

      We agree with the reviewer that the clarity of results can be improved and have revised the manuscript accordingly. Specifically:

      (1) We use “trend-level” to refer to uncorrected voxelwise t-maps at p < 0.05, and “significant” to refer to TFCE/FWE-corrected effects at p < 0.05. This distinction was not sufficiently clear in the original figures. To address this, uncorrected t-maps are now displayed with a grey striped background frame, and colorbar labels have been enlarged to emphasize whether TFCE/FWE-corrected or uncorrected t-values are shown.

      (2) We added a supplementary table (Table S7) reporting group-level summary statistics for all fMRI contrasts presented in the manuscript, including group means, standard deviations, effect sizes (Cohen’s d), and 95% confidence intervals for cerebellar cortex, cerebellar nuclei, and VTA VOIs. We hope that this helps with the interpretation of effect magnitude and variability across fMRI analyses.
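      As an illustration of how such per-VOI summaries are typically computed, the sketch below takes one contrast estimate per participant and returns four quantities: the mean, the standard deviation, a one-sample Cohen's d (mean divided by SD), and a t-based 95% confidence interval. The exact effect-size definition used for Table S7 is not stated here, so this formulation is an assumption. The critical t value is passed in rather than computed, which keeps the sketch dependency-free.

```python
from math import sqrt
from statistics import fmean, stdev


def summarize_contrast(estimates, t_crit):
    """Mean, SD, one-sample Cohen's d and t-based 95% CI for per-subject contrast estimates.

    `t_crit` is the 0.975 quantile of the t distribution with len(estimates) - 1
    degrees of freedom (e.g. from a t table or scipy.stats.t.ppf(0.975, df)).
    """
    n = len(estimates)
    m = fmean(estimates)
    sd = stdev(estimates)               # sample SD (n - 1 denominator)
    half_width = t_crit * sd / sqrt(n)  # t * standard error of the mean
    return {"mean": m, "sd": sd, "d": m / sd,
            "ci95": (m - half_width, m + half_width)}
```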

      (3) To improve readability, we split overly complex figures: Figure 2 now separates CS-related prediction from US-related presentation contrasts (which are now revised Figures 4 and 5), and Figure 3 separates event-based and parametric modulation contrasts (which are now revised Figures 6 and 7).

      (4) We also reduced the number of abbreviations in the figures. Full definitions, together with the original abbreviations, are now provided in the main text and figure captions.

      We considered the suggestion to split figures further by region or by phase. However, we believe it is more informative to present the cerebellar cortex, nuclei, and VTA together for each contrast, and to keep all phases side by side, as this allows readers to directly assess commonalities across phases. We therefore chose to keep the same overall structure, but simplified the figures in other ways (e.g. splitting by contrast type) to improve overall readability. We hope that these changes address the reviewer’s concerns by simplifying the presentation, removing abbreviations, and providing clearer quantification of results.

      (4) Theoretical Context for Paradigm Phases

      The manuscript benefits from the comprehensive experimental paradigm, which includes multiple phases (acquisition, extinction, recall, reacquisition, re-extinction). This design has great potential for providing a more holistic view of conditioned fear learning and extinction. However, the manuscript lacks clarity on what insights can be drawn from these distinct phases. What theoretical framework underpins the different stages, and how should the results be interpreted in this context? At present, the findings seem like a display of similar patterns across phases without sufficient interpretation. Providing a stronger theoretical rationale and reorganizing the results by experimental phase could significantly improve readability and impact.

      We thank the reviewer for this constructive suggestion. We would first like to mention that the primary aim of this manuscript is not to analyze differences between phases, but rather to highlight the commonalities. Across different learning contexts, we consistently observed reward-like prediction error-related activations in the cerebellum and VTA. This consistency and connectivity between the cerebellum and VTA, despite phase-to-phase differences, is the most important finding of our study.

      We agree, however, that the manuscript did not sufficiently explain how each phase differs conceptually, which is important for readers to understand why the consistency of responses is notable. We therefore expanded the Introduction and Discussion to provide clearer theoretical context for each phase. More specifically, the phases can be understood as follows:

      Extinction (day 2): Because acquisition was conducted with a 100% reinforcement rate, unexpected US omissions during initial extinction trials maximize reward-like prediction errors and yield stronger, more uniform expectations across participants compared to a partial reinforcement rate. This phase should therefore provide the clearest opportunity to observe cerebellar-VTA contributions to the processing of reward-like prediction errors.

      Recall (day 3): Despite allowing for the consolidation of extinction learning, the recall test often still elicits conditioned fear responses to the CS+, that is, shows spontaneous recovery of the initial fear association (Bouton, 2002). In these trials, the non-occurrence of the US is unexpected. In this context, US omission-related activations reflect reward-like prediction errors during renewed fear responding in the presence of both a fear memory and an extinction memory. This contrasts with extinction training on day 2, where prediction errors arose primarily against the background of the recently acquired fear memory, without a competing extinction memory.

      Reacquisition (day 3): Unlike acquisition, reacquisition used a partial reinforcement rate, such that non-reinforced CS+ trials were interspersed between reinforced CS+ trials (similar to the partially reinforced phase used by Ernst et al., 2019). Because reacquisition occurs in the presence of savings, that is, the presence of a previously acquired fear memory, US expectancy increases rapidly following reinforced trials and relearning occurs faster (Bouton, 2004). Importantly, partial reinforcement maintains high US expectancy and therefore allows prediction errors to remain sustained across omission trials (Figure 9).

      Reextinction (day 3): Reextinction is an additional extinction phase but without a consolidation interval, and with an already established fear extinction memory. Because reextinction followed the partially reinforced reacquisition phase, prediction errors during early reextinction decayed more slowly than during extinction on day 2 (following the fully reinforced acquisition phase on day 1) (Figure 9). Together, reacquisition and reextinction were designed to maximize the number and persistence of unexpected US omissions, thereby providing additional opportunities to examine reward-like prediction-error signaling.
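      The effect of the reinforcement schedule on prediction errors in these phases can be illustrated with a toy Rescorla-Wagner learner. It is used here as a stand-in for the study's model, and the learning rate of 0.3 and the trial counts are arbitrary assumptions. With the sign convention "outcome minus expectation", omissions produce negative values, and the reward-like reading corresponds to their magnitude. After full reinforcement, the first omission yields a near-maximal prediction error that then decays geometrically. Under alternating partial reinforcement, omission prediction errors settle at a sustained, intermediate level.

```python
def rw_prediction_errors(outcomes, alpha=0.3, v0=0.0):
    """Rescorla-Wagner prediction errors (outcome - expectation) and final expectation."""
    v, pes = v0, []
    for r in outcomes:
        pe = r - v
        pes.append(pe)
        v += alpha * pe
    return pes, v


# Full (100%) reinforcement during acquisition, then extinction with no US at all.
_, v_acq = rw_prediction_errors([1] * 16)
ext_pes, _ = rw_prediction_errors([0] * 8, v0=v_acq)

# Partial reinforcement: reinforced and omitted CS+ trials alternate.
partial_outcomes = [1, 0] * 8
partial_pes, _ = rw_prediction_errors(partial_outcomes)
omission_pes = [pe for pe, r in zip(partial_pes, partial_outcomes) if r == 0]
```

      This simple learner keeps no separate fear and extinction memories, so it cannot reproduce savings or the slower decay during reextinction described above. It only shows the effect of the reinforcement schedule itself.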

      By clarifying this framework, we aim to show that while the learning context and history differ across phases, the consistent cerebellum-VTA activation and connectivity related to unexpected US omissions underlines the robustness of the effect. We chose not to reorganize the Results by phase, as our central conclusion rests on similarities rather than differences. Instead, we have clarified the theoretical background in the revised manuscript to help readers interpret both the commonalities and the potential sources of variability.

      (5) Cerebellum-VTA Connectivity Analysis

      The authors argue that the cerebellum modulates VTA activity, yet they perform the PPI analysis in the reverse direction. Why does this make sense? In their DCM analysis, they found a bidirectional relationship (both cerebellum - VTA and VTA-cerebellum), yet the discussion focused on connectivity from the cerebellum to VTA. A more careful interpretation of the connectivity findings would be useful - especially the strong claims in the discussion on the cerebellum providing the reward signal to the VTA should be tempered.

      We thank the reviewer for highlighting this issue. In our primary analysis, we used the VTA as the PPI seed and observed trend-level connectivity with the cerebellum. When we reversed the analysis and used the cerebellar volume of interest (VOI) from the conjunction analysis as the seed, effects in the VTA were substantially weaker. We believe this reflects the broad connectivity profile of the cerebellar VOI (i.e., not specific to the VTA) as well as general limitations of PPI in our study, including the small number of unexpected omission trials and the lack of specificity to reward-like prediction errors (e.g., connectivity also appeared during US presentation). For transparency, we now report the cerebellar-seed PPI results in the Supplementary information (Figure S3). Given their limited robustness, we chose not to include the corresponding VTA maps in the main figures.
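      One general reason why PPI results need not match when the analysis is reversed is that the PPI model is asymmetric. The seed's timecourse enters the design twice: once as the physiological regressor, and once multiplied by the task variable to form the interaction term. The other region is the dependent variable. Swapping seed and target therefore changes the model rather than simply transposing it. A simplified sketch follows. SPM forms the interaction at the neural level after deconvolving the seed signal, whereas this sketch skips that step and multiplies mean-centred BOLD and task vectors directly. The function name is hypothetical.

```python
from statistics import fmean


def ppi_design(seed_ts, psych):
    """Psychological, physiological and interaction columns for a simplified PPI.

    Simplification: the interaction is formed on mean-centred BOLD, not on a
    deconvolved neural signal as in SPM's PPI implementation.
    """
    ms, mp = fmean(seed_ts), fmean(psych)
    phys = [s - ms for s in seed_ts]            # seed (physiological) regressor
    task = [p - mp for p in psych]              # task (psychological) regressor
    ppi = [a * b for a, b in zip(phys, task)]   # interaction term
    return task, phys, ppi
```

      Fitting the target region's timecourse on these three columns with a VTA seed is a different model from fitting the VTA timecourse on columns built from a cerebellar seed.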

      Finally, we agree that our conclusions regarding cerebellum-VTA interactions should be framed more cautiously. While the DCM analyses support bidirectional connectivity, our original discussion placed disproportionate emphasis on cerebellum-to-VTA influences. We have revised the text to provide a more balanced interpretation that also considers VTA-to-cerebellum connectivity.

      Reviewer #2 (Public review):

      Summary

      Building upon the group's previous work, this study used a 3-day threat acquisition, extinction, recall, reextinction, and reacquisition paradigm with 7T imaging to probe the mechanism by which the cerebellum contributes to fear extinction learning. The authors hypothesize that this may occur via its connection to the VTA, a known modulator of fear extinction due to its role in reward processing. Using complementary analysis methods, the authors demonstrate that activity within the cerebellum, DCN, and VTA is modulated by predictions about the occurrence of the US, with regional specificity. They show trend-level evidence for increased functional connectivity between the cerebellum and VTA during unexpected omissions in all phases of the paradigm. They also present a DCM indicating that the cerebellum could positively modulate VTA activity during extinction learning. This study adds to a growing literature supporting the role of the historically overlooked cerebellum in the control of emotions, and it suggests that an interaction between the cerebellum and VTA should be considered in the existing model of the fear extinction network.

      Strengths

      The authors address their research question using a number of complementary methods, including parametric modulation by model-derived expectation parameters, PPI, and DCM, in a logical and easily understood way. I feel the authors provide a balanced interpretation of their findings, presenting numerous interpretations and offering insight with regard to reward vs attention or unsigned prediction errors and the directionality of the interaction they identify. The manuscript is a timely addition to growing literature highlighting the role of the cerebellum in fear conditioning, and emotion generation and regulation more generally.

      Weaknesses

      Subjective and skin conductance responses do not completely support the success of the learning paradigm. For example, CS+/CS- differentiation in both domains persisted after extinction training. I do not feel that this negates the findings of this manuscript, though it raises questions about the parametric modulators used, and the interpretation of the neural mechanisms proposed if they do not strongly relate to updated subjective appraisals (the goal of extinction therapy). My interpretation of the manuscript suggests there are some key results based upon contrasts that have as few as three events; I am a little unsure about the power and reliability of these effects, though I await author clarification on this matter. There are a number of unaddressed deviations from the pre-registered protocol that I have asked the authors to elaborate upon.

      We thank the reviewer for the thoughtful and constructive evaluation of our work. We appreciate that the manuscript and methods were found to be clearly presented, and we welcome the suggestions for clarification and improvement. Below we address the specific concerns regarding extinction learning in behavioral measures, the reliability of event-based contrasts with few trials, and deviations from the preregistration.

      Extinction in self-reports and skin conductance responses (SCRs)

      The reviewer is correct that CS+/CS- differentiation persisted after extinction. Although there was no differentiation in SCRs at the end of extinction, post-extinction self-reports still showed differentiation, albeit to a lesser degree. This is in line with previous literature on the dissociation of outcome measures during fear conditioning (Lipp et al., 2003). This residual subjective differentiation is also consistent with extinction forming an inhibitory memory trace that suppresses, rather than erases, the original fear association (Bouton, 2002; Milad & Quirk, 2012). Moreover, a single extinction session is often insufficient to eliminate differential responding (Craske et al., 2014; Vervliet et al., 2013). Nevertheless, both measures showed significant effects of extinction learning.

      We included additional analyses of self-reports across phases. Importantly, CS+ ratings were significantly reduced during extinction and recall compared to acquisition (all p ≤ 0.001), whereas CS- ratings remained unchanged (all p > 0.532). This pattern demonstrates that the magnitude of the CS+/CS- difference was significantly reduced relative to acquisition, indicating that extinction learning did occur (Doubliez et al., 2025).

      For physiological responses, extinction learning was shown in PSRs but not conclusively in SCRs. PSRs showed a significant reduction of CS+ responses across extinction, while CS- responses remained unchanged. SCRs showed a reduction of CS+/CS- differentiation across extinction; however, this effect remained at trend level, as the Stimulus x Time interaction did not reach significance (p = 0.053). This pattern is consistent with early differentiation followed by rapid attenuation under the full reinforcement structure of the paradigm (100% reinforcement during acquisition and 0% during extinction). Under such conditions, participants rapidly learn that the US is no longer delivered during extinction, such that physiological responses are largely confined to the first few trials, leaving limited power to detect extinction effects in noisier measures such as SCRs. To address the lower robustness of SCR effects, as recommended by the reviewer, we therefore included PSRs in the main Results section, which provide converging physiological evidence for extinction learning.

      Of note, on day 3, both physiological measures and self-reports again showed CS+/CS- differentiation, consistent with spontaneous recovery, a well-established phenomenon reflecting the persistence of the original fear trace after consolidation (Bouton, 2002; Vervliet et al., 2013).

      Taken together, these findings demonstrate that the paradigm successfully induced both acquisition and extinction of conditioned fear, even though residual fear responses persisted.

      Reliability of event-based contrasts with three trials

      The initial decision to use three events for event-based contrasts was based on SCR and PSR data, which showed that differentiation between CS+ and CS- occurred almost exclusively in the first few trials of extinction and recall. Given the full reinforcement schedule described above, prediction errors were expected to be high in the very first extinction trials and to decay rapidly. Thus, the usual half-block division (e.g., the first eight trials) would have included many trials without meaningful prediction errors.

      We acknowledge that contrasts based on three trials provide limited statistical power. To address this concern, we added a supplementary table showing summary statistics for contrast estimates in the cerebellar cortex, cerebellar nuclei, and VTA VOIs across all fMRI analyses (Table S7), including both the event-based and parametric modulation approaches. Importantly, the event-based contrasts showed moderate to strong effects despite being restricted to the first three unexpected omission trials. Moreover, the parametric modulation analyses, which incorporate all available trials, yielded results that were consistent with the three-trial event-based contrasts and with the patterns shown in the main figures. This convergence between event-based and parametric approaches strengthens our confidence that the observed effects are reliable.

      Deviations from preregistration

      We acknowledge that deviations from the preregistered protocol were not fully documented and have now added this information. The main deviation concerned our event-based analyses: while the preregistration planned early vs. late block comparisons, in practice the rapid decay of SCRs under our 100% and 0% reinforcement rates rendered later trials uninformative for prediction error analyses. We therefore focused on the first three trials, when prediction errors are expected to be present. These behavioral findings are also consistent with Doubliez et al. (2025), who used the same paradigm and observed similar rapid SCR decay. Other deviations, such as not reporting exploratory whole-brain DCM analyses, are now clearly stated for transparency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor Point - Paradigm Details

      Providing additional details about the experimental paradigm in the main text (e.g., the nature of the visual stimuli associated with shocks) would enhance the manuscript's clarity. Some of the information currently in Supplementary Figure 5 could be incorporated into the main text to improve understanding of the paradigm.

      We agree that the current structure reduces clarity, as the paradigm is only explained in detail after the results. To improve readability, we have moved parts of Figure 5 (illustrating the paradigm and scanner setup) to the beginning of the manuscript (now revised Figure 1). In addition, information from Figure 5, including details of the visual stimuli, is now added to the Introduction.

      Reviewer #2 (Recommendations for the authors):

      Methods

      Can the authors please clarify what part of the task went into the [US post CS+ > no US post CS-] contrast? Is this the time immediately after the CS presentations, when the US has just occurred/not occurred, or is it more like the CS+ > CS- contrast but including trials confounded by the US (i.e., [CS+/US > CS-])?

      The contrasts are based on an event-related separation of CS and US. The CS was presented for 6 seconds, with its onset modeled in the GLM as a zero-duration event (delta function). The CS offset coincided with either the delivery or omission of the US, which was likewise modeled as a zero-duration event. Thus, CS onset and offset were modeled separately. The no-US events were further distinguished by whether they followed a CS+ or a CS-. Accordingly, we analyzed both CS and US-related contrasts; for example, the CS+ > CS- contrast reflects CS-related differentiation at CS onset (0 s), whereas [US post CS+ > no US post CS-] reflects (no-)US-related activity at CS offset (6 s; US delivered from 5.9-6.0 s). We have added further clarification to the Methods section.
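      To make this event structure concrete, the sketch below lists, for a hypothetical trial sequence, the onset vectors such a GLM would receive. Only the 6 s CS duration and the zero-duration modeling of CS onset and (no-)US events come from the response above; the `Trial` class, the `event_onsets` helper, and the trial times are illustrative assumptions, not the authors' SPM12 batch.

```python
from collections import defaultdict
from dataclasses import dataclass

CS_DURATION_S = 6.0  # CS duration; the (no-)US event is placed at CS offset


@dataclass
class Trial:
    onset_s: float      # CS onset time in seconds (hypothetical values below)
    cs_type: str        # "CS+" or "CS-"
    us_delivered: bool  # whether the US followed this CS


def event_onsets(trials):
    """Return onset times per condition; every event is zero-duration (a stick function).

    Each trial contributes two separate events: the CS onset (0 s) and the
    US or no-US event at CS offset (CS onset + 6 s).
    """
    events = defaultdict(list)
    for t in trials:
        events[t.cs_type].append(t.onset_s)
        label = ("US post " if t.us_delivered else "no US post ") + t.cs_type
        events[label].append(t.onset_s + CS_DURATION_S)
    return dict(events)


# Hypothetical three-trial sequence, for illustration only
trials = [Trial(10.0, "CS+", True), Trial(30.0, "CS-", False), Trial(50.0, "CS+", False)]
onsets = event_onsets(trials)
# All durations are zero, matching the delta-function modeling described above
durations = {name: [0.0] * len(times) for name, times in onsets.items()}
```

      Keeping the CS-onset and outcome events in separate conditions is what allows a CS contrast (at 0 s) and a (no-)US contrast (at 6 s) to be estimated from the same trials.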

      I was a bit unclear on what this sentence of the Methods meant: "Notably, all single trials comprised CS+ trials, with CS- trials also being modeled as single trials to facilitate paired analysis". Does this mean that some contrasts had six events in total, e.g., the first three unexpected omissions vs. three CS- trials? If so, which CS- trials were selected for the comparison?

      We agree that this sentence was unclear and have revised it. Our intention was to describe that when CS+ trials were modeled as single trials in the GLM (e.g., each CS+ onset and its associated [no-]US event modeled as separate regressors), the CS- trials were modeled in the same way. This ensured that paired analyses would be possible if required.

      For reacquisition and reextinction, single-trial modeling was necessary, as the last unexpected omission of reacquisition is also the first unexpected omission of reextinction. Modeling trials separately allows us to examine the first three unexpected US omissions in each phase independently.

      The event-based contrasts for unexpected US omissions were defined in line with a previous study from our group. For example, during extinction we contrasted the first three unexpected US omissions following CS+ with all expected omissions following CS- (i.e., [first 3 no US post CS+ > no US post CS-], corresponding to 3 vs. 16 events). The weights of events were automatically scaled by SPM12 so that both sides of the contrast carried equal total weight (e.g., positive events weighted 1/3, negative events weighted -1/16). This procedure matches the approach in Ernst et al. (2019), where in partially reinforced acquisition 6 unexpected omissions after CS+ were contrasted with 16 expected omissions after CS-.
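      The weighting described above can be checked with a few lines of arithmetic. The sketch assumes only what the paragraph states, namely that the positive side sums to +1 and the negative side to -1; the helper names are hypothetical, and this is not SPM12's internal code.

```python
from fractions import Fraction


def balanced_contrast(n_pos, n_neg):
    """Weights that spread +1 over the positive events and -1 over the negative events."""
    return [Fraction(1, n_pos)] * n_pos + [Fraction(-1, n_neg)] * n_neg


def contrast_estimate(weights, betas):
    """Weighted sum of event betas, i.e. mean(positive betas) - mean(negative betas)."""
    return sum(w * b for w, b in zip(weights, betas))


# Extinction: first 3 unexpected omissions after CS+ vs. 16 expected omissions after CS-
w_ext = balanced_contrast(3, 16)
# Partially reinforced acquisition as described for Ernst et al. (2019): 6 vs. 16 events
w_acq = balanced_contrast(6, 16)

print(w_ext[0], w_ext[-1], sum(w_ext))  # 1/3 -1/16 0
```

      Because the weights on each side sum to +1 and -1, the contrast estimate is a difference of means, so unequal event counts (3 vs. 16) do not bias it toward the larger condition; the unequal counts instead show up as differences in the precision of each mean.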

      More generally, can the authors please comment on the power and reliability of analyses that include only 3 events in a condition [e.g. the first 3 unexpected omissions]?

      It is not clear if the (US post CS+ > no US post CS-) phases were included. In your pre-registration you say "we will use a "no US post CS+ > no US post CS-" fMRI contrast, where "no US post CS+" designates unexpected omission events in early extinction, early recall (depending on behavioral data which might indicate a return of fear) and a volatile phase (where unexpected omissions occur in the first part of the volatile phase, i.e. reacquisition).", but my reading of the manuscript was that it included both early and late "see 1st level analysis = US post CS+, no US post CS+, no US post CS- separately for each phase; 2nd level = contrast included unexpected omission of the US (no US post CS+ > no US post CS-)". Please clarify and if necessary explain the deviation from preregistration.

      We agree that this point requires clarification. In the preregistration, we planned to divide phases into early and late blocks (no US post CS+ > no US post CS-). However, as already outlined in our response (Reviewer 2, public review response: Reliability of event-based contrasts with three trials), both our preliminary behavioral data and subsequent modeling analyses indicated that differentiation between CS+ and CS- declined extremely rapidly under the 100% reinforcement schedule, likely leaving little or no prediction error beyond the first few trials. Based on this, we adapted the event-based analyses to focus on the first three unexpected omission trials in extinction, recall, and reextinction, where prediction errors are expected to be present. In reacquisition, only three omission events occur by design (83% reinforcement), so this naturally constrained the analysis to three trials. We now explicitly describe this deviation from the preregistration in the revised manuscript.

      As outlined in the same response, we recognize that contrasts based on three trials provide limited statistical power. We addressed this point by providing additional summary VOI statistics of contrast estimates for both event-based and parametric modulation contrasts (Reviewer 1, public review response, point (3) Clarity and Visualization of Results). These show moderate-to-strong effect sizes and convergence across methods, which we argue supports the reliability of using the first three trials.

      Finally, with regard to the reviewer’s specific question: yes, US post CS+ > no US post CS- contrasts were examined for acquisition training, primarily to demonstrate US-related activation (see revised Figure 3).

      Results

      Page 5 + 6: Including the interaction effects for pupil size responses during extinction and reextinction in the SCR section seems unjustified. I appreciate that the SCR data does not significantly support the key claim that extinction learning towards the CS+ occurred, but I do not feel it is acceptable to draw from the other measure for this effect alone. If the PSR measure is of primary/significant importance to support the validity of your paradigm, please consider adding all of these results to the main manuscript.

      We agree with this point and have moved the PSR analysis to the main manuscript. In addition, the SCR Results section no longer includes the PSR analyses, and clearly states the absence of a significant Stimulus x Time interaction effect in extinction (p = 0.053). For completeness, we additionally report trend-level post hoc tests showing CS+/CS- differentiation during early extinction but not during late extinction, consistent with an initial differentiation that attenuates across extinction training.

      Subjective and (some) skin conductance responses do not completely support the success of the learning paradigm. For example, CS+/CS- differentiation in both subjective domains and SCRs persisted after extinction training. Can the authors comment on how this might influence the interpretation of their results more generally? What does it mean if these expectations do not appropriately translate to updated subjective appraisals in your participants, contrary to what the model from which the parametric modulators were derived would predict?

      The persistence of CS+/CS- differentiation in self-reports after extinction, and the return of CS+/CS- differentiation in both self-reports and physiological measures during the recall test, is not unexpected. For self-reports administered after extinction, such persistent CS+/CS- differences are commonly observed in the human fear extinction literature (Hermans et al., 2006; see also Lipp et al., 2003), and may reflect that initial extinction learning establishes a new inhibitory association that suppresses, but does not erase, the original fear memory (Bouton, 2002). At recall on day 3, the remaining differentiation in both self-reports and physiological responses is consistent with spontaneous recovery, a well-documented phenomenon in extinction research (Bouton, 2002). As noted earlier (Reviewer 2, public review response: Extinction in self-reports and skin conductance responses (SCRs)), additional analyses showed that ratings were significantly reduced after extinction and recall compared to acquisition. Thus, while residual differentiation in self-reports remained after extinction and recall, its magnitude was diminished, indicating that extinction learning occurred but was incomplete. This pattern is consistent with partial updating of subjective appraisals in accordance with the reinforcement-learning model used to derive the parametric modulators, rather than a failure of updating.

      Figures

      Figure 1: Please ensure that the summary of your results in the figure legend is consistent with the quantitative results reported. Example 1: "On day 2, there was a loss of differentiation during extinction training."; however, significant main effects of stimulus and time remained (but no interaction). Please tone down this interpretation, or make it clearer how the difference in the initial extinction trials was quantified. If the ANOVA-type analysis was only performed in the first half, this was not clear. Example 2: "During initial reacquisition, there were again differential responses to the CS+ and CS-, which decreased in reextinction and the unexpected US phase". I appreciate that you refer to the difference decreasing, rather than disappearing altogether, but the magnitude of this difference is not reported in the manuscript, and there does remain a significant difference in the amplitude.

      We thank the reviewer for this helpful feedback. We have revised the figure legends to tone down overly strong statements and ensure that all descriptions are consistent with the quantitative results. For clarity, we have also added significance markers for (trend-level) post hoc comparisons (CS+/CS- differentiation within early and late blocks for each phase) to revised Figures 2 and 3 displaying SCRs and PSRs.

      Figure 2, 3, 4: I found it quite confusing to have uncorrected and corrected results displayed in the same way in the same figure. E.g. Figure 2A which, as far as I can tell shows trend-level results for the cerebellum, and corrected results for the VTA. For Figures 2 and 3 it was also not immediately clear which colour bar related to which map. Figure 4A appeared to be missing colour bars. I suggest the authors consider (as much as possible) standardising the colour bar scales, such that the maps across figures/sub-plots are more directly comparable, and differentiate more clearly between corrected and uncorrected results. The 3D renders in Figures 2 and 3 are a little hard to see - would it be possible to make it not so transparent?

      We use “trend-level” to refer to uncorrected voxelwise t-maps at p < 0.05, and “significant” to refer to TFCE/FWE-corrected effects at p < 0.05. This distinction was not sufficiently clear in the original figures. In the revised figures, uncorrected t-maps are displayed with a grey striped background frame. Colorbar scales were not standardized, as different panels display different statistical quantities (TFCE values versus t-values), and scaling was chosen to visualize variation within each contrast rather than enforce comparability across panels, which would have reduced interpretability. In addition, the missing colorbar in Figure 8A (formerly Figure 4A) has now been added; it matches the colorbar shown in Figure 8B. See also Reviewer 1, public review response, point (3) Clarity and Visualization of Results.

      Is it possible to annotate significant effects on Figure 1 and Supplement Figure 1? The use of square markers makes it quite hard to tell the value of each point, which, given the small scale of the y-axis is quite important for interpretation. Could the authors consider remaking these plots with smaller dots?

      We have added post hoc significance markers to Figures 2 and 3 displaying SCRs and PSRs to facilitate interpretation. These markers reflect post hoc comparisons of CS+/CS- differentiation within early and late blocks. In cases where the Stimulus x Time interaction was not significant, the corresponding post hoc markers are still shown but are indicated in red to denote their trend-level status. In addition, the plots have been remade with smaller dots to make individual values clearer.

      Discussion

      The authors state "Because aversive stimulus presentation results in pronounced cerebellar activations, we were unable to separate cerebellar activation related to the unexpected (initial acquisition trials) and the expected (late acquisition trials) presentation of the US." Could the authors compare between early[CS+>CS-] and late[CS+>CS-] acquisition (which I believe were created in the event-based analysis but results not reported), or between the first 3[CS+ with US>CS-] and later [CS+ with US>CS-] to assess this?

      In our terminology, the suggested comparisons (early vs. late [CS+ > CS-] or first three vs. last three [CS+ > CS-]) reflect changes in US prediction rather than prediction error. The statement in the Discussion refers specifically to cerebellar activation during US presentation, where distinguishing between expected and unexpected presentations is complicated by the strong cerebellar activation elicited by the electrical US itself. Moreover, when comparing early “unexpected” US presentations with later “expected” ones, the relatively higher activity in early trials could reflect habituation of the US sensation (i.e., non-associative learning) rather than a prediction error, making interpretation difficult.

      Because the current manuscript focuses on reward-like prediction errors, we did not report these US prediction or presentation contrasts in detail. In brief, the suggested comparisons of early versus late CS-related differentiation (CS+ > CS-) revealed only limited trend-level activity. In contrast, US-related responses during acquisition showed robust activations in the cerebellar cortex, DCN, and VTA across the acquisition phase. Comparisons between the first three US presentations and later US presentations showed broadly distributed and stronger responses during early acquisition than during later US presentations. This pattern seems more consistent with non-associative effects, such as sensory habituation to the electrical stimulation, than with prediction-error–related processing. We have therefore not included these results in the manuscript, but would be open to providing them in the Supplementary Information if the editor or reviewers consider them essential.

      General

      In your pre-registered analysis plan you state "we will explore the use of DCM in a larger network that encompasses known constituents of the fear extinction network, in addition to the cerebellum and VTA.". You have plenty of results to discuss in the current manuscript and adding this may complicate the narrative, but that being said, please either perform and include this analysis as you proposed or explicitly mention why this was not completed. You could also consider adding a whole-brain activation map for the key phases of the experiment. Please also double-check other pre-registered points, for example - the sample size justification is also different.

      We decided not to include whole-brain DCM analyses in this manuscript and not to report whole-brain activation results extensively, as the study was primarily hypothesis-driven with a focus on cerebellum-VTA interactions. While we recognize that whole-brain analyses are of interest and plan to explore them in future work, they were considered outside the scope of the current paper. This deviation from the preregistration is now explicitly noted in the revised manuscript.

      Regarding the sample size justification, the preregistration contained an error: the parameters were reported incorrectly. The correct sample size justification was already provided in the original 2019 grant application and is correctly reported in the current manuscript. The underlying power analysis was the same, but with different alpha levels depending on whether the study involved healthy participants (where larger samples are feasible) or rare patient populations (where stricter alpha levels are not practical). We have clarified this point in the manuscript under deviations from the preregistration.

      Additional changes made in manuscript by authors

      To provide a complete overview, we also note changes made independently of specific reviewer comments:

      Methods

      In the computational modeling section, “reextinction” was mistakenly mentioned where “reacquisition phase” was intended (the initial part of the volatile phase, before experience replay). This has been corrected.

      The term “trial sequence” is used in computational modeling, whereas counterbalancing in the fear conditioning methods used different terminology. We added a clarifying sentence in the modeling section to make this consistent.

      References in the pupil size analysis section (Jentsch et al. 2020; Mathôt et al. 2017) were misplaced and have now been moved earlier in the sentence.

      The citation for MRIcroGL software was updated to the current Nature Methods reference.

      We added a reference to Doubliez et al. (2025), who used the same three-day paradigm in a behavioral study and observed similar physiological responses.

      Supplementary information

      During revision, we noted that the SCR statistics had been computed on an earlier preprocessed dataset version, whereas the finalized corrected dataset was already used for plotting and for estimating prediction and prediction-error values in the reinforcement-learning model. We therefore recomputed the SCR statistics on the finalized dataset for the sake of consistency; this did not change any main effects, interactions, or conclusions, with the only difference being an exploratory late-acquisition CS+/CS- post hoc shifting from non-significant to p < 0.05 (interaction still non-significant). Updated statistics are reported in the Supplementary information.

      Post hoc significant differences in Table S3 are now marked in bold, as the formatting was missing previously.

      To align behavioral analyses more closely with the event-based fMRI approach, we additionally examined physiological responses using a first three versus last three trial division within each phase. These analyses yielded patterns consistent with those obtained using the original early/late block division and are reported in the Supplementary Information.
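      For readers who want to reproduce this kind of split, a minimal sketch of a first-three versus last-three division within one phase follows. It assumes per-trial response amplitudes already ordered in time for each stimulus and summarizes each block as a mean CS+ minus CS- differential; the function name, that summary choice, and the values are hypothetical, not the authors' analysis code.

```python
from statistics import mean


def first_last_differentiation(cs_plus, cs_minus, k=3):
    """Mean CS+ minus CS- response over the first k and the last k trials of one phase."""
    if min(len(cs_plus), len(cs_minus)) < 2 * k:
        # With fewer than 2*k trials the two blocks would overlap
        raise ValueError("need at least 2*k trials per stimulus")
    first = mean(cs_plus[:k]) - mean(cs_minus[:k])
    last = mean(cs_plus[-k:]) - mean(cs_minus[-k:])
    return first, last


# Hypothetical per-trial amplitudes for one phase (arbitrary units)
cs_plus = [0.9, 0.7, 0.5, 0.3, 0.2, 0.2]
cs_minus = [0.3, 0.3, 0.2, 0.2, 0.2, 0.2]
first_diff, last_diff = first_last_differentiation(cs_plus, cs_minus)
```

      Unlike an early/late half split, this division uses the same number of trials per block regardless of phase length, which mirrors the three-trial event-based fMRI contrasts.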

      We added a new supplementary figure (Figure S4) showing the location of the cerebellar VOI on a SUIT flatmap and added a corresponding cross-reference in the Methods section (Volumes of interest (VOI) definition).

      References

      Bouton, M. E. (2002). Context, ambiguity, and unlearning: sources of relapse after behavioral extinction. Biological Psychiatry, 52(10), 976–986. https://doi.org/10.1016/S0006-3223(02)01546-9

      Bouton, M. E. (2004). Context and Behavioral Processes in Extinction. Learning & Memory, 11(5), 485–494. https://doi.org/10.1101/lm.78804

      Constantinou, E., Purves, K. L., McGregor, T., Lester, K. J., Barry, T. J., Treanor, M., Craske, M. G., & Eley, T. C. (2021). Measuring fear: Association among different measures of fear learning. Journal of Behavior Therapy and Experimental Psychiatry, 70, 101618. https://doi.org/10.1016/j.jbtep.2020.101618

      Craske, M. G., Treanor, M., Conway, C. C., Zbozinek, T., & Vervliet, B. (2014). Maximizing exposure therapy: An inhibitory learning approach. Behaviour Research and Therapy, 58, 10–23. https://doi.org/10.1016/j.brat.2014.04.006

      Doubliez, A., Köster, K., Müntefering, L., Nio, E., Diekmann, N., Thieme, A., Albayrak, B., Nicksirat, S. A., Erdlenbruch, F., Batsikadze, G., Ernst, T. M., Cheng, S., Merz, C. J., & Timmann, D. (2025). Dopaminergic drugs modulate fear extinction-related processes in humans, but effects are mild. Brain Communications, 7(5), fcaf333. https://doi.org/10.1093/braincomms/fcaf333

      Ernst, T. M., Brol, A. E., Gratz, M., Ritter, C., Bingel, U., Schlamann, M., Maderwald, S., Quick, H. H., Merz, C. J., & Timmann, D. (2019). The cerebellum is involved in processing of predictions and prediction errors in a fear conditioning paradigm. ELife, 8, e46831. https://doi.org/10.7554/eLife.46831

      Hermans, D., Craske, M. G., Mineka, S., & Lovibond, P. F. (2006). Extinction in Human Fear Conditioning. Biological Psychiatry, 60(4), 361–368. https://doi.org/10.1016/j.biopsych.2005.10.006

      Kalisch, R., Gerlicher, A. M. V., & Duvarci, S. (2019). A Dopaminergic Basis for Fear Extinction. Trends in Cognitive Sciences, 23(4), 274–277. https://doi.org/10.1016/j.tics.2019.01.013

      Lipp, O. V., Oughton, N., & LeLievre, J. (2003). Evaluative learning in human Pavlovian conditioning: Extinct, but still there? Learning and Motivation, 34(3), 219–239. https://doi.org/10.1016/S0023-9690(03)00011-0

      Lonsdorf, T. B., Menz, M. M., Andreatta, M., Fullana, M. A., Golkar, A., Haaker, J., Heitland, I., Hermann, A., Kuhn, M., Kruse, O., Meir Drexler, S., Meulders, A., Nees, F., Pittig, A., Richter, J., Römer, S., Shiban, Y., Schmitz, A., Straube, B., … Merz, C. J. (2017). Don’t fear ‘fear conditioning’: Methodological considerations for the design and analysis of studies on human fear acquisition, extinction, and return of fear. Neuroscience and Biobehavioral Reviews, 77, 247–285. https://doi.org/10.1016/j.neubiorev.2017.02.026

      Milad, M. R., & Quirk, G. J. (2012). Fear Extinction as a Model for Translational Neuroscience: Ten Years of Progress. Annual Review of Psychology, 63(1), 129–151. https://doi.org/10.1146/annurev.psych.121208.131631

      Salinas-Hernández, X. I., Vogel, P., Betz, S., Kalisch, R., Sigurdsson, T., & Duvarci, S. (2018). Dopamine neurons drive fear extinction learning by signaling the omission of expected aversive outcomes. ELife, 7, e38818. https://doi.org/10.7554/eLife.38818

      Salinas-Hernández, X. I., Zafiri, D., Sigurdsson, T., & Duvarci, S. (2023). Functional architecture of dopamine neurons driving fear extinction learning. Neuron, 111(23), 3854–3870.e5. https://doi.org/10.1016/j.neuron.2023.08.025

      Vervliet, B., Craske, M. G., & Hermans, D. (2013). Fear extinction and relapse: State of the art. Annual Review of Clinical Psychology, 9, 215–248. https://doi.org/10.1146/annurev-clinpsy-050212-185542

    1. eLife Assessment

      This valuable work identifies a subpopulation of neurons in the larval zebrafish pallium that responds differentially to varying threat levels, potentially mediating the categorization of negative valence. The evidence supporting these claims is solid; however, the study would be strengthened by more sophisticated analyses of functional imaging results, behavioral confirmation of stimulus valence, and further evidence linking the functionally distinct clusters to their molecular identity. This work will be of interest to systems neuroscientists investigating the circuit-level encoding of emotion and defensive behavior.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a map of neurons responding to aversive stimuli in zebrafish and suggests that the regions containing these neurons are homologous to mammalian brain areas involved in aversive processing. Specifically, this study found that neurons in a part of the pallium, the homolog of the amygdala, responded vigorously to strongly noxious and fully looming stimuli, but not to the milder cues. In contrast, neurons in another part of the pallium responded to all of these stimuli. The findings provide valuable insights into the neural mechanisms underlying negative-valence computation in zebrafish.

      Strengths:

      This study performed whole-brain functional imaging using two-photon light-sheet microscopy and identified the activity of individual neurons in awake zebrafish. This technique is highly valuable and will be broadly applicable to future studies aimed at elucidating the neural mechanisms underlying zebrafish behavior at single-neuron resolution.

      Weaknesses:

      Although this study reports neuronal responses to aversive stimuli, it did not directly assess how aversive these stimuli were for zebrafish. In general, studies of this kind quantify the aversiveness of test stimuli by measuring behavioral indices such as avoidance or escape responses. The present study states that "neurons responded vigorously to strongly noxious and fully looming stimuli, but not to milder cues." However, the authors did not provide behavioral evidence demonstrating that the stimuli were indeed aversive or that the so-called milder cues were perceived as less aversive by the animals. Without a behavioral measure of aversiveness, it is difficult to determine whether the reported neural responses reflect negative-valence processing, rather than general sensory salience or stimulus intensity.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to map neurons encoding negative valence at the whole-brain scale in larval zebrafish. Using two-photon light-sheet imaging combined with various aversive stimuli, they visualize and quantify stimulus-evoked neural responses, identify the anatomical locations of responsive neurons, and explore the possibility of genetically accessing Rl neurons that respond preferentially to strongly noxious stimuli.

      Strengths:

      The major strength of this study lies in its use of two-photon light-sheet imaging, which provides a system-level characterization of neuronal response to aversive stimuli. The authors systematically compare multiple classes of aversive stimuli (heat, electric shock, looming, etc.), showing that strongly threatening stimuli converge on a compact neuronal population in the Rl, supporting the robustness of the finding. Finally, the identification of Tiam2a expression in these neurons provides a potential genetic handle for future functional studies.

      Weaknesses:

      The main weakness of the study is the lack of causal evidence supporting the functional role of the identified neurons. Without optogenetic, chemogenetic, or ablation experiments, it is difficult to determine whether these neurons are required for or sufficient to encode negative valence. In addition, the study does not include positive-valence or neutral stimulus controls, making it difficult to distinguish whether the observed neural responses reflect valence per se or more general downstream responses such as motor output. Finally, the lack of behavioral readouts limits the ability to directly link the identified neural populations to defensive behaviors.

    4. Reviewer #3 (Public review):

      Overview and Strengths:

      Accurate evaluation of threat levels allows animals to determine whether to escape. The precise mechanism underlying threat evaluation remains unclear. Smith et al. identified a cluster of neurons in the zebrafish rostrolateral dorsal pallium (Rl) that respond differentially to varying levels of negative-valence stimuli.

      This work leverages the small size and optical transparency of the larval zebrafish, using two-photon selective plane illumination microscopy to assay the response of pallial neurons to various negative-valence stimuli. Interestingly, unlike the ventromedial pallium and habenula, which responded to all stimuli tested, neurons in the Rl were activated by a selection of stimuli representing relatively higher levels of threats. By leveraging a zebrafish brain atlas, the authors identified a transgenic line labeling a tiam2a+ cluster of neurons that appears to be the activated population in the Rl. Together, these results demonstrate a subpopulation of pallial neurons that likely categorizes the strength of negative valence in larval zebrafish.

      The primary conclusions of this work are well supported by the data. The identification of a neuronal cluster that may underlie the categorization of threat-associated sensory stimuli is significant. Furthermore, this study generates a high-quality functional imaging dataset using cutting-edge microscopy, setting the foundation for understanding the neuronal encoding of emotions in zebrafish.

      Results from this work set the stage to answer further exciting questions: How do tiam2a+ Rl neurons modulate the activity of the hindbrain escape circuit? What is the functional role of the Rl population inhibited by threat stimuli? Computationally, how does Rl integrate sensory signals and classify threat levels? How does the activity of Rl change in the context of habituation and conditioning? Future work may use more nuanced stimuli and combine new genetic tools, behavioral recording, and circuit-level analysis to systematically reveal how emotions modulate defensive behaviors.

      Weaknesses:

      The impact of this work could be further enhanced by incorporating more sophisticated data analysis and by more clearly anchoring the findings within the known framework of zebrafish defensive behavior.

      (1) The authors performed statistical analyses across six ROIs per experiment in Figures 1E/J, 3E/J, and 6B/D/F. This increases the probability of Type I errors. Applying multiple comparison corrections would mitigate this concern. Given that most stimuli (except for the "IR heating") are non-directional, the authors may consider first testing for the response symmetry following each stimulus and then combining ROIs from the two hemispheres to calculate a single averaged measurement per region per fish for comparisons of regional dF/F.

      (2) I found the topographical mapping of activated and inhibited ROIs very informative. There appear to be two subpopulations of Rl: a posterior-medial population often activated by negative valence stimuli, and an anterior-lateral population that is frequently inhibited. I wonder if it is possible to decode the valence or category of a stimulus based on the topography and response profiles of these neurons? These results would provide additional evidence for the Rl's role in threat evaluation.

      (3) Findings in this paper, especially differential responses of the Rl to full and partial looming, deserve an expanded discussion. The authors should better anchor these findings to established literature to emphasize their significance in the Discussion. For example, how might this potential categorization mechanism contribute to, or differ from, the mechanisms underlying habituation (Fotowat & Engert, 2023, eLife); what are the possible connections between the pallium and the hindbrain escape circuits that could relay these Rl signals (Kunst et al., 2019, Curr Biol)?

      (4) The authors make conservative claims associating the tiam2a+ cluster with Rl neurons activated by noxious stimuli, and their data support this conclusion. However, this link could be further strengthened by testing whether the tiam2a+ cluster shows differential responses to full vs partial looming. This could be achieved by performing pERK staining following the stimulus paradigm. While future tools may allow for direct functional imaging of this population, I believe such experiments are beyond the scope of this paper.

      (5) Figure 1E/J, Figure 3E/J: Please clarify whether the dashed red vertical lines indicate the onset or the offset of the stimuli. Additionally, different time windows were used for AUC calculations across these experiments; the authors should provide a rationale for these varying windows in the Results or Methods.
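      Because an AUC value depends directly on the integration window, a sketch that makes the window an explicit parameter may help frame the requested rationale. It assumes a dF/F trace sampled at a fixed frame rate and uses trapezoidal integration; the function name, frame rate, and window values are hypothetical and not taken from the manuscript.

```python
def window_auc(dff, frame_rate_hz, onset_s, window_s):
    """Trapezoidal area under a dF/F trace from onset_s to onset_s + window_s (dF/F x s)."""
    dt = 1.0 / frame_rate_hz
    start = round(onset_s * frame_rate_hz)
    stop = round((onset_s + window_s) * frame_rate_hz)
    if start < 0 or stop >= len(dff) or stop <= start:
        raise ValueError("window must lie inside the recorded trace")
    segment = dff[start:stop + 1]
    return sum((a + b) / 2 * dt for a, b in zip(segment, segment[1:]))


# Hypothetical flat trace at 10 Hz: the same response yields different AUCs for different windows
trace = [1.0] * 50
auc_2s = window_auc(trace, frame_rate_hz=10.0, onset_s=1.0, window_s=2.0)
auc_1s = window_auc(trace, frame_rate_hz=10.0, onset_s=1.0, window_s=1.0)
```

      Since AUC scales with window length, comparisons across stimuli are only straightforward when the windows match or when the values are normalized by window duration.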

    1. eLife Assessment

      This valuable study presents a plastic recurrent spiking network model that spontaneously generates repeating neuronal sequences under unstructured inputs. The authors provide solid evidence that, while the global weight distribution stabilizes, individual synaptic connections undergo constant turnover with strength-dependent timescales, supporting sequence generation. However, the study is purely simulation-based and phenomenological, lacking both a mechanistic explanation for sequence emergence and explicit experimental predictions, and robustness to alternative, more biologically realistic plasticity rules remains to be demonstrated. The work will be of interest to theoretical and experimental neuroscientists working on synaptic plasticity and neural sequence generation.

    2. Reviewer #1 (Public review):

      Summary:

      The aim of this paper is to model the spontaneous emergence of sequences in networks of plastic spiking neurons. By spontaneous, the authors mean that the inputs have no structure and contain no sequences, yet the network nevertheless generates sequences. To obtain this, they assume several synaptic and single-neuron plasticity rules. The primary findings are that sequences can emerge, that they slowly drift over time, and that weights also change constantly over time, although very strong weights are more stable. The main driver of these results is the assumed set of plasticity rules.

      Strengths:

      The paper is based on simulations of a relatively large network of conductance-based integrate-and-fire neurons. Two different pair-based STDP rules are assumed, one for excitatory-to-excitatory synapses and one for inhibitory-to-excitatory synapses. In addition, weights are normalized, and there is adaptation due to plasticity of the spiking threshold. The network is analyzed via simulations and data processing akin to what would be done for physiological data. The simulations are extensive, and the analysis seems rigorous.

      Weaknesses:

      There are several fundamental problems with the paper:

      (1) The plasticity mechanisms used assume that pair-based STDP is sufficient to account for synaptic plasticity in vivo. This is unrealistic. Several papers have shown that pair-based STDP models do not account well for experimental data. If this model is a simulation of the visual cortex (this is unclear), then firing rates can be sufficiently high that rates matter more than precise spike times. We have known that firing rates matter since the original Markram et al. (1997) paper. Even if pair-based STDP is used, we know from Bi and Poo (1998) that synaptic plasticity is weight dependent, such that strong weights potentiate less and decay more. This additional assumption alone might completely change the results of this study. We do not really know how to model realistic synaptic plasticity, but we know that pair-based STDP is a poor model. Would these results be robust to a change in the learning rule, for example to a triplet-based, calcium-based, or voltage-based rule? Are the results robust even to slight modifications of the learning rule, for example weight dependence in pair-based STDP?

      (2) The first stage of training, in which the network reaches a steady state, is unclear. What type of activity does the network exhibit? Does most of it arise from the external inputs? What firing rates are obtained? What are the spike statistics? This is important because this activity is responsible for generating the emergent sequences and also (I think) depends on the plasticity mechanisms. Does the 'spontaneous activity' in the network depend strongly on the external input? Figure 1E shows a raster plot, but only of neurons within a sequence, and these neurons appear to fire almost only once. Before sequences are presented, the more general structure of the spiking activity and how it evolves should be explained and quantified.

      (3) Do these sequences really emerge without structured inputs? Is there any experimental evidence that such sequences emerge without structured input? If so, please cite it. It makes sense that they would, because the time scale of these sequences is much faster than the sensory or behavioral time scale. However, experimental evidence to support this would make the paper much more interesting.

      (4) This is a phenomenological paper. It does not really say what these sequences might be good for, beyond a citation or two, and it does not model any specific experiment. There is a medium here (a plastic spiking network) that generates a phenomenon (sequences). It also generates other measurable phenomena, such as connectivity motifs, which have been quantified in animals. It would be natural to compare the motif statistics found here with experimentally characterized motifs. This would make these results more substantial.

      (5) There are implicit predictions in the work, for example about the stability of strong vs. weak efficacies or the stability of different motifs. Such predictions should be made more explicit.
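
      As an illustration of point (1), the following Python sketch contrasts an additive pair-based STDP update with a weight-dependent (multiplicative) variant in the spirit of the Bi and Poo (1998) observation, where potentiation scales with (w_max - w) and depression with w. This is a generic formulation, not the rule used in the manuscript; the amplitudes, time constants, and weight bound are illustrative assumptions.

```python
import numpy as np

# Illustrative parameters (assumptions, not taken from the manuscript)
A_PLUS, A_MINUS = 0.01, 0.0105     # LTP / LTD amplitudes
TAU_PLUS, TAU_MINUS = 20.0, 20.0   # STDP time constants (ms)
W_MAX = 1.0                        # upper weight bound


def stdp_kernel(dt):
    """Return (ltp, ltd) kernel factors for dt = t_post - t_pre in ms."""
    if dt > 0:
        return np.exp(-dt / TAU_PLUS), 0.0
    return 0.0, np.exp(dt / TAU_MINUS)


def additive_update(w, dt):
    """Additive pair-based STDP: the step size ignores the current weight."""
    ltp, ltd = stdp_kernel(dt)
    return float(np.clip(w + A_PLUS * ltp - A_MINUS * ltd, 0.0, W_MAX))


def weight_dependent_update(w, dt):
    """Multiplicative STDP: strong weights potentiate less and depress more."""
    ltp, ltd = stdp_kernel(dt)
    dw = A_PLUS * (W_MAX - w) * ltp - A_MINUS * w * ltd
    return float(np.clip(w + dw, 0.0, W_MAX))


# Compare the weight change of a weak and a strong synapse for the same
# causal pairing (post fires 5 ms after pre).
for w0 in (0.1, 0.9):
    print(w0, additive_update(w0, 5.0) - w0, weight_dependent_update(w0, 5.0) - w0)
```

      With the additive rule, a weak (0.1) and a strong (0.9) synapse receive the same increment for the same pairing; with the weight-dependent rule, the strong synapse potentiates much less and depresses more, which is the kind of modification whose effect on the reported results the reviewer asks about.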

    3. Reviewer #2 (Public review):

      Summary:

      This paper investigates how a combination of spike-timing-dependent plasticity rules in recurrent spiking networks leads to the spontaneous emergence of repeating neuronal sequences. The authors show that despite the weight distribution reaching a steady state, individual synaptic connections undergo constant turnover with timescales that depend on connection strength. The plasticity rules promote fan-in/out connectivity motifs that appear to support sequence generation.

      Strengths:

      The question addressed is important and biologically relevant. The most interesting finding of the paper is the coexistence of a stable weight distribution with constant turnover of individual synaptic connections. The simulations seem to be carefully executed.

      Weaknesses:

      The paper does not make a sufficient attempt to explain why the observed phenomena arise under the specific learning rules employed. There is no theoretical reduction, no analytical argument, and no mechanistic intuition. As it stands, this reads as a descriptive simulation study.

      It is never made clear which results reflect robust qualitative phenomena and which are specific to the particular hyperparameter choices of these simulations. Specific percentages and parameter values are reported throughout the main text without justification of their importance or generality.

      The finding that sequence composition undergoes continual turnover while the global weight distribution remains stable is interesting, but the authors should more carefully situate this result within the existing theoretical literature on synaptic drift and sequence stability under ongoing plasticity. Several modeling papers have addressed related phenomena, and the novelty of the present contribution relative to this body of work is not clearly established.

    4. Reviewer #3 (Public review):

      Summary:

      This modelling study connects synaptic plasticity, connectivity motifs, and representational drift. The authors combine excitatory and inhibitory STDP with weight normalization and intrinsic plasticity in a recurrent spiking network of AdEx neurons. This combination generates heavy-tailed synaptic weight distributions and supports repeating spike sequences under both unstructured and structured inputs. While global network statistics stabilize over time, individual synapses continue to change, creating a form of drift. Structured inputs further stabilize sequences, yet the network retains flexibility to learn new patterns.

      Strengths:

      (1) Multi-scale turnover analysis:

      The authors study the evolution of individual synapses, 3-neuron motifs, follower neurons, and entire neuronal sequences, revealing distinct turnover timescales.

      (2) Fan-in/out motif analysis:

      A specific connectivity motif (fan-in/out) is shown to be over-represented in the network and preferentially stabilised by the plasticity rules compared to other possible motifs. This generates interesting insights and testable predictions.

      (3) Connection to representational drift:

      The connection of ongoing synaptic plasticity to drift is timely and interesting, reproducing observations of macro-level stability and synapse-level turnover with a relatively simple mechanism.

      (4) Rigour and thoroughness:

      The overall quality of the numerical experiments performed in this study is high, with extensive supplementary material performing various controls to solidify the claims.

      Weaknesses:

      (1) Limited connection to network function:

      Sequence detection relies on a rather artificial protocol (forced spiking of a single neuron 1,000 times), which I suspect mostly tests whether the lognormal tail of the weight distribution can propagate activity. This risks being circular. I think performing the same sequence analysis on a random network, or on a network with the same weight distribution but shuffled, would help separate what arises from a generic heavy-tailed weight distribution from what arises from the particular weights potentiated by the plasticity rules used here.

      The network, which would classically be evaluated as a memory network, is not assessed on this aspect. While the authors do not overclaim, this limits the impact.

      Relatedly, the relearning experiment (Figure 5G) shows catastrophic forgetting. This is acknowledged in the discussion, but the suggested solutions (alternating patterns, plastic readout) are speculative without supporting simulations. This limits the applicability of the model as a memory model or, more broadly, as a model of a brain region/function.

      Additionally, in the sequence learning experiments with structured input, the ability to learn seems tied to the very specific timescale of pattern presentation (~10 ms per pattern, comparable to the STDP kernel time constants), arguably faster than the timescale of external stimuli. The stability of sequences may also owe more to the normalization scheme than to STDP per se.
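
      One way to implement the shuffled control suggested above is sketched below in Python (NumPy). It keeps the connectivity mask and the exact set of weight values, and only reassigns which existing connection carries which weight, so any sequences that survive shuffling would be attributable to the heavy-tailed distribution rather than to the specific structure imposed by plasticity. The random lognormal matrix stands in for a trained network and is purely hypothetical.

```python
import numpy as np


def shuffle_weights(W, rng):
    """Permute the nonzero weights of W among its existing connections.

    The sparsity pattern and the weight distribution are preserved exactly;
    only the assignment of weights to specific pre/post pairs is destroyed.
    """
    W_shuffled = np.zeros_like(W)
    mask = W != 0
    W_shuffled[mask] = rng.permutation(W[mask])
    return W_shuffled


rng = np.random.default_rng(0)
n = 100
# Hypothetical sparse (10%), heavy-tailed (lognormal) weight matrix
# without self-connections, standing in for a trained network.
mask = rng.random((n, n)) < 0.1
np.fill_diagonal(mask, False)
W = np.where(mask, rng.lognormal(mean=-1.0, sigma=1.0, size=(n, n)), 0.0)

W_ctrl = shuffle_weights(W, rng)
```

      Redrawing the mask as well (for example, a fresh random mask with the same connection probability) would give the fully random control; which variant is more informative is a design choice.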

      (2) Novelty claims and positioning within the literature:

      On page 16, the authors write: "Our results demonstrate that spiking sequences can be generated in randomly connected networks trained by synaptic plasticity even under unstructured inputs, which supports STDP being the main actor, while stabilizing mechanisms such as weight normalization and intrinsic plasticity play a complementary role." (c1).

      Several aspects of this work are less novel than the presentation suggests:

      (a) The fact that STDP can create sequence-like dynamics/asymmetric connectivity matrices in recurrent networks has been studied theoretically [1,2] and in simulations [3,4,5]. While [3] is cited, the manuscript underplays the similarity. [4] (uncited) considers e+iSTDP with a different homeostatic term to represent sequential stimuli in large recurrent spiking networks. [5] (uncited) also considers a recurrent spiking network with several STDP-like rules and shows that many combinations can store and recall sequential inputs.

      (b) Lognormal weight distributions emerging from STDP-based plasticity and the autonomous emergence of connectivity structures have extensive literature. While many of these articles are already cited in the manuscript, I fail to see what this work brings to this matter compared to existing work (particularly [6]).

      (c) Several published works challenge the manuscript's implicit claim (c1) that sequences require their particular combination of rules. Many other plasticity mechanisms can create sequences [3,4,5,7,8,9]. Some interpretations may also need to be dialed down: [10] (uncited) showed that sequences can be stored and retrieved using EI and IE plasticity alone. iSTDP may be doing more computational work than acknowledged, which complicates the interpretation of which mechanisms are truly driving the phenomena.

      Overall, most of the relevant work is already cited in the manuscript, but not necessarily acknowledged adequately.

      (3) Justification of plasticity model/robustness analysis:

      The parameters in Tables 1 and 2 are quite specific without strong justification (for instance, different sparsity values for each connection type and specific normalization factors). Without parameter sweeps, it is difficult to know whether the key findings are robust or overfit to this particular network configuration. Given the number of parameters, exhaustive sweeps are out of the question, and the argument made above would in any case prevent the proposed rule combination from being considered more than one possible mechanism for sequence generation among many. However, this deserves to be acknowledged, and a few sweeps could be run (e.g., over the LTP/LTD ratio, normalization threshold, and network size). I don't think that Figure S12, which shows that removing any component of the model causes it to break down in some way, is enough to address alternative plasticity rules.

      A related concern is that the network is small by current standards (1,200E + 240I neurons), especially with sparse connectivity (6-20%). Small networks with few connections are susceptible to synchronization (other studies typically consider networks of at least 10k neurons). The authors should discuss whether the phenomena they observe would persist at larger scales and under more biologically realistic connectivity. Specifically, are the intrinsic and normalization plasticity terms as crucial in this case?

      (4) Fan-in/out motif evidence is correlational:

      The evidence linking the fan-in/out motif to sequence stability appears to be correlational. Properly establishing causality would require targeted ablations or rewiring of fan-in/out connections. While designing a clean causal intervention may be difficult, the correlational nature of the evidence should be stated explicitly.

      Conclusion:

      To summarize, the manuscript would benefit from:

      (1) Reframing the contribution:

      Presenting the multi-scale turnover analysis and the discussion around representational drift as the core novelties. I would reposition sequence emergence and lognormal distributions as reproductions of known results under a specific plasticity model and analysis method.

      (2) Acknowledging that many rule combinations could produce equivalent outcomes, and not suggesting that the combination chosen here is special.

      (3) Adding parameter sensitivity analysis or, at a minimum, discussing robustness.

      References:

      [1] Kempter, Gerstner and van Hemmen, Hebbian learning and spiking neurons, 1999, PRE

      [2] Ocker, Litwin-Kumar and Doiron, Self-organization of microcircuits in networks of spiking neurons with plastic synapses, 2015, PLoS CB (theoretical account of STDP in spiking networks and motifs, though it only looks at 2-synapse motifs, not fan-in/fan-out).

      [3] Fiete et al., Spike-Time-Dependent Plasticity and Heterosynaptic Competition Organize Networks to Produce Long Scale-Free Sequences of Neural Activity, 2010, Neuron

      [4] Duarte and Morrison, Dynamic stability of sequential stimulus representations in adapting neuronal networks, 2014, Front Comp Neuro

      [5] Confavreux et al., Memory by a thousand rules: Automated discovery of functional multi-type plasticity rules reveals variety and degeneracy at the heart of learning, 2025, bioRxiv

      [6] Zheng, Dimitrakakis and Triesch, Network Self-Organization Explains the Statistics and Dynamics of Synaptic Connection Strengths in Cortex, 2013, PLoS CB

      [7] Zheng and Triesch, Robust development of synfire chains from multiple plasticity mechanisms, 2014, Front Comp Neuro

      [8] Ravid Tannenbaum and Burak, Shaping Neural Circuits by High Order Synaptic Interactions, 2016, PLoS CB

      [9] Bell, Duffy, and Fairhall, Discovering plasticity rules that organize and maintain neural circuits, 2024, NeurIPS

      [10] Gong and Brunel, Inhibitory Plasticity Enhances Sequence Storage Capacity and Retrieval Robustness, 2024, bioRxiv

    1. eLife Assessment

      In light of the diverse functions associated with the Dorsal Raphe Nucleus across vertebrate species, this important study presents findings on the role of serotonin in promoting behavioral quiescence through the regulation of neuromotor populations. Combining optogenetics with brain-wide activity analyses, the study provides convincing evidence of interest to researchers in neuromodulation and translational medicine fields.

    2. Reviewer #1 (Public review):

      The wide-ranging serotonergic projections emerging from the Dorsal Raphe nucleus (DRN) suggest a central role in regulating brain-wide activity and behavioural states. DRN activity has been associated with diverse functions, ranging from mood, motivation and pain regulation to sleep and cognitive flexibility. Its far-reaching connectivity has made it challenging to assess the brain-wide effect of its activation, especially during behaviour.

      The present study by Qi et al. addresses these challenges by combining state-of-the-art tracking microscopy with the whole-brain accessibility of the larval zebrafish model. To investigate the effect of DRN activation, the authors leveraged the Tg(tph2:ChrimsonR) line to optogenetically activate tph2-positive neurons in the DRN, while monitoring changes in brain-wide activity, locomotion and auditory-stimuli evoked responses.

      Optogenetic activation had a suppressive effect on locomotion, which the authors distinguished from sleep induction based on the maintenance of posture and the sleep-disturbing effect of nighttime stimulation. Further, the authors report a distinct effect of DRN activation on motor-related, but not auditory-related, neuronal subspaces identified by demixed principal component analysis.

      In addition, rather than affecting all motor-correlated neurons similarly, tph2+ DRN-mediated suppression focused on neurons encoding high-amplitude or turning motion.

      In summary, the work of Qi et al. provides solid evidence for a predominant role of the DRN in wake-state motor suppression by aptly combining the vast data-acquisition possibilities of the larval zebrafish model with computational methods to extract relevant information.

      The brain-wide scope of the analysis is a key strength, reducing bias, confirming the involvement of known motor and auditory regions, and providing a valuable dataset for future analyses.

      While the results well support the conclusion of the authors, certain biological and technical aspects demand discussion.

    3. Reviewer #2 (Public review):

      Summary:

      The authors examine the effects of activating the dorsal raphe nucleus serotonergic system using a combination of calcium imaging and optogenetics in freely moving larval zebrafish. Their findings show that optogenetic stimulation induces a state of behavioral quiescence.

      They further investigate whether this state corresponds to sleep or reduced motor activity. Analyses of posture and sleep-related paradigms indicate that serotonergic activation primarily suppresses motor output rather than promoting sleep. Notably, this suppression appears to be bout-type-dependent, with stronger effects on neurons associated with larger tail amplitudes and turning angles.

      In addition, auditory stimulation experiments reveal no significant impact of serotonin on sound encoding.

      Strengths:

      The study combines advanced experimental techniques with state-of-the-art analytical methods, enabling precise and compelling insights into the role of serotonergic modulation. The experiments and analyses are well aligned with the questions being addressed, and the results appear robust and reliable.

      Moreover, the implementation of experiments that combine calcium imaging and optogenetics in freely moving animals is technically challenging and appears well justified in the context of the research questions.

      Weaknesses:

      While the analytical techniques employed are sophisticated and appear to be appropriately applied, their presentation makes the manuscript difficult to follow. Although the explanations are provided in the Methods section, including more guidance in the main text, such as how to interpret each analytical approach and what outcomes would be expected under different scenarios, would help readers who are less familiar with these techniques.

      Providing this context would better guide the reader in navigating the figures, broaden the accessibility of the work, and ultimately increase its impact.

      While the authors discuss different quiescent states mediated by serotonin reported in previous studies, their interpretation is limited to stating that "a common feature shared by these distinct behavioral states is a pronounced reduction in movement," and consequently proposing that activation of the dorsal raphe nucleus is not sufficient to specify a particular behavioral state, but rather plays a primary role in driving motor suppression.

      In my view, a more thorough attempt to determine whether the observed state corresponds to any of the previously described forms of quiescence, or represents a subset or variant of them, would strengthen the manuscript. This would help better integrate the findings with the existing literature.

      For example, given that the authors have access to whole-brain activity data, it would be valuable to examine and discuss whether there are shared patterns of activation with previously reported quiescent states.

      The manuscript largely avoids discussing the mechanisms underlying the observed motor suppression. For instance, is this effect driven directly by serotonin release onto target neurons? Is it mediated by glial activity, as suggested in other studies? Are additional neuromodulatory systems being recruited?

      While addressing these questions may require substantial further work, potentially beyond the scope of the present study, the availability of whole-brain data provides an opportunity to at least explore or discuss these possibilities. In particular, it would be interesting to examine the recruitment of regions not directly stimulated but known to be associated with other neuromodulatory systems or promoting glial activation (e.g., the locus coeruleus).

    1. eLife Assessment

      This manuscript presents important findings that challenge traditional models of speech processing by demonstrating that theta-gamma phase-amplitude coupling in the auditory cortex is primarily a stimulus-driven alignment to external acoustic structures rather than an intrinsic neural oscillatory mechanism. The evidence supporting these claims is convincing, grounded in a robust cross-linguistic acoustic analysis and high-fidelity, time-resolved intracranial recordings.

    2. Reviewer #1 (Public review):

      Summary:

      This article applies analytic methods commonly employed in electrophysiological neuroscience to the speech envelope of audio corpora from 17 different languages. The findings indicate that features observed in speech-brain tracking responses, specifically theta and gamma oscillations, as well as their phase-amplitude coupling, are actually present within the speech envelope itself. This suggests that the neural data recorded in response to speech primarily reflect an evoked response to the temporal statistical properties of the envelope, rather than an inherent neural mechanism. Data from 18 individuals with epilepsy listening to French speech further support this interpretation: theta and gamma oscillations, along with their phase-amplitude coupling, are absent at rest and are linearly driven by the acoustic envelope during speech perception.

      Strengths:

      I find these results very interesting and convincing, with a strong take-home message: we should exercise caution when interpreting observed theta/gamma activity and the associated phase-amplitude coupling during speech comprehension tasks.

      Weaknesses:

      I mostly have comments on clarifications regarding the methods, specifically on the criteria for language exclusion, and on the statistical testing and reporting.

      (1) Clarification is needed regarding the rationale for the number of languages analysed: initially, 17 languages were considered, six were excluded due to the absence of PAC in the high gamma range, yet the analysis was ultimately conducted on only nine languages, not eleven. Could you please explain this discrepancy?

      (2) Considering the six languages that did not exhibit any statistically significant high-frequency PAC, do you have potential reasons for this result? Might it be related to the fundamental frequency (F0) of the speakers' voices? If six languages out of seventeen do not show PAC, can we argue that this feature is universal across languages?

      (3) How is inter-subject variability addressed within the SEEG analysis? The authors report the percentage of SEEG independent components showing significant effects in power spectral changes, PAC, and other measures, but it is unclear whether these components are consistent across participants or whether only a few participants drive the effect. It would be helpful to report how many participants are retained for each selection of SEEG-ICs in the article. Currently, the statistical testing of the SEEG-ICs also appears to assume independent samples. It would be helpful to include group-level statistical tests across subjects, for instance by performing mixed-effects models and including participant as a random factor.

    3. Reviewer #2 (Public review):

      Summary:

      This paper nicely demonstrates that "speech tracking" in the auditory cortex extends all the way up to 100-150 Hz. Specifically, the study asks whether the fluctuations in sound amplitude found in speech at various time scales relate to fluctuations at similar time scales in intracranial recordings from auditory brain areas. First, it analyzes amplitude fluctuations in speech from 17 different languages and characterizes fluctuations due to syllabic rate (2-6 Hz), vocalic features (30-50 Hz), and fundamental frequency (100-150 Hz, in male speakers). It then analyzes neural activity recorded while participants listened to male and female speakers in French. By measuring changes in the power spectrum relative to rest, it links the sound amplitude fluctuations to fluctuations in neural activity in the same frequency bands, referring to them as "theta", "low-gamma", and "high-gamma". Using Granger "causality," it clearly shows that the neural fluctuations can be predicted linearly from the sound fluctuations. Using a cross-frequency coupling measure, the authors further show that, in the neural dynamics, high-gamma fluctuations precede theta fluctuations.

      Strengths:

      (1) The analysis of neural activity (Figure 2) is a very compelling account of how theta, low-gamma, and high-gamma activity observed in neural recordings closely follows the properties of the acoustic speech signal itself.

      (2) This includes phase amplitude coupling, a property that I had not previously seen described for the speech signal itself, and is here nicely demonstrated in Figure 1.

      (3) The Granger "causality" analysis makes a compelling case that neural fluctuations in these frequency bands are driven by the stimulus itself.

      (4) The finding in Figure 4 that the female fundamental frequency appears in the neural activity at half its value is, to my knowledge, an entirely novel observation, not just in speech but in amplitude-modulated sounds in general. This non-linear phenomenon is very interesting and prompts a host of questions for future research: Does this happen only for voiced speech, does it depend on the harmonic stack of speech, or can it be produced with a single AM frequency? Are there preferred frequencies for this phenomenon?

      (5) The cross-frequency coupling measure shows a number of directed effects in the neural signal that seem to counter the predominant view in neuroscience, namely that the phase of slower fluctuations "organizes" or "drives" the faster fluctuations seen in power (e.g., theta→gamma coupling). Here this coupling is reversed (gamma→theta), and the reversal is not a property of the sound itself. This, too, should lead to a number of follow-up studies (although there are some potential confounds here).
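
      A known confound for lead/lag-based directionality measures, such as the directed cross-frequency effects mentioned in point (5), is a plain difference in response latency. In the toy Python sketch below, two signals follow the same hypothetical drive with different delays and do not interact at all, yet the faster one appears to "lead" the slower one. The signals and latencies are assumptions for illustration; this does not reimplement the authors' metric.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
# Hypothetical shared drive (e.g., a stimulus-envelope feature)
drive = rng.standard_normal(n)

# Two responses that both follow the same drive but never interact:
# the "gamma" signal responds after 2 samples, the "theta" signal after 10.
# np.roll gives a circular shift, so each is an exact delayed copy.
gamma_amp = np.roll(drive, 2)
theta_sig = np.roll(drive, 10)


def lagged_corr(x, y, max_lag):
    """Correlation of x(t) with y(t + lag) for lag in [-max_lag, max_lag]."""
    lags = np.arange(-max_lag, max_lag + 1)
    corrs = np.array([np.corrcoef(x, np.roll(y, -lag))[0, 1] for lag in lags])
    return lags, corrs


lags, corrs = lagged_corr(gamma_amp, theta_sig, 20)
best_lag = lags[np.argmax(corrs)]
print(best_lag)  # prints 8: "gamma" leads "theta" purely because of latency
```

      A positive best lag means the theta signal follows the gamma signal, so a purely latency-based asymmetry is enough to produce an apparent gamma→theta direction.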

      Weaknesses:

      (1) The claim that different frequency bands are processed in different locations, referred to in the abstract as "multiplexing", is less well supported. The neural analysis is performed on independent components that are spatially distributed, which makes this claim less transparent than it would be with more direct ways of treating electrode location, such as bipolar referencing.

      (2) The writing in the Introduction and Results obscures the source of the sound amplitude fluctuations at different timescales and instead treats these fluctuations as a sort of discovery. This is strange because the Abstract and Discussion are fairly accurate on this point, namely that the fluctuations are all due to well-known properties of speech. The descriptions are accurate, although I would put it slightly differently: fluctuations below 6 Hz are due to the varying length of sentences and words, 25-50 Hz fluctuations reflect the well-established stationary times of the vocal tract, and 100-150 Hz fluctuations reflect the vibration of the vocal cords in male speakers.

      (3) The problem of guiding the analysis of sound by notions from neural signals is most glaring when they restrict their analysis to less than 150Hz, which leaves out female-voiced speech.

      (4) Along with this, there is a heavy emphasis on notions of "rhythms" and "oscillations" when clearly, aside from the vocal cords, there is no evidence for rhythmic fluctuations. Any reasonable definition of a rhythm would need at least 2 or 3 cycles of a repeated pattern. A spectral "peak" for the sound envelope is shown at 5Hz. But this is not indicative of a regular rhythm. Instead, the peak appears to be an artifact of displaying power per octave rather than power spectral density. A peak in a power per octave is not a reliable indicator of a coherent oscillation, and the speech envelope does not exhibit a clear 5Hz rhythm. Unfortunately, prior literature has not been clear on this. It would be more accurate if the word "rhythm" were replaced with "fluctuation" and/or "activity" for the case of speech envelope and neural activity, respectively.

      (5) The Introduction also omits the literature on neural responses to amplitude-modulated sounds, which extend to at least 200 Hz. The findings here on "high-gamma" are therefore well in line with prior literature.

      (6) Cutting off the neural analysis at 150 Hz is, to me, a missed opportunity to test whether neural speech tracking extends all the way up to the typical female fundamental frequency of about 200 Hz.

      (7) The gamma→theta effects reported here could be confounded by a simple longer delay in the theta analysis. In fact, Figure S5 confirms such a delay. It is unclear whether the CFD metric captures anything more than a temporal delay between the two signals. The term "functionally interconnected" in the abstract is a bit of a stretch; the effect may essentially be a delayed correlation.

      (8) There is a minor concern with the claim that low-gamma drives theta amplitude. While statistics on this are reported, the corresponding figure (Figure 5c) may suggest an alpha harmonic rather than theta.
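
      The point in (4) about power-per-octave displays can be checked with a short Python sketch. Power density per octave is the PSD reweighted by frequency, S(f) df / d(log2 f) = S(f) f ln 2, so a purely aperiodic, strictly decreasing Lorentzian PSD, S(f) = 1 / (1 + (f/f_c)^2), acquires a peak at f = f_c once displayed per octave (setting the derivative of f S(f) to zero gives f = f_c). The knee frequency of 5 Hz is an illustrative assumption, not an estimate from the speech data.

```python
import numpy as np

F_KNEE = 5.0  # knee frequency in Hz (illustrative assumption)
freqs = np.linspace(0.1, 50.0, 5000)

# Aperiodic Lorentzian PSD: strictly decreasing, no oscillatory component.
psd = 1.0 / (1.0 + (freqs / F_KNEE) ** 2)

# Power density per octave: S(f) * df / d(log2 f) = S(f) * f * ln(2),
# i.e. the same spectrum reweighted by frequency.
power_per_octave = psd * freqs * np.log(2.0)

peak_freq = freqs[np.argmax(power_per_octave)]
print(f"PSD is monotonic: {np.all(np.diff(psd) < 0)}")
print(f"Per-octave peak at ~{peak_freq:.2f} Hz")
```

      The per-octave curve peaks near 5 Hz even though the underlying spectrum contains no oscillatory component, which is why such a peak alone does not establish a rhythm.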

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript investigates whether the theta-gamma phase-amplitude coupling in the human auditory cortex serves as an intrinsically generated neural mechanism for hierarchically parsing speech or not. By analyzing speech corpora across 17 languages alongside human intracranial EEG recordings, the authors demonstrate that these nested oscillatory dynamics are actually inherent, robust acoustic properties embedded within the speech envelope itself. Consequently, they claim that rather than generating parsing windows internally, the early auditory cortex acts as a temporal demultiplexer that segregates syllabic, vocalic, and pitch features into distinct, stimulus-driven neural channels. Furthermore, the study presents evidence for a reversed functional directionality wherein fast-varying gamma activity drives the phase alignment of slower theta rhythms, fundamentally reframing auditory PAC as a stimulus-evoked alignment to a highly structured external signal rather than an endogenous cognitive parsing tool.

      Strengths:

      (1) The authors demonstrated robust theta-gamma acoustic structure across languages. They analyzed the acoustic speech envelope across 17 typologically distinct languages. This establishes that the nested theta-gamma acoustic structure is a universal feature of human speech, rather than an artifact of one language's specific phonology.

      (2) The use of time-resolved, high-SNR intracranial recordings is a critical strength of this study. This approach provides the precise spatiotemporal fidelity required to confidently separate and delineate multiplexed high-frequency dynamics, particularly the low- and high-gamma bands, that are essential for accurate speech decoding but are typically attenuated or lost in non-invasive scalp recordings.

      (3) The authors move beyond standard correlational PAC metrics by employing a suite of converging analyses, including the isolation of true oscillations from aperiodic noise and the directional index. Together, these metrics demonstrate that auditory PAC is a stimulus-evoked alignment to a highly structured external speech signal, rather than an intrinsically generated top-down parsing mechanism.

      Weaknesses:

      (1) A major methodological concern is the use of ICA across SEEG electrode shafts to define distinct neural sources (SEEG-ICs). SEEG electrodes traverse complex macroanatomy, including multiple cortical layers, sulcal banks, and white matter. By constructing components derived from weights across the entire electrode, and subsequently localizing each component solely to the contact with the maximal contribution, the authors risk generating biologically implausible signals. Such an approach potentially mixes true localized cortical gray matter activity with deep structure or white matter signals. Given that a central claim of this manuscript is the spatial and functional segregation of theta and gamma neural populations, the authors could consider further validating these core findings (such as the gamma-to-theta directionality) using single-channel or bipolar-referenced data.

      (2) Another methodological concern is the use of Granger causality (GC) to evaluate the directional causality between speech and neural signals. As noted in Bastos & Schoffelen (2015), and indeed acknowledged by the authors' own citation of Nolte et al. (2010), GC is highly sensitive to SNR imbalances and filtering artifacts. Given the inherent SNR disparity between a cleanly extracted acoustic envelope and noisy SEEG data, coupled with the known distortions introduced by distinct filtering pipelines (Barnett & Seth, 2011), the GC results may reflect methodological artifacts rather than true physiological driving.

      (3) The third concern is the study's exclusive reliance on linear metrics applied to the envelopes of band-filtered speech and neural signals, e.g., linear Granger causality and cross-correlations. The human auditory system is an inherently non-linear dynamical system. Complex acoustic features, such as rapid spectrotemporal transitions or dynamic pitch trajectories, often drive non-linear neural responses and complex phase-locking behaviors. While the linear models provide strong, interpretable results, by restricting their connectivity and directionality metrics to linear autoregressive models, the authors may be missing substantial non-linear interactions or, conversely, forcing a linear fit onto non-linear data, which can distort estimates of causality and temporal lags. The authors should consider explicitly addressing this limitation in their discussion. Ideally, they should validate their core directional claims on a subset of the data using an information-theoretic, non-linear metric (e.g., Transfer Entropy or Mutual Information), or apply linear methods to nonlinearly abstracted features (e.g., phonemic, syllabic, intonational-level features), to ensure their linear assumptions are not masking or misrepresenting the true underlying dynamics.

    1. eLife Assessment

      This is a potentially important study comparing infants (8 months) and adults with respect to rhythmic EEG response properties during periodic and aperiodic visual stimulation. The results provide solid evidence for a ~4 Hz EEG response in infants that emerges independently of stimulation frequency. At this stage, additional work will be required to conclusively establish that this theta-band effect reflects genuine neural resonance rather than oculomotor processes.

    2. Reviewer #1 (Public review):

      Summary:

      The authors report results from an EEG study investigating neural oscillations in 8-month-old infants, as well as an adult control group. Participants were presented with cartoon figures flickering at different frequencies, as well as a broadband condition. While adults showed the well-known dominant response at 10 Hz, infants showed a dominant resonance at 4 Hz, irrespective of stimulation frequency. The authors interpret this finding as evidence for the fundamental role of 4 Hz oscillations in early development and discuss two conflicting theories regarding the underlying functionality.

      Strengths:

      Overall, this is a very well-designed and rigorous study, and the results significantly add to our understanding of a very fundamental aspect of early brain activity. The study is embedded in a coherent theoretical framework, and the authors discuss possible implications and next steps with great clarity.

      Weaknesses:

      I see relatively few weaknesses in this paper. It does not statistically compare infant and adult responses, which would add to the argument that infant responses actually differ from adult ones, but I don't think this is necessary at this point for the authors' argument.

      What I actually like about the paper is that the authors had a very clear vision of what they wanted to look at (4 Hz oscillatory responses in 8-month-olds), and this is exactly what they did. Yes, this does not answer all the questions one might have, especially about the function of 4 Hz oscillations in infants, but it goes a long way in characterising the properties of 4 Hz oscillations, which provides the starting point for several potential future lines of research.

    3. Reviewer #2 (Public review):

      Summary:

      This study combines EEG with frequency-tagging and broadband stimulation paradigms to investigate the developmental precursors of brain rhythms in 8-month-old human infants. The manuscript employs state-of-the-art methods, focusing on theta and alpha rhythms to assess their functional significance in visual information processing.

      By evaluating responses to visual stimulation at different frequencies and broadband stimulation presented simultaneously with sounds, the authors report a stimulation frequency-independent response at ~4 Hz. They interpret this as the precursor of the adult alpha rhythm involved in perceptual echo mechanisms. However, I have a number of questions regarding the hypotheses, experimental framework, and analytical approach that need to be addressed before confirming the conclusions.

      Strengths:

      (1) The analyses are innovative, and the frequency-tagging paradigm is particularly well-suited for studying challenging populations with short protocols.

      (2) The sample size is adequate.

      Weaknesses:

      There is a gap between the hypotheses and the experimental paradigm, as well as between the hypotheses and the analytical choices. These gaps could alter the interpretation of the findings and thus require clarification (or perhaps a reformulation of the theoretical framework).

      I am not convinced that the conclusion - that the theta rhythm is the functional precursor of the alpha rhythm in the infant visual system - holds without addressing the following questions.

      In brief, my specific concerns are the following:

      (1) Gap Between Hypotheses and Experimental Paradigm:

      The experimental paradigm involves the simultaneous presentation of sound and image, i.e., cross-modal sensory information, which contrasts with the manuscript's theoretical framework and conclusions, all of which are grounded in visual information processing. Previous work has shown that preverbal infants spontaneously engage in cross-modal associative learning in such audiovisual paradigms (e.g., Kabdebon & Dehaene-Lambertz, 2019). This raises the question of whether the paradigm taps into different mechanisms, such as associative learning, rather than those hypothesized, and whether these mechanisms might better explain the observed 4 Hz response. Associative learning mechanisms are particularly relevant to the theta rhythm, involving hippocampal learning and the engagement of wider networks, including frontal areas.

      Given this cross-modal design, I question whether it might alter the interpretation of the paradigm and the conclusions drawn. The current framing of the manuscript suggests that theta/4 Hz is the functional equivalent of the alpha rhythm for visual processing in the 8-month-old brain. However, the use of multisensory input complicates this conclusion for the visual domain and the parallel to adult mechanisms.

      Kabdebon, C., & Dehaene-Lambertz, G. (2019). Symbolic labeling in 5-month-old human infants. Proceedings of the National Academy of Sciences, 116(12), 5805-5810.

      (2) Analytical Focus - Gap Between Hypothesis and Analysis Choices:

      The link between the literature described in the introduction and the hypothesis of an inherent 4 Hz rhythm in the visual system remains unclear. It is therefore puzzling why the analyses focused on 4 Hz and on a control band that is not adapted to the infant population. Focusing the analyses on 4 Hz (and the control band) overlooks the critical frequency range (~6-8 Hz) that other studies have suggested may serve as a proxy for the adult alpha rhythm. This omission does not align with the hypotheses regarding the role of the alpha rhythm in visual information processing.

      The introduction discusses both alpha rhythm and its significance in perceptual echo phenomena, and theta rhythm and its role in mnemonic function, but these remain as separate phenomena. While the paradigm aims to assess perceptual echo phenomena in infants, one would expect the hypothesis to relate to precursors of the alpha rhythm in infancy (slower frequencies, yet related to alpha, ~6 Hz; Stroganova et al., 1999). However, the authors hypothesize that theta rhythm (4 Hz) is a precursor of the alpha rhythm in infancy: "Given the prominence of the theta rhythm in infancy, we expected the presence of a 4 Hz theta response and resonant activity in the infant visual system upon periodic stimulation and broadband visual input, respectively."

      Why did the authors not study the 6-9 Hz frequency range, which previous work suggests may serve as a proxy for alpha in infants? Currently, the analyses are restricted to the theta range (i.e., 4 Hz) and a control band (adult-classical alpha range [8-14 Hz]), but [8-14 Hz] is not adapted to the infant population. At this age, prior work has reported ~6 Hz as the age-adapted range corresponding to alpha. It would be more appropriate to investigate this range. I can see some trace of this in Figure 2a, but perhaps this is weaker compared to the 4 Hz stimulation due to the cross-modal nature of the paradigm.

      Stroganova, T. A., Orekhova, E. V., & Posikera, I. N. (1999). EEG alpha rhythm in infants. Clinical Neurophysiology, 110(6), 997-1012.

      In the adult results, we also see similar ("two types of") responses: the main response at 8 Hz, which to me is the upper band of the theta rhythm (related to cross-modal learning), and traces around 10 Hz, which are more in line with perceptual echo mechanisms. The cited literature in adults (VanRullen & Macdonald, 2012), on which the authors base their framework and analysis, indicates a response at 10 Hz (not 8 Hz). This supports the idea that the 8 Hz response observed in this work might be related to the cross-modal presentation of stimuli. The authors could evaluate this more easily through a control group of adults with a unimodal (visual-only) presentation of stimuli.

      (3) Methodological Approach and Clarity:

      The methodological approach is not sufficiently detailed, which is crucial for reproducibility and wider contribution, especially given the difficulties in studying infants. Key points requiring clarification include preprocessing, choice of electrode clusters, and statistical details.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aim to characterize the intrinsic temporal dynamics of the infant visual system by examining how it responds to rhythmic visual stimulation. Using EEG in 8-month-old infants, they present visual stimuli that flicker at different periodic frequencies as well as broadband (aperiodic) luminance sequences to probe resonance properties of the visual system. The central goal is to determine whether the infant brain exhibits a characteristic oscillatory response independent of the external stimulation frequency, analogous to the well-known alpha (~10 Hz) resonance of the adult visual system. The results are then compared with data from a small adult sample to assess whether the dominant processing rhythm of the visual system shifts across development.

      Strengths:

      This manuscript presents a compelling and carefully executed study with intriguing findings, and I greatly enjoyed reading it. Several strengths deserve particular mention:

      (1) Clear and focused research approach. The study addresses a well-defined question regarding the intrinsic rhythmic dynamics of the infant visual system and applies an elegant experimental paradigm to probe these dynamics directly.

      (2) Well-designed parametric stimulation paradigm. The use of rhythmic visual stimulation across multiple frequencies (2-30 Hz), combined with broadband stimulation, provides a systematic way to characterize resonance properties of the visual system. This parametric approach allows the authors to clearly visualize the relationship between stimulation frequency and neural response, making the key effects easy to grasp.

      (3) Strong statistical power in the infant sample. The relatively large infant sample (N = 42) is a major strength, particularly given the challenges of infant EEG research. This sample size provides sufficient power to support the conclusions about the robustness of the ~4 Hz response in infants.

      (4) Converging analytical approaches. The authors combine periodic stimulation analysis with impulse-response-function (IRF) analyses of broadband stimulation, which provides complementary evidence for the presence of a ~4 Hz resonance in the infant visual system. This convergence strengthens the interpretation of the results.

      (5) Direct developmental comparison. Although the adult sample is small, including adults in the same paradigm provides a useful benchmark showing the expected alpha-band response (~8-9 Hz), thereby contextualizing the infant findings within a developmental framework.

      Weaknesses:

      (1) Potential oculomotor contribution to the frontal 4 Hz effect. My main concern relates to the interpretation of the prominent ~4 Hz response in infants, particularly at frontal electrodes. The frequency range is close to what might be expected for oculomotor activity such as microsaccades, and the scalp distribution appears suggestive of such a contribution. Notably, the topography of the 4 Hz response differs substantially from the topography of the harmonic responses (Figure 2B), which show the expected occipital dominance. The latter is more clearly visual, whereas the former is more complex and clearly extends beyond visual responses. This should be considered more in the discussion.

      (2) Differences in topography between periodic and IRF effects. The spatial distribution of the 4 Hz response during periodic stimulation also appears to differ from the topography of the 4 Hz impulse response function (IRF; Figure 2B vs 3D). The IRF response does not appear particularly "visual" in its spatial distribution compared with, e.g., the harmonic responses in 2B. This difference could indicate distinct underlying generators, but the implications of this discrepancy are not discussed in detail.

      (3) Strength of the interpretation of neural resonance. Taken together, these observations make it difficult to determine conclusively whether the observed 4 Hz activity reflects genuine neural resonance of the visual system or potentially other processes (e.g., oculomotor dynamics). While the current findings remain interesting under either interpretation, the manuscript tends to favor the neural resonance account quite strongly without fully addressing alternative explanations.

      (4) Relation to known developmental shifts in resting-state oscillations. The dominance of lower-frequency rhythms (theta range) in infancy is well documented in the resting-state EEG literature. Although this point is briefly mentioned in the discussion, it would be interesting to relate the current findings more directly to this literature. For example, it would be informative to know whether peak frequencies observed here align with resting-state theta peaks in infants and whether similar spatial distributions are observed.

      (5) Limited follow-up of the proposed theoretical accounts. The discussion introduces both mnemonic and inhibition accounts for infant theta activity. However, these frameworks are not fully developed in relation to the present data. In particular, the mnemonic account might generate testable predictions within the current dataset, for example, whether theta responses change over time with repeated stimulus exposure or learning.

      (6) Characterization of the adult alpha response. A minor point concerns the characterization of the adult resonance frequency. The manuscript often refers to a 10 Hz alpha resonance, whereas the data presented here show a peak around 8 Hz (Figure 5A). Within this frequency range, that is a sizeable difference. There also seems to be some variability across participants, given that the authors use the individual alpha frequency (IAF) for the topography. It would be interesting to see the distribution of peak frequencies across participants to appreciate the actual range. Interestingly, the spatial distribution of the alpha response also appears quite similar to the infant 4 Hz effect (Figure 5B) and differs from the harmonic responses, which may deserve further discussion. A comparison with resting-state alpha characteristics could also be informative here (e.g., does the IAF during visual stimulation relate to the IAF recorded at "rest"?).

    1. eLife Assessment

      Shin et al present important new observations regarding novel REM-specific cortical high-frequency oscillations. The evidence demonstrating the presence of a novel rhythm is convincing. However, the data presented are incomplete to demonstrate claims of a) brain-state-specific effects of these events, b) clear structured reactivation, and c) the specific degree of linkage to memory consolidation.

    2. Reviewer #1 (Public review):

      Summary and Strengths:

      Shin et al deepen our understanding of high-frequency oscillations in the frontal cortex during REM in a manner that sheds important light on the roles of these events. In particular, they reveal that cortical HFOs are modulated by theta oscillations, occur in chains and recruit cortical neuronal activation patterns in a manner that is distinct from other high-frequency events during non-REM or in the hippocampus. They also show that these events occur during increased oscillatory cross-talk between hippocampus and cortex and may protect cortical neurons from downregulation of firing during sleep. Overall, this is important work with several novel observations pointing towards an important role for these events that will become increasingly understood over time.

      I also wanted to comment that 2D is a beautiful illustration of separate and essentially exclusive communication channels used during HF events in NREM vs REM. They almost perfectly complement each other's frequencies.

      Weaknesses:

      I have only one major scientific critique: I believe we need to see quantification of how phasic REM theta waves with versus without HFOs differ. What do REM HFOs add to the "normal" theta oscillation? Without this comparison, it is more difficult to interpret the meaning of these events. Given that HFO chains have IEIs around the time of a theta cycle duration, are the repeating spiking activities stronger during HFO repeats than during adjacent theta waves without HFOs? What percentage of theta waves contain HFOs, and what is the firing rate during those theta waves with vs without HFOs? Is there differential firing rate modulation? The authors may even consider that all REM-HFO-specific quantifications should be shown as differential from phasic theta cycles without HFOs.

      As a non-scientific comment on the manuscript itself: unfortunately, the paper is difficult to read and understand at times, requiring great effort by the reader. This is to an extent that communication is hindered. The paper is dense with changing methods, often from panel to panel. Unfortunately, the panel quantifications are not explained in the results section in a manner that readers can understand without going to read the methods, often for each individual panel. These measures should be explained in a way that lets readers understand the conclusions of each panel and what gross calculations were used to reach those. Instead, too much jargon is used rather than clear descriptions of the overall calculations being done for each panel. 


      The authors mention in the discussion section that they see increased functional connectivity between mPFC and CA1, but most of the data supporting this seem to be based on LFP rather than spiking. Functional connectivity is best defined by spike-spike relationships, and these authors have spiking data. So I believe either the descriptive language should be pulled back to something like "oscillatory coupling", or more analyses should be dedicated to showing spike-spike coordination across regions.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors investigate high-frequency oscillations (HFOs) in the prefrontal cortex during REM sleep. They identify a specific pattern where these HFOs occur in "chains" that are phase-locked to theta oscillations, primarily during the "phasic" periods of REM. The study contrasts these events with isolated HFOs and NREM ripples, suggesting a unique role for these chains in coordinating activity between the prefrontal cortex and the hippocampus. Most notably, the authors report that a specific subset of hippocampal cells (those that co-fire with the prefrontal cortex during these HFOs) increase their firing rates over the course of sleep, suggesting a potential mechanism for selective memory consolidation.

      Strengths:

      The study addresses an under-explored area of sleep physiology: the fine-grained temporal coordination between the cortex and hippocampus during REM sleep. The identification of HFO "chains" and their association with higher theta power provides an interesting framework for understanding how the brain might organize information transfer outside of NREM sleep. The observation that specific hippocampal populations show differential firing rate changes based on their participation in these HFO events is a striking finding that warrants further investigation.

      Weaknesses:

      The primary weakness of the study lies in the lack of a clear distinction between global brain states and the specific events being analyzed. Because the authors compare HFOs across different sleep stages (NREM, tonic REM, and phasic REM) without sufficient controls, it is difficult to determine if the observed differences are intrinsic to the HFOs themselves or simply a reflection of the different physiological states in which they occur.

      Furthermore, the evidence for "structured reactivation" is not yet convincing. The temporal alignment of these reactivation events appears inconsistent, with peaks occurring well before the HFO itself, and the analysis does not sufficiently control for pre-existing cellular assembly strengths. Additionally, some of the sleep architecture presented appears atypical, such as very short REM bouts and direct NREM-to-REM transitions that bypass standard progression, raising questions about the consistency of the sleep detection across animals. Finally, the study does not account for potential confounds like baseline firing rates when interpreting the behavior of "high-cofiring" neurons, which may simply be the most active cells in the population.

    4. Reviewer #3 (Public review):

      Summary:

      Shin et al. examine hippocampal-prefrontal interactions during sleep using simultaneous CA1 and prefrontal cortex recordings in rats performing a spatial memory task. They identify high-frequency oscillation (HFO) events in PFC during REM sleep that occur in theta-modulated chains and are associated with increased CA1-PFC coherence and sequential, sparse reactivation of cortical ensembles. This pattern contrasts with the synchronous reactivation observed during NREM cortical ripples. Together with a simple cholinergic network model, the authors propose that REM HFO chains represent a distinct mechanism for hippocampal-cortical coordination that complements NREM ripple-mediated processing during sleep.

      Strengths:

      A major strength of the work is the extensive electrophysiological dataset, which includes simultaneous recordings of large neuronal populations in both hippocampus and prefrontal cortex across behaviour and subsequent sleep. The analyses linking high-frequency events to population dynamics, interregional coherence, and ensemble reactivation are technically sophisticated and provide an incredibly detailed description of REM-associated cortical activity patterns. In particular, the demonstration that REM HFOs occur in chains aligned to theta phase and organise sequential activation of cortical assemblies represents a potentially important advance in understanding the neural structure of REM sleep activity. The integration of experimental data with a computational model further provides a useful framework for interpreting the observed differences between REM and NREM network states in terms of neuromodulatory influences.

      Weaknesses:

      While overall this study provides a highly valuable body of work, there are two primary limitations, which, if overcome, would provide substantially more significance to the overall characterisation of REM HFOs. Specifically:

      (1) Distinction from wake HFOs

      The results largely support the authors' claim that REM HFO chains represent a distinct pattern of neural coordination compared to NREM cortical ripples. The analyses consistently show differences between REM and NREM events in terms of neuronal modulation, ensemble structure, and interregional coupling. However, similar high-frequency events during wake are not examined. Since REM sleep shares several network features with wakefulness, including strong theta oscillations, evaluating whether comparable PFC HFOs occur during wake would provide clarity on whether these events are specific to REM sleep (and its associated functions) or represent a more general theta-associated phenomenon.

      (2) Link to memory consolidation

      The manuscript proposes throughout that REM HFO chains may contribute to memory consolidation by coordinating hippocampal-cortical reactivation, but the evidence for this functional role remains indirect. The authors do highlight this as a limitation of the study - the inability to link their findings to learning - but it is not clear why. Further details of the behaviour results should be included. If no learning occurred across the eight behavioural sessions, this should be reported. If learning did occur, but could not be linked to HFO events, this should also be reported.

    5. Author Response:

      Reviewer #1 (Public review):

      Summary and Strengths:

      Shin et al deepen our understanding of high-frequency oscillations in the frontal cortex during REM in a manner that sheds important light on the roles of these events. In particular, they reveal that cortical HFOs are modulated by theta oscillations, occur in chains and recruit cortical neuronal activation patterns in a manner that is distinct from other high-frequency events during non-REM or in the hippocampus. They also show that these events occur during increased oscillatory cross-talk between hippocampus and cortex and may protect cortical neurons from downregulation of firing during sleep. Overall, this is important work with several novel observations pointing towards an important role for these events that will become increasingly understood over time.

      I also wanted to comment that 2D is a beautiful illustration of separate and essentially exclusive communication channels used during HF events in NREM vs REM. They almost perfectly complement each other's frequencies.

      We thank the Reviewer for the positive comments and for highlighting the importance of our work, especially the distinct communication patterns during NREM and REM cortical high-frequency events.

      Weaknesses:

      I have only one major scientific critique: I believe we need to see quantification of how phasic REM theta waves with versus without HFOs differ. What do REM HFOs add to the "normal" theta oscillation? Without this comparison, it is more difficult to interpret the meaning of these events. Given that HFO chains have IEIs around the time of a theta cycle duration, are the repeating spiking activities stronger during HFO repeats than during adjacent theta waves without HFOs?

      We agree with the Reviewer that differences in activity during HFOs versus theta in the absence of HFOs are an important comparison for determining whether activity during HFOs reflects a unique state of information processing during REM sleep or is redundant with theta oscillation signatures. We attempt to clarify this point in Figure S4I, where we examined PFC population activity during theta periods outside of HFOs. Here, we extracted REM theta periods at least 250 ms away from detected HFOs and split the theta cycles into quartiles based on the theta power at the preferred theta phase bin determined by theta-coupled HFOs (during that specific sleep session). We expect that using the preferred phase of HFOs is the most accurate choice for this comparison (compared to random phases). Lastly, we aligned PFC population activity to these theta phases and found that, even in the highest theta power quartile, theta-modulated fluctuations in PFC population activity were absent without HFOs. This indicates that theta-associated HFOs are the primary driver or signature of the observed population activity patterns (Figures 1H, 3F, S4I). An explanation of this procedure can be found in the Methods section under “Control for periods of high theta power”.
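      The cycle-selection and quartile-split step described above can be sketched as follows. This is a hypothetical Python/NumPy illustration, not the code used in the study: the function name, the input arrays (theta-cycle times, per-cycle theta power at the HFO-preferred phase bin, HFO times), and the toy values are assumptions; only the 250 ms exclusion window and the split into power quartiles come from the text.

```python
import numpy as np


def split_hfo_free_cycles(cycle_times, cycle_power, hfo_times, min_gap=0.25):
    """Hypothetical helper: drop theta cycles closer than `min_gap` seconds
    to any HFO, then label the remaining cycles by power quartile
    (0 = lowest, 3 = highest).

    `cycle_power` is assumed to be theta power at the HFO-preferred phase bin.
    """
    cycle_times = np.asarray(cycle_times, dtype=float)
    cycle_power = np.asarray(cycle_power, dtype=float)
    hfo_times = np.sort(np.asarray(hfo_times, dtype=float))

    # Distance from each theta cycle to its nearest HFO.
    if hfo_times.size:
        idx = np.searchsorted(hfo_times, cycle_times)
        before = hfo_times[np.clip(idx - 1, 0, hfo_times.size - 1)]
        after = hfo_times[np.clip(idx, 0, hfo_times.size - 1)]
        nearest = np.minimum(np.abs(cycle_times - before),
                             np.abs(after - cycle_times))
    else:
        nearest = np.full(cycle_times.shape, np.inf)

    keep = nearest >= min_gap
    kept_times, kept_power = cycle_times[keep], cycle_power[keep]

    # Quartile edges computed from the retained (HFO-free) cycles only.
    edges = np.quantile(kept_power, [0.25, 0.5, 0.75])
    quartile = np.searchsorted(edges, kept_power, side="right")
    return kept_times, quartile


# Toy usage with made-up cycle times (s), powers, and HFO times.
rng = np.random.default_rng(1)
times = np.arange(0, 10, 0.125)
power = rng.random(times.size)
kept, q = split_hfo_free_cycles(times, power, hfo_times=[2.0, 6.0])
print(kept.size, np.bincount(q))
```

      Computing the quartile edges only from the retained cycles keeps the "highest theta power quartile" defined within HFO-free periods, which matches the comparison described above.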

      Regarding the comment “what REM HFOs add to the "normal" theta oscillation”, we hypothesize that the generation of HFOs and the associated population activity is the result of theta-mediated input from other brain regions that converge on PFC. CA1 is a possible candidate region, since we observed that theta-frequency activity in CA1 leads PFC (Figure 4K, phase slope index result). Additionally, the high concentration of acetylcholine and the high inhibitory tone in REM sleep are conducive to local suppression in response to external drive, as shown in the model and noted in the Discussion. Thus, we propose that HFOs delineate transient windows in which sparse populations of PFC neurons are activated against a backdrop of overall suppression, potentially to link specific ensembles across PFC and other brain areas such as the hippocampus, a phenomenon that differs from baseline theta activity in REM.

      To address this point, we will provide additional analyses investigating PFC activity profiles during theta periods adjacent to HFOs. We will also reorganize the results and figures to highlight these important control analyses.

      What percentage of theta waves contain HFOs, and what is the firing rate during theta waves with vs. without HFOs? Is there differential firing rate modulation? The authors may even consider showing all REM-HFO-specific quantifications relative to phasic theta cycles without HFOs.

      To address these points, we will perform the requested analyses and explicitly quantify firing rate differences during HFO and non-HFO theta periods for further clarification.

      As a non-scientific comment on the manuscript itself: unfortunately, the paper is difficult to read and understand at times, requiring great effort by the reader. This is to an extent that communication is hindered. The paper is dense with changing methods, often from panel to panel. Unfortunately, the panel quantifications are not explained in the results section in a manner that readers can understand without going to read the methods, often for each individual panel. These measures should be explained in a way that lets readers understand the conclusions of each panel and what gross calculations were used to reach those. Instead, too much jargon is used rather than clear descriptions of the overall calculations being done for each panel.

      The point is well-taken and we apologize for the dense text and lack of methodological detail in the results section. We agree with the Reviewer that enhancing clarity and adding additional details about the quantitative methods within the main text and figure panels/legends would improve readability and make the manuscript more accessible for a wider audience.

      To address this point, we will include important details in the results section and legends to clarify the methods and calculations used. We will also reorganize the manuscript text and reorder some figure panels for readability, and update the Methods section to parallel the Results/Figure order to the extent possible.

      The authors mention in the Discussion section that they see increased functional connectivity between mPFC and CA1, but most of the data supporting this appear to be based on LFP rather than spiking. Functional connectivity is best defined by spike-spike relationships, and these authors have spiking data. I therefore believe that either the descriptive language should be pulled back to something like "oscillatory coupling", or more analyses should be dedicated to showing spike-spike coordination across regions.

      To address this point, we will temper the claims of functional connectivity and replace all instances with “oscillatory coupling”.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors investigate high-frequency oscillations (HFOs) in the prefrontal cortex during REM sleep. They identify a specific pattern where these HFOs occur in "chains" that are phase-locked to theta oscillations, primarily during the "phasic" periods of REM. The study contrasts these events with isolated HFOs and NREM ripples, suggesting a unique role for these chains in coordinating activity between the prefrontal cortex and the hippocampus. Most notably, the authors report that a specific subset of hippocampal cells (those that co-fire with the prefrontal cortex during these HFOs) increase their firing rates over the course of sleep, suggesting a potential mechanism for selective memory consolidation.

      Strengths:

      The study addresses an under-explored area of sleep physiology: the fine-grained temporal coordination between the cortex and hippocampus during REM sleep. The identification of HFO "chains" and their association with higher theta power provides an interesting framework for understanding how the brain might organize information transfer outside of NREM sleep. The observation that specific hippocampal populations show differential firing rate changes based on their participation in these HFO events is a striking finding that warrants further investigation.

      We thank the Reviewer for finding our work interesting and for the positive comments regarding our manuscript.

      Weaknesses:

      The primary weakness of the study lies in the lack of a clear distinction between global brain states and the specific events being analyzed. Because the authors compare HFOs across different sleep stages (NREM, tonic REM, and phasic REM) without sufficient controls, it is difficult to determine if the observed differences are intrinsic to the HFOs themselves or simply a reflection of the different physiological states in which they occur.

      We appreciate this concern. We agree that the generation of ripples/HFOs in NREM and REM sleep is inextricably linked to global brain state (e.g., cholinergic tone, as shown in the model), which results in differing patterns of activity across sleep states. However, we also show that activity associated with ripples in NREM sleep and HFOs in REM sleep delineates unique periods of intra- and interregional interactions that differ from activity associated with other phenomena in each respective sleep state, such as spindles or baseline theta periods. Regarding NREM PFC ripples, in our previous publication (Shin and Jadhav 2024), we showed that PFC ripples are strongly associated with spindles and slow oscillations, but when PFC activity was aligned to each of these events separately, we observed significant differences in activity profiles, indicating that NREM PFC ripples are indeed periods of differential PFC activity during which local reactivation is particularly strong. Similarly, here in REM sleep, we see that PFC HFOs are strongly coupled with gamma oscillations and that these two frequency bands engage PFC neurons separately (Figures 2C, S3J: differences in the phase-locking preference of PFC neurons to gamma and HFOs). While we observed strong theta-modulated neuronal population activity in response to HFOs (Figure 1H), we did not observe the same for gamma events that were uncoupled from HFOs (Figure S3L, right). We did observe population activity suppression for gamma events that were coupled with HFOs, but the theta-modulated activity was largely absent (Figure S3L, left), indicating that, among higher-frequency oscillations, precise alignment to HFOs drives the theta-modulated activity.

      Furthermore, we provide a control for baseline theta periods outside of HFOs to demonstrate that the phasic, theta-modulated activity (Figures 1H, 3F) is due to its association with HFOs and is not a common feature of baseline theta activity (Figure S4I). Together, these results demonstrate that the theta-modulated, phasic PFC activity that we report is primarily associated with the presence of HFOs.

      To address this point, we will provide a more detailed explanation for the theta controls that we performed, and conduct additional analyses to control for different baseline periods during REM sleep, similar to the response to Reviewer 1’s first comment.

      Furthermore, the evidence for "structured reactivation" is not yet convincing. The temporal alignment of these reactivation events appears inconsistent, with peaks occurring well before the HFO itself, and the analysis does not sufficiently control for pre-existing cellular assembly strengths.

      We thank the Reviewer for raising these important points. Regarding the temporal alignment of assemblies during REM HFOs, since gamma activity is linked to and precedes HFO activity in REM (Figure S3F,G), we posit that assembly activation preceding HFO alignment may be driven by gamma-frequency activity. Indeed, we observe gamma-associated peaks in PFC population activity temporally adjacent to the start of HFO chains in REM (Figure S5F), which we propose drive the assembly activation.

      Related to our response to Reviewer 1, the hypothesis that we have regarding this finding is that theta-mediated input to PFC, possibly from several brain areas including the hippocampus, converges and elicits cross-frequency activity spanning gamma and HFO bands. We hypothesize that these gamma and HFO oscillations work in concert to evoke the structured reactivation.

      Furthermore, as the Reviewer accurately points out, we are not able to determine whether the assembly patterns active during the REM HFOs pre-existed prior to their assessment during sleep. Since there was not enough REM sleep during the earlier sleep epochs, we were not able to investigate assembly activation patterns during REM in the first pre-task sleep session prior to W-Track exposure.

      To address these points, we will provide additional support for our claims, add clarification to major points, and expand on the methods used to assess structured reactivation. We will also analyze the spatial rate maps of assemblies during behavior on the W-Track and attempt to link these representations to assembly activity during REM HFOs. If sufficient controls cannot be provided, we will temper the claims of “reactivation” and replace all mentions with assembly “activation”.

      Additionally, some of the sleep architecture presented appears atypical, such as very short REM bouts and direct NREM-to-REM transitions that bypass standard progression, raising questions about the consistency of the sleep detection across animals.

      The Reviewer is presumably referring to the hypnograms in Figure S1H. In Figure S1H, we presented concatenated hypnograms across all 9 sleep sessions, regardless of whether they were included for analysis. Furthermore, these hypnograms show only the output of the sleep-scoring algorithm and do not reflect the secondary, manual inspection that we performed to confirm sleep epoch inclusion. Individual epoch sleep state plots (e.g., Figure S1B) were visually inspected to confirm robust increases in the theta-to-delta (TD) ratio detected in the absence of movement; epochs in which microarousals or persistent subthreshold fluctuations in the animal's movement induced noisy TD ratio increases, and thus inaccurate REM designation, were excluded. We also note that omitting these edge cases, which constitute a minor part of the REM sleep data, does not change any results.
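      The automated part of this scoring (a theta-to-delta ratio increase in the absence of movement, combined with the >10 s REM duration criterion mentioned later in this response) can be sketched as below. This is a hedged illustration only: the thresholds, the shared sampling step dt, and the function name detect_rem_epochs are hypothetical, and the manual inspection step described above is not modeled.

```python
import numpy as np

def detect_rem_epochs(td_ratio, speed, dt, td_thresh, speed_thresh, min_dur=10.0):
    """Hypothetical REM candidate detection from a theta/delta ratio trace.

    td_ratio, speed : arrays sampled every dt seconds (assumed aligned)
    td_thresh       : TD ratio above which a bin counts as theta-dominated (assumed)
    speed_thresh    : speed below which the animal counts as immobile (assumed)
    min_dur         : runs must last longer than this many seconds (>10 s in the text)
    Returns a list of (start_s, end_s) tuples, end exclusive.
    """
    mask = (np.asarray(td_ratio) > td_thresh) & (np.asarray(speed) < speed_thresh)
    # Pad with False so every run of True has both a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)  # exclusive end index of each run
    epochs = []
    for s, e in zip(starts, stops):
        if (e - s) * dt > min_dur:
            epochs.append((float(s * dt), float(e * dt)))
    return epochs
```

Runs shorter than the duration criterion are dropped outright, which is one reason very brief REM-like segments should not survive into the analyzed epochs.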

      Another consideration is that these animals were running a strenuous learning task that required repeated traversal of multiple maze arms over multiple behavioral sessions, which likely increased sleep pressure and thus may have altered sleep state dynamics in a subset of animals (Leemburg et al. 2010; Yang et al. 2012).

      To address these points, we will provide updated hypnograms that explicitly highlight the epochs used in analysis to resolve ambiguities. We will also further demonstrate that our procedure for sleep state designation is accurate and consistent across animals with supporting materials, including additional sleep stage classification examples, and REM-specific sleep examples marking tonic and phasic REM.

      Finally, the study does not account for potential confounds like baseline firing rates when interpreting the behavior of "high-cofiring" neurons, which may simply be the most active cells in the population.

      When we compared low and high cofiring neurons in CA1, we did indeed compare baseline firing rates between the two groups and found no differences. We compared both mean firing rates across entire sleep sessions as well as mean firing rates restricted to REM sleep (Figure S7A). We apologize that this important control was not emphasized more clearly.

      To address this point, we will explicitly reference this figure in the main text as a standalone point.

      Reviewer #3 (Public review):

      Summary:

      Shin et al. examine hippocampal-prefrontal interactions during sleep using simultaneous CA1 and prefrontal cortex recordings in rats performing a spatial memory task. They identify high-frequency oscillation (HFO) events in PFC during REM sleep that occur in theta-modulated chains and are associated with increased CA1-PFC coherence and sequential, sparse reactivation of cortical ensembles. This pattern contrasts with the synchronous reactivation observed during NREM cortical ripples. Together with a simple cholinergic network model, the authors propose that REM HFO chains represent a distinct mechanism for hippocampal-cortical coordination that complements NREM ripple-mediated processing during sleep.

      Strengths:

      A major strength of the work is the extensive electrophysiological dataset, which includes simultaneous recordings of large neuronal populations in both hippocampus and prefrontal cortex across behaviour and subsequent sleep. The analyses linking high-frequency events to population dynamics, interregional coherence, and ensemble reactivation are technically sophisticated and provide an incredibly detailed description of REM-associated cortical activity patterns. In particular, the demonstration that REM HFOs occur in chains aligned to theta phase and organise sequential activation of cortical assemblies represents a potentially important advance in understanding the neural structure of REM sleep activity. The integration of experimental data with a computational model further provides a useful framework for interpreting the observed differences between REM and NREM network states in terms of neuromodulatory influences.

      We thank the Reviewer for finding our work important and for the positive comments regarding the manuscript.

      Weaknesses:

      While overall this study provides a highly valuable body of work, there are two primary limitations, which, if overcome, would provide substantially more significance to the overall characterisation of REM HFOs. Specifically:

      (1) Distinction from wake HFOs

      The results largely support the authors' claim that REM HFO chains represent a distinct pattern of neural coordination compared to NREM cortical ripples. The analyses consistently show differences between REM and NREM events in terms of neuronal modulation, ensemble structure, and interregional coupling. However, similar high-frequency events during wake are not examined. Since REM sleep shares several network features with wakefulness, including strong theta oscillations, evaluating whether comparable PFC HFOs occur during wake would provide clarity on whether these events are specific to REM sleep (and its associated functions) or represent a more general theta-associated phenomenon.

      We thank the Reviewer for this suggestion. Indeed, this is an important comparison to make, since electrophysiological patterns of activity are similar across wake and REM sleep states.

      To address this point, we will detect and analyze HFOs during running behavior on the W-Track to determine if they elicit similar, phasic population responses in PFC.

      (2) Link to memory consolidation

      The manuscript proposes throughout that REM HFO chains may contribute to memory consolidation by coordinating hippocampal-cortical reactivation, but the evidence for this functional role remains indirect. The authors do highlight this as a limitation of the study - the inability to link their findings to learning - but it is not clear why. Further details of the behaviour results should be included. If no learning occurred across the eight behavioural sessions, this should be reported. If learning did occur, but could not be linked to HFO events, this should also be reported.

      This point is well-taken, and we will reduce emphasis on memory consolidation in the manuscript. We do want to note that the primary focus here was to investigate new cortical-hippocampal activity patterns during sleep states that are established to be important for memory consolidation, in this case, REM sleep. Indeed, several major discoveries of reactivation and cortical-hippocampal physiological patterns in rodent sleep and wake states thought to be important for memory consolidation were initially reported without a link to memory consolidation, e.g., NREM hippocampal reactivation and replay (Wilson and McNaughton 1994; Lee and Wilson 2002), cortical-hippocampal activity coordination in slow-wave sleep (Siapas and Wilson 1998; Ji and Wilson 2007), and waking replay in the hippocampus (Foster and Wilson 2006; Karlsson and Frank 2009). As Reviewer 1 noted, we expect that an important role for these novel events will become increasingly understood over time.

      The connection between learning and REM HFO activity is a line of investigation that we find very interesting. However, due to the experimental design and the rapid pace at which the animals learn this task (Shin, Tang, and Jadhav 2019), we were not able to robustly relate REM HFO activity to learning. Firstly, with our threshold criteria for REM sleep detection (>10 s) as well as a total REM sleep duration criterion for sessions, most of the sleep epochs included for analysis came from the later sessions when REM sleep was more abundant (Figure SF,G). Consequently, many of the sleep sessions following the earlier behavioral/learning sessions were excluded. Making a claim about the contribution of REM HFOs to the learning process requires the inclusion of REM sleep periods after each behavior session to examine incremental changes in response to learning. Furthermore, a comparison of these REM sleep periods to pre-task REM sleep (pre-task sleep session #1 prior to task exposure) is important to demonstrate that any changes are dependent on experience. However, we were unable to make this comparison due to lack of REM sleep in pre-task sleep session #1. It is likely that an investigation of the role of these novel events in memory consolidation may require rodent task designs that are known to require REM sleep, such as inference tasks (Abdou et al. 2024; Ellenbogen et al. 2007), motor learning (Nitsche et al. 2010), or emotional memory (van der Helm and Walker 2011; Cairney et al. 2015).

      To address this point, we will reinforce this as a limitation of our study, reduce emphasis on memory consolidation, and further clarify that we were not able to link REM HFO activity to learning. We will also include additional details about the behavioral results.

      References

      Abdou, K., M. Nomoto, M. H. Aly, A. Z. Ibrahim, K. Choko, R. Okubo-Suzuki, S. I. Muramatsu, and K. Inokuchi. 2024. 'Prefrontal coding of learned and inferred knowledge during REM and NREM sleep', Nat Commun, 15: 4566.

      Cairney, S. A., S. J. Durrant, R. Power, and P. A. Lewis. 2015. 'Complementary roles of slow-wave sleep and rapid eye movement sleep in emotional memory consolidation', Cereb Cortex, 25: 1565–75.

      Ellenbogen, J. M., P. T. Hu, J. D. Payne, D. Titone, and M. P. Walker. 2007. 'Human relational memory requires time and sleep', Proc Natl Acad Sci U S A, 104: 7723–8.

      Foster, D. J., and M. A. Wilson. 2006. 'Reverse replay of behavioural sequences in hippocampal place cells during the awake state', Nature, 440: 680–3.

      Ji, D., and M. A. Wilson. 2007. 'Coordinated memory replay in the visual cortex and hippocampus during sleep', Nat Neurosci, 10: 100–7.

      Karlsson, M. P., and L. M. Frank. 2009. 'Awake replay of remote experiences in the hippocampus', Nat Neurosci, 12: 913–8.

      Lee, A. K., and M. A. Wilson. 2002. 'Memory of sequential experience in the hippocampus during slow wave sleep', Neuron, 36: 1183–94.

      Leemburg, S., V. V. Vyazovskiy, U. Olcese, C. L. Bassetti, G. Tononi, and C. Cirelli. 2010. 'Sleep homeostasis in the rat is preserved during chronic sleep restriction', Proc Natl Acad Sci U S A, 107: 15939–44.

      Nitsche, M. A., M. Jakoubkova, N. Thirugnanasambandam, L. Schmalfuss, S. Hullemann, K. Sonka, W. Paulus, C. Trenkwalder, and S. Happe. 2010. 'Contribution of the premotor cortex to consolidation of motor sequence learning in humans during sleep', J Neurophysiol, 104: 2603–14.

      Shin, J. D., and S. P. Jadhav. 2024. 'Prefrontal cortical ripples mediate top-down suppression of hippocampal reactivation during sleep memory consolidation', Curr Biol, 34: 2801–11 e9.

      Shin, J. D., W. Tang, and S. P. Jadhav. 2019. 'Dynamics of Awake Hippocampal-Prefrontal Replay for Spatial Learning and Memory-Guided Decision Making', Neuron, 104: 1110–25 e7.

      Siapas, A. G., and M. A. Wilson. 1998. 'Coordinated interactions between hippocampal ripples and cortical spindles during slow-wave sleep', Neuron, 21: 1123–8.

      van der Helm, E., and M. P. Walker. 2011. 'Sleep and Emotional Memory Processing', Sleep Med Clin, 6: 31–43.

      Wilson, M. A., and B. L. McNaughton. 1994. 'Reactivation of hippocampal ensemble memories during sleep', Science, 265: 676–9.

      Yang, S. R., H. Sun, Z. L. Huang, M. H. Yao, and W. M. Qu. 2012. 'Repeated sleep restriction in adolescent rats altered sleep patterns and impaired spatial learning/memory ability', Sleep, 35: 849–59.

    1. eLife Assessment

      This study is useful and unique, since hagfish brains are of phylogenetic importance and can reveal features ancestral to all vertebrates. The manuscript is, however, incomplete and would benefit from contextualization with the current literature; comparisons with the recent amphioxus study are suggested, plus an increased focus on the specific, unique features of the hagfish brain. One significant concern is the apparent absence of Datx2 expression, given that the riboprobe was synthesized from cDNA derived from whole-brain RNA extracts. Ideally, the authors should identify a tissue in which Datx2 is known to be strongly expressed and then apply the probe as a positive control.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript presents a three-dimensional and molecular atlas of the adult hagfish brain to investigate the evolutionary origin and early diversification of vertebrate brain organization. Using whole-brain tissue clearing, light-sheet microscopy, and computational reconstruction, the authors generate a high-resolution 3D anatomical model of the hagfish brain. They complement this structural analysis with gene-expression profiling of neurotransmitter systems and receptors, including glutamatergic, GABAergic, cholinergic, serotonergic, and dopaminergic markers.

      Strengths:

      Together, the work aims to establish a modern neuroanatomical reference for the hagfish. Given the phylogenetic importance of hagfish as one of the two extant cyclostome lineages (the other being lampreys), and the fact that the hagfish brain has barely been studied in contrast to the lamprey brain, the atlas provides a foundational resource and should be of interest to evolutionary and comparative neurobiology.

      Weaknesses:

      However, there are several places where both the data presentation and the narrative can be improved and clarified, and some of the homology and evolutionary claims in particular seem overstated and need to be toned down. I present more detailed comments below:

      (1) The authors spend too much effort trying to convince readers of the monophyly of hagfish and lamprey to stress its importance for evolutionary comparisons. This is now well accepted; instead, there could be more details on some of the specific, unique features of the hagfish brain relevant to a comparative atlas. For instance, the unusual fusion of the telencephalon anteriorly with the olfactory bulb and posteriorly with the diencephalon (Wicht and Northcutt, 1992), the degenerate visual system, the absence of the pineal gland, and the oculomotor system can be discussed in reference to the generated atlas and examined marker expression in related structures and their possible identity.

      (2) The assertion that the MGE is absent in the lamprey is incorrect based on Sugahara et al. (2016; 2017), who identified lamprey paralogues of Nkx2.1/2.4 that are expressed in the ventral subpallium. This should be corrected.

      (3) The major contribution of this study, in my mind, is the "three-dimensional atlas" of the hagfish brain. However, the atlas itself is not presented. A video of the 3D reconstructed Nissl-stained hagfish brain would be an important data resource and should be added. Annotations of forebrain, midbrain, and hindbrain regions and their constituent major structures would also be a useful resource.

      (4) In the pallium, there seems to be an inner GABAergic cell layer and inner and outer glutamatergic cell layers, as noticed in lampreys (Suryanarayana et al., 2017). What are the overall proportions of glutamatergic and GABA neurons? In the images, it does seem that vGlut neurons are present in both P2 and P4, while there appear to be more GAD neurons in P4.

      (5) As a general comment, homology claims should be toned down throughout the manuscript. This would at least require some connectivity data or transcriptomic analysis for any possible suggestions; the current data, with few markers, are insufficient for any reasonable comparisons.

      (6) Expression of Pax6 and AChE is not sufficient to suggest a cerebellum-like structure. While it is true that embryonic Pax6 expression in the rhombic lip of the hagfish embryo is more comparable to other vertebrates than lamprey, and the presence of a rudimentary cerebellum-like structure would be of great interest, the evidence is too limited for such claims and should be toned down.

      (7) Again, expression of Tbr1 and GAD1 in NCvl neurons does not suggest that these could be hippocampal neurons. One would at least need to rule out expression of prethalamic markers and demonstrate the presence of pallial markers through transcriptomic data (as in Lamanna et al., 2023).

      (8) Presence of GABAergic neurons in the striatum: are there any data on the expression of dopamine receptors, particularly given the apparent loss of the D2 receptor subtype in the hagfish?

    3. Reviewer #2 (Public review):

      Summary:

      The work of Harada and collaborators fills an important gap in our knowledge of neuronal identities in the adult hagfish brain. There is essentially no modern, cell-type-level characterisation of neuronal identity in the hagfish brain yet. Existing data are limited to classical neuroanatomy (e.g., Nieuwenhuys) and sparse transmitter/gene-expression studies, mostly in embryos (e.g., work from the Kuratani lab). This study reveals a very broad and peculiar pattern of dopaminergic identities and a strikingly unusual pattern of serotonergic transmission, with serotonergic cell bodies present in the telencephalon, which is uncommon for vertebrates and contrasts with previous reports (e.g., Kadota, 1991).

      Strengths:

      The three-dimensional reconstruction of the brain, including the ventricular system, is novel and very useful. Most of the neurotransmitter identity patterns presented here have not been previously described, and those that were published earlier, such as the serotonergic system (e.g. Kadota, Nieuwenhuys, Wicht), are old and would clearly benefit from re-evaluation using more modern approaches.

      Weaknesses:

      Neurotransmitter identities are highly relevant for interpreting the possible presence of LGE/MGE territories in hagfish (e.g. GABAergic patterns), for characterising the raphe nuclei (e.g. serotonergic system), and for refining our understanding of the central prosencephalic complex in relation to other vertebrate brain architectures. However, the authors do not address these points and overlook recent evidence from the amphioxus brain that could help interpret their results in an evolutionary context. Overall, the results are insufficiently discussed in relation to the current state of the art.

      The study would clearly benefit from complementary gene expression profiling to place these neurotransmitter patterns within a broader framework of brain partitions, to enable more direct comparisons with other vertebrates, and, importantly, to interpret them in relation to the prosomeric model. Furthermore, the work lacks appropriate controls for the in situ hybridization experiments; Datx2 does not show any expression, so there is currently no evidence that this probe is functional. Including such controls would also strengthen the overall description of the dopaminergic system, especially given that the expression patterns of the different genes analysed appear very diffuse and somewhat random.

    1. eLife Assessment

      This important work outlines why commonly applied performance metrics in predictive modelling do not accurately reflect translational potential using the example of psychiatric care; it provides a web-based tool to contextualize effect sizes in psychiatry with respect to reliability and base rates, and to calculate the real-world utility of prediction models under different scenarios. The evidence supporting the conclusions is convincing, incorporating established psychometric principles that will be of use for multiple fields, along with transparent quantitative logic and example applications. The manuscript would benefit from further details about how the tool can be optimally applied and how the resulting outputs should be interpreted. The work will be of broad interest to both clinical experts and scientists in biomedicine and the life sciences.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript provides a well‑argued discussion of the misalignment between the predictive performance evaluations commonly reported in the literature and actual clinical utility in the context of predictive psychiatry. Specifically, the authors discuss measurement reliability and prevalence as two neglected factors that can substantially inflate the assessment of model performance for clinical practice. To mitigate this, the authors offer a concrete framework and an accompanying web tool with which to adjust performance metrics and compute additional predictive‑value and decision‑analytic measures.

      Strengths:

      The manuscript speaks convincingly about the risk of face validity and the practical irrelevance of seemingly promising predictive models in psychiatry. The authors outline how predictive performance estimations often fail to generalize to clinical contexts and thereby potentially mislead scientific efforts. In the face of ubiquitous biomarker models and incremental improvements in the literature, the reader is reminded that, irrespective of the glory of the proposed model, low reliability of clinical measurements fundamentally affects (and limits) both effect sizes and predictive performance ("garbage in, garbage out"), and that neglecting this can ultimately lead to misinformed decisions in the treatment of individual patients. The provision of an online tool with a user‑friendly interface and clearly worked examples is a major practical asset that will facilitate the adoption of the proposed framework beyond quantitative methodologists.

      Weaknesses:

      While the outlined issues highlight important aspects of the translational gap, the suggested solutions remain somewhat theoretical. For example, population prevalence might not reflect the base rate a model would encounter in practice, since using it assumes that population prevalence and the composition of actual clinical cohorts are aligned. Accounting for who presents to care, and under which referral or triage patterns, is a crucial determinant of effective base rates. While the authors do acknowledge the importance of using base rates from the target population, these nuances could be emphasized more prominently where practical recommendations are made. Relatedly, the analytical context and the methodological assumptions are not clearly specified. Many arguments and demonstrations are derived in univariate, group‑comparison settings and then discussed in a way that can be read as broadly applicable.
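      To make the two factors concrete, the sketch below uses standard textbook relations rather than the authors' tool: under classical test theory, a group difference measured with reliability ρ is attenuated to d·√ρ; under an equal-variance binormal assumption, AUC = Φ(d/√2); and positive predictive value depends on the base rate through Bayes' rule. The function names are illustrative, and the choice of prevalence value is exactly the assumption the review flags.

```python
import math

def attenuated_d(true_d, reliability):
    # Classical test theory: error variance inflates the observed SD by
    # 1/sqrt(reliability), so the standardized difference shrinks by sqrt(reliability).
    return true_d * math.sqrt(reliability)

def auc_from_d(d):
    # Equal-variance binormal model: AUC = Phi(d / sqrt(2)) = 0.5 * (1 + erf(d / 2)).
    return 0.5 * (1.0 + math.erf(d / 2.0))

def ppv(sensitivity, specificity, prevalence):
    # Bayes' rule: share of positive predictions that are true positives.
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    return tp / (tp + fp)

# Same classifier, different base rates (illustrative numbers only).
for base_rate in (0.5, 0.05):
    print(base_rate, round(ppv(0.8, 0.8, base_rate), 3))
```

With sensitivity and specificity both at 0.8, the PPV is 0.8 in a balanced sample but only about 0.17 at a 5% base rate, which is why the effective base rate of the clinical cohort matters so much.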

    3. Reviewer #2 (Public review):

      Summary and strengths:

      The authors present a description of their online tool to estimate real-world performance of predictive models. The authors bring together different calculations to make better-informed implementation choices. It is a very nice tool to go from effect sizes to base rates to decision curve analysis. The paper describes the background and use of the tool with examples and seems like an extended version of their online how-to. The methods themselves are not new, but I think the tool will be valuable for researchers from different fields. Tools already exist for the conversion of effect sizes (my current favorite is https://www.escal.site/), but I haven't seen measurement noise being incorporated previously. The main benefit is the evaluation of performance under different real-world scenarios. Code is available on GitHub, and the manuscript is well-written.

      Weaknesses:

      While a comprehensive explanation and examples are important for correct use of the tool, I don't really see the added value beyond their online how-to guide, as the software itself has already been published (Karvelis, P. and Diaconescu, A. O. (2025b). E2p simulator: An interactive tool for estimating real world predictive utility of research findings. Journal of Open Source Software, 10(114):8334.).

    4. Reviewer #3 (Public review):

      Summary:

      This important work provides a web-based tool to contextualize effect sizes in psychiatry with respect to reliability and base rates (collectively referred to as predictive utility analysis). The methods for the tool incorporate established psychometric principles that I think are of use for multiple fields in this seemingly easy-to-use tool. I agree with the critical importance of this tool and the methodological points made in this manuscript. Enthusiasm for the manuscript is weakened by a lack of clarity about how the paper is framed and what the examples are meant to show; the inferences and implications for clinical decision-making of the various parameterizations explored with the tool are left open-ended.

      Strengths:

      This paper presents a well-considered and, I think, highly useful web-based tool to contextualize effect sizes with respect to reliability and base rates. As the authors rightly point out, such a tool could be used in conjunction with widespread analytic power analysis tools in study planning. The paper also contextualizes the need for such a tool well, in light of the relatively recent history of concerns about power, reliability, and inference in psychiatry specifically, and of broader meta-scientific debates in psychology and neuroscience.

      Weaknesses:

      My primary feedback on this manuscript concerns the lack of clarity about what the paper itself, separate from the tool, aims to achieve. There is a central, but unresolved, tension in whether the reader is supposed to:

      (1) focus on the specifics of the examples used and whether to reevaluate the substantive claims from the studies, (2) buy into how various reliability and base rate parameters impact modeling outcomes, or (3) receive an introduction to the tool itself.

      In my estimation, the largest contribution to the field here is in (2) and (3), but currently much of the real estate of the paper is dedicated to several examples of (1). While these specific examples may be illustrative to some degree, I think that, given their number and brevity, they are unlikely to incidentally achieve points (2) and (3) above. Specific examples include the assertion of kappas for DSM diagnoses without much nuance (e.g., see https://psycnet.apa.org/buy/2015-27500-001). Given the relatively limited space devoted to this example, however, it is hard to be entirely certain what the reader should take away.

      A second point of concern is where this tool would be situated in the research pipeline. I agree with the authors that this tool could be used in ways that parallel power analysis. With that in mind, the most common use of this tool by an individual investigator is likely to be a priori study planning. In contrast, and with my point above in mind, applying the tool to existing results is likely best done with multiple estimates of effect sizes, reliability, and base rates, as is common in meta-analyses or consensus reviews. Nevertheless, the manuscript offers no concrete example or guidance on how the tool should inform new study planning.

      A third point is that more nuance would be useful in the introduction about the current state of psychiatry research. For example, I share many of the authors' concerns about reliability, power, reproducibility, and barriers to translation. That said, while effect sizes deserve considerably more attention, they are already widely considered in psychiatry research through the commonplace use of meta-analysis and other data-pooling approaches. Another such example is the authors' statement in the context of reliability: "However, this [reliability] attenuation is rarely accounted for in routine analyses in psychiatry". This is true in practice, but somewhat misleading insofar as the method by which to do this remains unclear. For example, should we all report disattenuated associations, as if there were no error and everything were perfectly reliable? Expecting zero error would, of course, be unrealistic. That we can achieve this with the new tool is clear, but how and under what circumstances it should be done is not, and this nuance should be better reflected in the framing of the problem. That is, the problem is also a lack of clarity about what best practices and field-wide goals ought to be, not simply the lack of an ability to model these factors.

      Minor point

      For conceptual clarity, the manuscript would benefit from at least briefly mentioning the role of validity in translational importance. Of course, the psychometric issues of reliability, base rate, power, etc., are critical, but, given the potentially wide audience of this manuscript, it should at least be mentioned that validity is important as well. For example, highly reliable measures may not be valid indicators of underlying disease etiology (e.g., fMRI head motion is a highly reliable trait-level feature, but typically not considered an important predictor or consequence of mental health worth investing translational resources in). Relatedly, it would be useful to briefly mention confounding as a general topic, in the spirit of considering underlying issues in translation.

    1. eLife Assessment

      The authors present useful findings demonstrating that the RNA modification enzyme Mettl5 regulates sleep in Drosophila. Through transcriptome- and proteome-wide analyses, the authors identified downstream targets affected in heterozygous mutants and proposed that Mettl5 regulates the translation and degradation of clock genes to maintain normal sleep function. Through additional analyses, the authors provided solid evidence supporting this model.

    2. Reviewer #1 (Public review):

      Summary:

      Here the authors attempted to test whether the function of Mettl5 in sleep regulation was conserved in Drosophila, and if so, by which molecular mechanisms. To do so they performed sleep analysis, as well as RNA-seq and ribo-seq in order to identify the downstream targets. They found that the loss of one copy of Mettl5 affects sleep, and that its catalytic activity is important for this function. Transcriptional and proteomic analyses show that multiple pathways were altered, including the clock signaling pathway and the proteasome. Based on these changes the authors propose that Mettl5 modulates sleep through regulation of the clock genes, at the level of both their production and their degradation, possibly by altering aspartate codon usage.

      Comments on revisions:

      The authors addressed all my comments satisfactorily.

    3. Reviewer #3 (Public review):

      Xiaoyu Wu and colleagues examined a potential role in sleep of a Drosophila ribosomal RNA methyltransferase, mettl5. Based on sleep defects reported in CRISPR-generated mutants, the authors performed both RNA-seq and Ribo-seq analyses of head tissue from mutants and compared them with control animals collected at the same time point. A major conclusion was that the mutant showed altered expression of circadian clock genes, and that the altered expression of the period gene in particular accounted for the sleep defect reported in the mettl5 mutant. In this revision, the authors have added a more thorough analysis of clock gene expression and show that PER protein levels are increased relative to wild-type animals at specific times of day, indicating increased stability of the protein. Given that PER inhibits its own transcription, per RNA is low in the mutants. The revised manuscript includes efforts toward a more detailed understanding of how clock gene expression was altered in the mutants, as well as other clarifications of sleep phenotypes.

      Comments on revisions:

      All critiques have been addressed by the authors; the manuscript is much improved from its original submission. Thank you.

    4. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Here, the authors attempted to test whether the function of Mettl5 in sleep regulation was conserved in Drosophila, and if so, by which molecular mechanisms. To do so they performed sleep analysis, as well as RNA-seq and ribo-seq in order to identify the downstream targets. They found that the loss of one copy of Mettl5 affects sleep, and that its catalytic activity is important for this function. Transcriptional and proteomic analyses show that multiple pathways were altered, including the clock signaling pathway and the proteasome. Based on these changes the authors propose that Mettl5 modulates sleep through regulation of the clock genes, at the level of both their production and their degradation, possibly by altering aspartate codon usage.

      Comments on revised version:

      The authors satisfactorily addressed my comments, even though the precise mechanism by which Mettl5 regulates translation of clock genes remains to be firmly demonstrated.

      Reviewer #3 (Public review):

      Xiaoyu Wu and colleagues examined a potential role in sleep of a Drosophila ribosomal RNA methyltransferase, mettl5. Based on sleep defects reported in CRISPR-generated mutants, the authors performed both RNA-seq and Ribo-seq analyses of head tissue from mutants and compared them with control animals collected at the same time point. A major conclusion was that the mutant showed altered expression of circadian clock genes, and that the altered expression of the period gene in particular accounted for the sleep defect reported in the mettl5 mutant. In this revision, the authors have added a more thorough analysis of clock gene expression and show that PER protein levels are increased relative to wild-type animals at specific times of day, indicating increased stability of the protein. Given that PER inhibits its own transcription, per RNA is low in the mutants. Efforts toward a more detailed understanding of how clock gene expression was altered in the mutants, as well as other clarifications of sleep phenotypes throughout, are appreciated. As noted above, a strength of this work is its relevance to a human developmental disorder as well as the transcriptomic and ribosomal profiling of the mutant. However, some minor weaknesses remain in the manuscript. This reviewer does not agree with the interpretation of the epistasis experiments. Specifically, combining Clk[jrk] or per[01] with the mettl5 mutant rescued the nighttime sleep phenotype but was additive for the daytime sleep phenotype, such that double mutants showed even more daytime sleep. This effect should be acknowledged and discussed. Overall, this is an interesting paper that indicates a molecular link between mettl5 and the circadian clock in the regulation of sleep.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      The authors misunderstood my original comment for Fig 1A. Please provide an explanation for the significance of the boxed region. There is little or no detail in the legend to help guide the reader.

      The information has been added to the figure legend for Figure 1A.

      Efforts toward improving analysis of circadian genes as well as sleep phenotypes (sleep onset time, rebound, etc) is much appreciated, thank you. However, Figure S1H and G panel labels are mixed up; please label in the order that they appear and that they correspond to the main text. Why is Figure S1H labeled "ZT 14"?

      Sleep latency is defined as the time from preparing to sleep to actually falling asleep. In this study, it specifically refers to the time taken for each individual fly to reach the sleep phenotype (i.e., 25 minutes of continuous sleep). We noted that this label was misleading, as the actual time to reach the sleep phenotype varied among individual flies. Therefore, in the revised figures, we have removed the ZT14 label. In addition, we have corrected the labeling of Figures S1G and S1H to ensure they appear in the correct order and correspond accurately to the descriptions in the main text.

      Unfortunately, based on Fig S1A-C, I am not convinced that mettl5 localizes to neurons, as there are no cells that show double labelling. This figure does not support the statement: "we found expression in both neurons (colocalizing with ELAV staining: Figure S1A-C) (lines 91-92), and "Mettl5-Gal4 is expressed in distinct neurons and glia that appear crucial for sleep regulation." (line 297). What "distinct" sleep related neurons were labeled? The staining in Fig S1A shows a different distribution from that in Fig S1D, and so it's possible this was a technical issue. Is there a better example?

      Thank you for your careful review and valuable comments. We agree that the colocalization of METTL5 with the neuronal marker ELAV is relatively sparse. However, as indicated by the arrows in Fig S1A–C, we did observe a few cells showing clear double labeling. These examples support the presence of METTL5 expression in neurons, albeit at a low frequency.

      In Figure 4G-H, please indicate the time of day of tissue collection.

      In Figure 4G-H, the tissue was collected at ZT0. We have now indicated this time point in the figure and legend to clarify the experimental timing.

      As noted in the public comment, I remain in disagreement with the assessment that "the double mutant showed the similar phenotype as downstream genes". The striking significant increase in daytime sleep in the double mutants remains unexplained. No further experiments are necessary, but this should be acknowledged in the text. Instead of an epistatic effect, given that overall sleep is high in the double mutants, another possible explanation is that the flies are sick and so are less active and sleeping more.

      Thank you for your suggestion. This has been acknowledged in the text: “Genetic epistasis experiments further supported this model, with clock gene mutants modifying Mettl5 mutant phenotypes, suggesting that both Clock and Per act downstream of Mettl5 (Figure 4I-N, Table 1). Secondary effects may account for the significant increase in daytime sleep in the double mutants.”

    1. eLife Assessment

      This study provides valuable insights into the role of MATR3 in oocyte maturation and folliculogenesis, using conditional knockout mice and in vitro follicle culture systems to show that MATR3 is required for oocyte growth and gene transcription, with downstream effects on follicle development. The strength of the evidence is incomplete, as key findings lack independent validation, methodological details are insufficient, and inconsistencies in data presentation reduce confidence in the conclusions. The work will be of interest to researchers in reproductive biology and fertility.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to clarify MATR3's function and molecular mechanism in oocyte growth and maturation, and to explore its association with OMA and its potential as a diagnostic and therapeutic target, using specific knockout mouse models, human OMA samples, and multi-omics technologies. The study fully achieves its preset objectives, and the results strongly support the conclusions. Specifically, it addresses the gap in understanding of the synergistic mechanism of epigenetic and secretory signals regulated by RNA-binding proteins (RBPs) in oocyte growth and enriches the molecular etiological spectrum of oocyte maturation disorders. It is the first study to reveal the conserved function of MATR3 across multiple species, providing a paradigm for cross-species research on RBPs in the field of reproductive biology. It also provides a new candidate target for OMA, a clinically refractory infertility disease, and is expected to promote the optimization of assisted reproductive technology and the development of precision medicine.

      Strengths:

      The strengths of this study are significant and prominent. First, the research system is comprehensive, integrating knockout mouse models, in vitro knockdown models, and multi-species (mouse, porcine, and human) verification with scRNA-seq, LACE-seq, CO-IP, and other multi-omics and molecular biology technologies, forming a complete and progressive chain of evidence. Second, the mechanistic analysis is in-depth, clarifying the dual molecular mechanisms by which MATR3 regulates the transcriptional synthesis and secretion of GDF9, through "recruiting KDM3B to regulate H3K9me2 demethylation" and "directly binding to Rdx mRNA", with a clear and closed logical loop. Third, the clinical relevance is strong. This is the first study to find abnormal nuclear localization of MATR3 in oocytes of OMA patients, providing new clues for research on clinical disease mechanisms, and the downstream function of GDF9 is verified through rescue experiments, effectively enhancing the translational value of the results.

      Weaknesses:

      This study included only one OMA patient's oocyte sample. Without clinical screening for MATR3 mutations or abnormal expression, establishing a causal relationship between MATR3 and OMA remains difficult.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the role of MATR3 in oocyte development and folliculogenesis using conditional knockout mouse models together with in vitro follicle culture and molecular analyses. The authors aim to determine whether MATR3 regulates oocyte maturation and follicle development and to explore potential mechanisms linking MATR3 function to transcriptional and epigenetic regulation in growing oocytes.

      Strengths:

      A major strength of the work is the use of a conditional knockout mouse model combined with complementary in vitro follicle culture approaches, which together provide a useful framework for examining gene function during oocyte development. The study also attempts to integrate cellular phenotypes with molecular analyses of transcriptional activity and epigenetic markers.

      Weaknesses:

      Several weaknesses limit the strength of the conclusions. These include insufficient validation of key experimental manipulations (such as the efficiency of MATR3 knockdown in siRNA experiments), limited quantification or statistical analysis for some datasets, inconsistencies between the text and presented data in certain figures, and incomplete methodological descriptions that make it difficult to fully evaluate reproducibility.

    4. Reviewer #3 (Public review):

      Summary:

      The study aims to elucidate the dual molecular mechanisms of the RNA-binding protein MATR3 in oocyte growth and maturation. The authors propose that MATR3, highly expressed in growing oocytes (GOs), regulates oocyte quality through two pathways: epigenetically, by recruiting KDM3B to remove the repressive H3K9me2 mark at the Gdf9 locus to activate transcription; and post-transcriptionally, by binding Rdx mRNA to maintain microvillus structure for GDF9 secretion. This mechanism ensures oocyte-granulosa cell communication and female fertility. The study also explores the link between MATR3 and human oocyte maturation arrest (OMA).

      Strengths:

      The study proposes an innovative dual-mechanism model encompassing "epigenetic transcriptional activation and cytoskeletal regulation," which not only expands the functional understanding of RNA-binding proteins in chromatin regulation but also reveals the coordination between nuclear transcription and organelle structure. By integrating scRNA-seq and LACE-seq, the authors constructed a comprehensive regulatory network for MATR3, identifying both key targets and numerous potential molecules, thereby providing rich resources for future mechanistic studies. Furthermore, the inclusion of oocyte samples from human OMA patients directly links the basic findings to clinical reproductive disorders. Despite the limited sample size, this approach demonstrates strong translational potential.

      Weaknesses:

      The partial phenotypic improvement achieved by exogenous GDF9 supplementation suggests that the downstream effector pathways may involve a more complex network regulation, implying that the current interpretation of GDF9's central role could be further explored. Regarding the developmental abnormalities of granulosa cells in the conditional knockout model, their pathological origins require in-depth analysis to determine whether they represent primary alterations or secondary adaptive responses resulting from the loss of oocyte signaling.

    1. eLife Assessment

      This important study combines cryo-EM, biochemical, and cell-based assays to examine how Gβγ interacts with and potentiates PLCβ3. The authors present evidence for multiple Gβγ interaction surfaces and argue that Gβγ primarily enhances PLCβ3 activity after membrane recruitment rather than serving mainly as a membrane-recruitment factor. The evidence is solid overall, although uncertainty remains about the physiological relevance and precise arrangement of the proposed interfaces because the structural model relies on engineered crosslinking.

    2. Reviewer #1 (Public review):

      The manuscript by Fisher et al. describes the molecular mechanism underlying how G beta gamma subunits engage with the beta 3 isoform of PLC. The paper uses a combination of cryo-EM, BRET assays, and biochemical assays of PLC beta activity. A key discovery is that G beta gamma is not sufficient to drive membrane binding by itself, and instead promotes G alpha activation. The work is important, but suffers slightly from some ambiguity in the actual interface present in their cryo-EM model, as crosslinkers could stabilise a transient and non-native complex. This concern is somewhat mitigated by the careful mutational analysis, which shows that mutation of any of these three sites partially blocks G beta gamma activation of PLC beta. However, the presentation of these data, as well as the choice of mutants, could be improved. Overall, this paper nicely complements the Falzone et al. paper, which showed the membrane-bound complex of PLCB3; this work builds on that study and will be important for our full understanding of PLC beta activation.

      Major concerns:

      My biggest concern is the potential that this interface is artefactual, owing to the crosslinking strategy used. Here are some thoughts on how this could be better validated and presented more convincingly.

      (1) The authors' main claim is that there is a degree of plasticity of G beta gamma binding to the PLC beta 3 isoform, with three possible binding sites. The main complication of this is, of course, the possibility that the crosslinking stabilises a non-native complex, driven by a mutated cysteine.

      Because of this, any other additional details about this interface are going to be critical for the scientific audience to judge if this is accurate.

      What would greatly help Figure 1 is an evolutionary conservation analysis of the novel Gbg interface in PLC, to see how well this is conserved, and compare this to the conservation of the previously annotated sites. Conservation of these sites on both the G beta gamma and PLC side would help justify this as a native complex.

      This will also help orient the reader to the identity of the mutated residues assayed in Figure 3.

      (2) The G beta gamma orientation also differs from what I have observed in previous G beta gamma effector structures. Is there any precedent for this as an effector interface? A supplemental figure comparing this structure to other G beta gamma interfaces from other enzymes (for example, the recent Tesmer structure with PI3K) would be helpful.

      (3) The mutational analysis in Figure 2D-G seems to give some strange results, and I wonder why certain residues were chosen over others. Mutation of the Gbg side will be more complicated, as of course that can affect any of the three surfaces. My main question is that, given how Figure 2A is oriented, the main salt bridge in their novel interface looks to me like R199-D228, with K183 oriented away from E226 and D167 far from any charged residues. Why did the authors not make the corresponding R199 to D or E mutation?

      (4) To help the reader's interpretation of Figure 2A, I would recommend a supplemental figure showing the density for interfacial residues, as that also would increase confidence in the interface.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors dissect how Gβγ potentiates PLCβ3 signaling in cells. Using engineered crosslinking to stabilize a Gβγ-PLCβ3 complex, single particle cryo-EM, and cell-based functional assays, they identify and map multiple putative Gβγ interaction surfaces on PLCβ3, including a previously unrecognized binding mode. Structure-guided mutagenesis supports the functional relevance of these interactions and suggests that Gβγ potentiation is not primarily mediated by PLCβ3 membrane recruitment, but instead enhances PLCβ3 activity after the lipase is already at the membrane.

      Previous reconstitution work on the membrane surface (Falzone & MacKinnon, 2023) proposed a recruitment/partitioning-centric model in which Gβγ increases PLCβ3 output largely by elevating its membrane surface concentration, whereas Gαq primarily increases catalytic turnover; under those reconstitution conditions, the two inputs can combine approximately multiplicatively. In receptor-driven cellular signaling, however, PLCβ3 is robustly recruited to the plasma membrane upon Gαq activation, which raises the question of whether Gβγ contributes mainly through additional recruitment or through a post-recruitment mechanism once PLCβ3 is already at the membrane.

      This manuscript helps address that gap by using membrane-anchored PLCβ3 and complementary cellular readouts to separate "getting PLCβ3 to the membrane" from "boosting activity once PLCβ3 is already there." Their results argue that, in cells, membrane recruitment is largely dominated by Gαq·GTP, while Gβγ can further potentiate PIP2 hydrolysis after membrane association, consistent with a modulatory role at the membrane rather than primary recruitment.

      Overall, the work provides a structural and mechanistic framework for Gβγ-PLCβ3 cooperation and helps clarify the basis of Gq pathway amplification. The manuscript is generally strong, but some issues need to be addressed.

      Major comments:

      (1) BMOE/BM(PEG)2 crosslinking may enforce a non-native docking geometry, potentially compromising the physiological relevance and precision of the Gβγ-PLCβ3 interface as described. Although a >50% 1:1 crosslinked complex is formed and remains active, the solution maps show lower local resolution for Gβγ, consistent with a dynamic, potentially heterogeneous, interface. One interface is captured via a single engineered cysteine pair (PLCβ3 E60C-Gβ C271), which could potentially bias the pose. It would be helpful if the authors could provide additional orthogonal support (e.g., alternative crosslinked sites) and further clarify the uniqueness and relevance of this interface.

      (2) In the crosslinked structure, the authors report that GβD228 interacts with PLCβ3 R199 and K183. In Figure 2A, R199 appears closer to Gβ D228 than K183, yet only K183 is functionally tested. Testing R199 (e.g., R199E/R199A) would strengthen the structure-guided validation of this interface.

      (3) The mutagenesis strategy appears inconsistent across figures/assays, which makes it difficult to interpret phenotypes and directly link the functional data to the proposed interfaces. For example, in Figure 2E, we see R185L but R215E, while residue L40 is mutated to Gly in the IP accumulation assays but to Glu/Lys (L40E/K) in the BRET assays (Figures 3B/3D/3F). The authors should (i) clearly justify the rationale for each substitution (conservative vs charge-reversal, interface disruption, etc.) and (ii), where possible, test the same mutants across assays (or provide evidence that alternative substitutions yield consistent conclusions).

    4. Reviewer #3 (Public review):

      Summary:

      PLCβ3 is activated by both Gαq and Gβγ subunits. This paper follows previous solution and cryoEM studies of PLCβ3/Gβγ and seeks to understand the molecular details of activation using cellular BRET assays and cryoEM.

      Strengths:

      The authors find evidence for multiple binding sites on PLCβ3 for Gβγ and suggest that Gβγ is not a bona fide activator per se but enhances Gαq activation by positioning the catalytic site towards the substrate, although this is not completely convincing. Even if these sites are not naturally operative, the authors might want to develop the potential role of these sites.

      The authors also find that this activation does not occur through recruitment of the enzyme to the membrane by Gβγ released upon G protein activation, consistent with findings for other PLCβ enzymes but not with previous findings for PLCβ3; again, the authors might want to develop this point further.

      Weaknesses:

      (1) I'm confused as to why the authors feel that their mechanism is distinct from the two-state enzyme model of synergistic activation proposed by Ross in 2011, which rests primarily on a thermodynamic argument. As written, the authors appear to be very reliant on structural and BRET studies that do not provide the details needed to disprove this interpretation. The main issue is that the authors' mechanism does not fully explain how Gβγ activation of PLCβ2 occurs in reconstituted systems in the absence of Gαq subunits.

      (2) In a recent study, MacKinnon presents a model showing that Gαq and Gβγ activate PLCβ3 by two distinct pathways and that activation by Gβγ occurs through membrane recruitment. It is not surprising that the authors find that this is not the case, since the pelleting method used by MacKinnon is subject to error. The authors should directly address the limitations of this previous work, including the changes in proteoliposomes during sedimentation that alter partition coefficients. The inability of Gβγ to drive membrane binding is, however, in accord with the quantitative studies of Scarlata showing that the affinity of PLCβ3 for Gβγ is fairly weak compared with its intrinsic membrane partition coefficient.

      (3) It was proposed many years ago that in signaling complexes Gαq - Gβγ may not have to fully dissociate when binding PLCβ, but rather shift their relative orientation when binding to PLCβ to allow activation. Is their model consistent with this? Is it possible that PLCβ3 keeps Gβγ from diffusing to enhance the rate of Gq / Gβγ re-association?

      (4) The authors find that Gβγ binds multiple sites, and it is clear that the PH domain site is the primary one in accord with previous work. Could these weaker sites be an artifact of the elevated concentrations used in cryoEM and BRET assays?

      (5) Although their assays suggest differences in binding affinity, it would strengthen the paper if the authors could estimate the association energies of the different binding sites. Such an estimate would also address the concern stated above.
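      For reference, the standard conversion from a measured dissociation constant to a binding free energy is sketched below, with c° = 1 M as the standard-state concentration. The numbers are purely illustrative assumptions (T = 298 K, a hypothetical K_d of 1 µM, and a hypothetical second site that binds tenfold more weakly), not values reported in this study.

```latex
\begin{align*}
\Delta G^\circ_{\mathrm{bind}} &= RT \ln\!\left(\frac{K_d}{c^\circ}\right), \qquad c^\circ = 1\,\mathrm{M} \\
% Illustrative: K_d = 1\,\mu\mathrm{M},\ T = 298\,\mathrm{K}
\Delta G^\circ_{\mathrm{bind}} &= (8.314\,\mathrm{J\,mol^{-1}\,K^{-1}})(298\,\mathrm{K})\ln(10^{-6})
  \approx -34.2\,\mathrm{kJ\,mol^{-1}} \;(\approx -8.2\,\mathrm{kcal\,mol^{-1}}) \\
% Illustrative: second site with K_{d,2} = 10\,K_{d,1}
\Delta\Delta G^\circ &= RT \ln\!\left(\frac{K_{d,2}}{K_{d,1}}\right) = RT \ln 10
  \approx +5.7\,\mathrm{kJ\,mol^{-1}} \;(\approx +1.4\,\mathrm{kcal\,mol^{-1}})
\end{align*}
```

      Expressed this way, site-specific values (or ΔΔG relative to the PH-domain site) would let readers judge whether the weaker sites could plausibly be occupied at cellular protein concentrations.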

    5. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Fisher et al. describes the molecular mechanism by which G beta gamma subunits engage the beta 3 isoform of PLC. The paper uses a combination of cryo-EM, BRET assays, and biochemical assays of PLC beta activity. A key discovery is that G beta gamma is not sufficient to drive membrane binding by itself, and instead promotes G alpha activation. The work is important, but suffers slightly from some ambiguity in the actual interface present in the cryo-EM model, as crosslinkers could stabilise a transient and non-native complex. This concern is somewhat mitigated by the careful mutational analysis, which shows that mutation of any of these three sites partially blocks activation of PLC beta by G beta gamma. However, the presentation of these data, as well as the choice of mutants, could be improved. Overall, this paper is a nice complement to the Falzone et al. paper, which showed the membrane-bound complex of PLCB3, and it builds on that work in a way that will be important for our full understanding of PLC beta activation.

      Thank you for the positive feedback.

      Major concerns:

      My biggest concern is the possibility that this interface is an artefact of the crosslinking strategy used. Here are some thoughts on how it could be better validated and presented more convincingly.

      (1) The authors' main claim is that there is a degree of plasticity of G beta gamma binding to the PLC beta 3 isoform, with three possible binding sites. The main complication of this is, of course, the possibility that the crosslinking stabilises a non-native complex, driven by a mutated cysteine.

      Because of this, any additional details about this interface will be critical for the scientific audience to judge whether it is accurate.

      Figure 1 would benefit greatly from an evolutionary conservation analysis of the novel Gβγ interface on PLC, showing how well it is conserved compared with the previously annotated sites. Conservation of these sites on both the G beta gamma and PLC sides would help justify this as a native complex.

      This will also help orient the reader to the identity of the mutated residues assayed in Figure 3.

      We agree that crosslinking can result in the capture of a non-physiologically relevant interface. However, we do not observe any crosslinking between Gβγ and a PLCβ3 variant that retains a cysteine in the disordered region of the X–Y linker, nor crosslinking between PLCβ3 and any other cysteine present in the Gβγ heterodimer. The evolutionary conservation analysis is a great suggestion and will be included in the revision for both Gβγ and PLCβ.

      (2) The G beta gamma orientation also differs from what I have observed in previous G beta gamma effector structures. Is there any precedent for this as an effector interface? A supplemental figure comparing this structure to G beta gamma interfaces with other enzymes, for example the recent Tesmer structure with PI3K, would be helpful.

      Yes, this is not the more typically observed Gβγ–effector interaction, which is mediated by the narrow face of the Gβγ toroid. We are not aware of other structures in which Gβγ interacts with a binding partner in the same way. A supplemental figure comparing this Gβγ–PLCβ interaction to the Gβγ–PI3K and Gβγ–GRK2 structures will be included in the revision.

      (3) The mutational analysis in Figure 2D-G seems to give some strange results, and I question why certain residues were chosen rather than others. Mutation of the Gβγ side will be more complicated, as that can of course affect any of the three surfaces. My main question is this: as Figure 2A is oriented, the main salt bridge in the novel interface appears to me to be R199-D228, with K183 in the wrong orientation to pair with E226 and D167 far from any charged residues. Why did the authors not make the corresponding R199D or R199E mutation?

      Thank you for pointing this out. We are in the process of testing the PLCβ3 R199E mutant in our assays and will include the results in the revised manuscript.

      (4) To help the reader's interpretation of Figure 2A, I would recommend a supplemental figure showing the density for interfacial residues, as that also would increase confidence in the interface.

      Thank you for the suggestion; this will be included in the revised manuscript.

      Reviewer #2 (Public review):

      In this manuscript, the authors dissect how Gβγ potentiates PLCβ3 signaling in cells. Using engineered crosslinking to stabilize a Gβγ-PLCβ3 complex, single particle cryo-EM, and cell-based functional assays, they identify and map multiple putative Gβγ interaction surfaces on PLCβ3, including a previously unrecognized binding mode. Structure-guided mutagenesis supports the functional relevance of these interactions and suggests that Gβγ potentiation is not primarily mediated by PLCβ3 membrane recruitment, but instead enhances PLCβ3 activity after the lipase is already at the membrane.

      Previous reconstitution work on the membrane surface (Falzone & MacKinnon, 2023) proposed a recruitment/partitioning-centric model in which Gβγ increases PLCβ3 output largely by elevating its membrane surface concentration, whereas Gαq primarily increases catalytic turnover; under those reconstitution conditions, the two inputs can combine approximately multiplicatively. In receptor-driven cellular signaling, however, PLCβ3 is robustly recruited to the plasma membrane upon Gαq activation, which raises the question of whether Gβγ contributes mainly through additional recruitment or through a post-recruitment mechanism once PLCβ3 is already at the membrane.

      This manuscript helps address that gap by using membrane-anchored PLCβ3 and complementary cellular readouts to separate "getting PLCβ3 to the membrane" from "boosting activity once PLCβ3 is already there." Their results argue that, in cells, membrane recruitment is largely dominated by Gαq·GTP, while Gβγ can further potentiate PIP2 hydrolysis after membrane association, consistent with a modulatory role at the membrane rather than primary recruitment.

      Overall, the work provides a structural and mechanistic framework for Gβγ-PLCβ3 cooperation and helps clarify the basis of Gq pathway amplification. The manuscript is generally strong, but some issues need to be addressed.

      Thank you for the positive comments.

      Major comments:

      (1) BMOE/BM(PEG)2 crosslinking may enforce a non-native docking geometry, potentially compromising the physiological relevance and precision of the Gβγ-PLCβ3 interface as described. Although a >50% 1:1 crosslinked complex is formed and remains active, the solution maps show lower local resolution for Gβγ, consistent with a dynamic, potentially heterogeneous interface. One interface is captured via a single engineered cysteine pair (PLCβ3 E60C-Gβ C271), which could bias the pose. It would be helpful if the authors could provide additional orthogonal support (e.g., alternative crosslinked sites) and further clarify the uniqueness and relevance of this interface.

      We did attempt to isolate other crosslinked complexes. PLCβ3-Δ892 self-crosslinked under all reaction conditions, while PLCβ3-Δ892 XYCys, which retains an endogenous cysteine within the X–Y linker (C516), did not yield any crosslinked product when incubated with Gβγ. Only PLCβ3-Δ892 E60C crosslinked to Gβγ, as confirmed by SDS-PAGE and SEC. All experiments also used wild-type Gβ, which contains two solvent-exposed cysteines in the effector binding site (C204 and C271). The greatest number of particles corresponds to crosslinking between Gβ C271 and E60C in PLCβ3-Δ892. Crosslinking between PLCβ3-Δ892 E60C and other residues in Gβγ is possible, but there are not sufficient particles corresponding to these species for 2D classification and reconstruction. These observations, together with the high efficiency of crosslinking, are consistent with a stable and persistent interaction.

      (2) In the crosslinked structure, the authors report that Gβ D228 interacts with PLCβ3 R199 and K183. In Figure 2A, R199 appears closer to Gβ D228 than K183 does, yet only K183 is functionally tested. Testing R199 (e.g., R199E/R199A) would strengthen the structure-guided validation of this interface.

      We agree, and functional analysis of PLCβ3 R199E will be included in the revision.

      (3) The mutagenesis strategy appears inconsistent across figures/assays, which makes it difficult to interpret phenotypes and directly link the functional data to the proposed interfaces. For example, in Figure 2E, we see R185L but R215E, while residue L40 is mutated to Gly in the IP accumulation assays but to Glu/Lys (L40E/K) in the BRET assays (Figures 3B/3D/3F). The authors should (i) clearly justify the rationale for each substitution (conservative vs charge-reversal, interface disruption, etc.) and (ii), where possible, test the same mutants across assays (or provide evidence that alternative substitutions yield consistent conclusions).

      The mutagenesis experiments were initially carried out independently in the Lambert and Lyon groups. As the study progressed, additional mutations were designed based on prior results. The L40G mutation is one such example. Given its modest impact on activity in the IP accumulation assay, the L40E and L40K mutants were designed to maximally disrupt the interface in the BRET experiments. The revision will include the rationale behind the different substitutions and a discussion of any potential differences.

      Reviewer #3 (Public review):

      Summary:

      PLCβ3 is activated by both Gαq and Gβγ subunits. This paper follows previous solution and cryoEM studies of PLCβ3/Gβγ, seeking to understand the molecular details of activation using cellular BRET assays and cryoEM.

      Strengths:

      The authors find evidence for multiple binding sites on PLCβ3 for Gβγ and suggest that Gβγ is not a bona fide activator per se but enhances Gαq activation by positioning the catalytic site towards substrate, although this is not completely convincing. Although these sites may not be operative under native conditions, the authors might want to develop their potential role further.

      The authors also find that this activation does not occur through recruitment of the enzyme to the membrane by Gβγ released upon G protein activation, which is in accord with findings for other PLCβ enzymes but not with earlier findings for PLCβ3; again, the authors might want to develop this point further.

      Thank you for the suggestions.

      Weaknesses:

      (1) I'm confused as to why the authors feel that their mechanism is distinct from the two-state enzyme model of synergistic activation that Ross proposed in 2011 using a primarily thermodynamic argument. As written, the authors appear to rely heavily on structural and BRET studies that do not provide the detail needed to disprove this interpretation. The main issue is that the authors' mechanism does not fully explain how Gβγ activates PLCβ2 in reconstituted systems in the absence of Gαq subunits.

      The reconstitution experiments rely on nM-mM concentrations of purified proteins and liposomes that contain up to 30% PI(4,5)P2. PLCβ2 and PLCβ3 show dose-dependent increases in activity with increasing concentrations of Gβγ. PLCβ enzymes that interact with the liposomes would encounter liposome-tethered Gβγ subunits, which would in turn bind the lipase, tethering it to the membrane and helping position the active site for catalysis. While there is not yet experimental evidence that Gβγ binding can displace the Hα2′ helix, it could facilitate interfacial activation given the net negative charge of PI(4,5)P2. In addition, PLCβ2 is fundamentally different from the other PLCβ isoforms in its sensitivity to heterotrimeric G proteins. Given its decreased sensitivity to Gαq and increased basal activity, it is possible that autoinhibition by the proximal CTD is weaker. PLCβ2 is also abundantly expressed in neutrophils, along with more Gi-coupled receptors. Thus, it is possible that Gβγ directly activates PLCβ2 in these cells, but future experiments are required to definitively answer this question.

      (2) In a recent study, MacKinnon presents a model in which Gαq and Gβγ activate PLCβ3 by two distinct pathways, with activation by Gβγ occurring through membrane recruitment. It is not surprising that the authors find that this is not the case, since the pelleting method used by MacKinnon is subject to error. The authors should directly address the limitations of this previous work, including the changes in proteoliposomes during sedimentation that alter partition coefficients. Notably, the inability of Gβγ to drive membrane binding is in accord with the quantitative studies of Scarlata, which show that the affinity of PLCβ3 for Gβγ is fairly weak compared with its intrinsic membrane partition coefficient.

      Thank you for raising this point. The changes in composition, size, and structure when pelleting proteoliposomes may complicate data interpretation and will be discussed in the revision.

      (3) It was proposed many years ago that in signaling complexes Gαq - Gβγ may not have to fully dissociate when binding PLCβ, but rather shift their relative orientation when binding to PLCβ to allow activation. Is their model consistent with this? Is it possible that PLCβ3 keeps Gβγ from diffusing to enhance the rate of Gq / Gβγ re-association?

      The crosslinked complex is compatible with simultaneous binding of a Gαq–Gβγ heterotrimer to PLCβ3 without disrupting the observed interface. It is possible that Gαq could interact with Gβγ bound to the PH domain or the EF hands in the previously reported reconstruction. If so, the interaction would be mediated by the N-terminal helix of Gαq. Alternatively, the intrinsic GAP activity of PLCβ3 may also prevent Gβγ from diffusing, promoting heterotrimer reassociation.

      (4) The authors find that Gβγ binds multiple sites, and it is clear that the PH domain site is the primary one in accord with previous work. Could these weaker sites be an artifact of the elevated concentrations used in cryoEM and BRET assays?

      Assuming the PH domain is the primary Gβγ binding site, it is possible that the secondary EF hand site observed by Falzone and MacKinnon reflects high protein concentrations. However, it seems unlikely that such concentrations are reached within cells. Our functional data are also consistent with the Gβγ binding site in the EF hands playing a functional role in increasing PLCβ activity.

      (5) Although their assays suggest differences in binding affinity, it would strengthen the paper if the authors could estimate the association energies of the different binding sites. Such an estimate would also address the concern stated above.

      We appreciate this suggestion and will keep it in mind as we complete the revision.

    1. eLife Assessment

      Kim et al. provide important findings explaining how T3SS assembly is regulated by a conserved genetic context. The evidence supporting the conclusions is compelling, with numerous experiments demonstrating the validity of the findings. The work will be of interest to molecular biologists, biochemists, and microbiologists working on secretion systems or similar complexes. Further studies revealing similar mechanisms in other systems would enhance the impact of the current study.

    2. Reviewer #1 (Public review):

      Summary:

      The authors set out to understand the complex regulation of the assembly of the Type 3 Secretion System of S. Typhimurium. They found that gene synteny, as well as specific mRNA stem loops, was important for the translational coupling of sctS and sctT. Without this regulation, SctT self-oligomerizes, which disrupts the export of effector proteins and decreases the fitness of the pathogen. The work was done using a variety of convincing methods and leads to an updated picture of how T3SS assembly occurs. Since the same genetic synteny is found in a large majority of T3SS in different bacteria, this is likely a general mechanism, but one that needs further experimental validation.

      Strengths:

      The paper uses an impressive number of experiments, with different techniques, to describe how the authors identified the genetic regulation of SctT production.

      Weaknesses:

      Only minor weaknesses are found.

      (1) The complex is described as unique, but it is not well explained what makes it unique.

      (2) The paper would benefit from a discussion regarding how regulation might work in the minority of bacterial strains where the T3SS gene synteny is largely different. One would expect that those bacteria would have a different way of regulating T3SS assembly, but that is not discussed at all by the authors.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Samuel Wagner and colleagues describe an elegant mechanism to prevent promiscuous assembly of a core virulence type III secretion system protein, SctT. Starting from a bioinformatic standpoint, they demonstrate that synteny is highly conserved, and sctT occurs immediately downstream of sctS. Secretion is greatly reduced when sctT is removed or scrambled from its genomic context, and sctT expression is accordingly reduced (sctS synteny is also important, though less so). The distance between sctS and sctT is crucial. An elegant series of genetic experiments leads the authors to pinpoint a stem loop structure that occludes the Shine-Dalgarno sequence of sctT. This property is independent of the actual gene preceding sctT. In sum, this means that SctS is already expressed before SctT is expressed, preventing SctT from forming cytotoxic homooligomers.

      Strengths:

      The manuscript is very well-written, easy to follow, and describes a substantial amount of genetic detective work to identify the underlying mechanism. I have only a number of textual suggestions, mainly for the Introduction text, which I believe could be revised for a flagellar and broader audience.

      Weaknesses:

      Major concern:

      While the work is rigorous and substantial, I am unsure as to whether its findings will appeal beyond a niche audience.

      Minor points:

      (1) Line 117: The number here seems very small. RefSeq has ~200,000 genomes. My guess is that at least 100,000 of these will be bacterial. Many (most?) bacteria have flagella, and some unflagellated strains have injectisomes, so I would have guessed that the authors would have ~50,000 genomes with SctRSTU. This estimate is error-prone, but not by too much. Can the authors explain the discrepancy of almost two orders of magnitude between my estimate and their figure? (SctRSTU/FliPQR-FlhB should also be easy to pick up by sequence searches, so I don't think this is due to false negatives.)

      (2) Discussion: I would appreciate some discussion of how species that do not conserve the synteny of sctS and sctT prevent SctT oligomerisation. It doesn't need to be evidence-based at this stage, but I'm sure the authors have thought about this, and the Discussion is an appropriate place to share their speculations.

    4. Reviewer #3 (Public review):

      At the core of the bacterial type III secretion system (T3SS), a nanomachine used to inject effector proteins into eukaryotic cells, five highly conserved proteins, SctRSTUV, form the export apparatus, which is the actual gate for effector proteins. Not only are these proteins the most strongly conserved parts of the system, but also their gene order is conserved, which is not the case for most other components of the T3SS. Interestingly, this order does not completely recapitulate the assembly order, which is SctR5-T4-S-U-V. Looking into the reasons for the conserved synteny, the authors noted a stem-loop in the mRNA of the Salmonella SPI-1 sctS gene, which is present in many other T3SS as well (and in fact had been found in Yersinia before). They then use an array of clever gene permutations and modifications to discern the benefit of this order for the bacteria. The combination of thorough sequence analysis with different, partly quantitative, protein expression and secretion assays and growth curves, both in the native Salmonella background and in heterologous systems, provides strong evidence for the interpretation of the authors: The stem-loop in sctS prevents the premature expression of SctT, which can otherwise assemble into "futile multimers" that can lead to ion loss. The presence of stem-loops in many other sctS/T genes gives weight to this finding.

      This is a very nice and thorough study addressing an important point in the assembly of type III secretion systems. I only have a few suggestions.

      (1) Conserved gene orders have been shown for many complexes, and the findings presented in this manuscript might be applicable to other membrane complexes.

      The conservation of gene order and the presence of the stem loop give weight to the authors' findings. However, it is only mentioned quite late in the discussion that a similar stem loop was found earlier in Yersinia upstream of sctT, where it was interpreted differently. The authors' current discussion is somewhat evasive on this point. Why would these similar structures be used differently? Why would temperature not play a role in Salmonella SPI-1? And wouldn't the stem-loop also couple sctS and sctT expression in Yersinia? This should be addressed, if possible, by experiments (at least, the influence of temperature on the SPI-1 mRNA structure should be testable for the authors) and by a more detailed discussion (given the redundancy of RNA thermometers in the Yersinia T3SS, the interpretation in the current paper might well be the more compelling one).

      (2) A point that deserves more attention is that a similar finding in Yersinia has been interpreted differently before (as a temperature sensor rather than translational coupling) - are these systems really different? Testing the different interpretations in the respective other system (at least the influence of temperature in the Salmonella SPI-1 system used in this manuscript) would have made the interpretation even more compelling.

      (3) Another point that should be discussed in more detail is why this mechanism exists when replacing the sctT ATG with weaker start codons and simply omitting a separate SD sequence upstream of sctT would achieve the same outcome. This could be tested in one of the nice heterologous systems, as used in Figure 4.

    1. eLife Assessment

      This valuable study presents a comparative dataset on crab locomotion to examine the evolution of sideways walking. The evidence supporting the authors' claims is largely convincing. This work will be of interest to researchers in animal locomotion.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting and well-written manuscript in which the authors set out to answer a simple, old question with a modern toolkit: where in crab evolution did sideways walking arise, how often has it been lost or regained, and is it plausibly linked to the ecological and taxonomic success of true crabs. To do this, they record locomotion from 50 live species, convert each species' movements into a quantitative index that compares forward versus sideways bouts, and then map the resulting states onto a recent crab phylogeny to infer the most likely evolutionary history of locomotor direction.

      Strengths:

      The strongest part of the study is the dataset itself. Comparable behavioral measurements across dozens of crab species are rare. The authors have done the field and husbandry work needed to make this possible. The overall pattern they recover, that most true crabs are strongly biased toward sideways movement (while a smaller set of lineages move predominantly forward), is interesting and likely to be useful to others. The phylogenetic mapping is also a reasonable way to address the "how many times" question (although this is peripheral to my expertise). The manuscript makes a convincing case that sideways locomotion is not simply a trivial byproduct of a crab-like body plan.

      Weaknesses:

      Where I am less convinced is in how strongly the authors describe the discreteness of the behavioral categories and the absence of intermediates. The manuscript states that the Forward-Sideways Index shows a clear separation between two locomotor types with little evidence for intermediates, and it cites a statistical test rejecting a single peak in the distribution. However, the histogram in Figure 3 appears structured within each labeled category, with subclusters inside both the forward and sideways groups rather than a single tight peak per group. This matters because the index is built by first placing each movement bout into "forward" versus "sideways" bins using a fixed angle boundary and then collapsing the result into a single ratio. That approach is simple and transparent enough, but it can also hide mixed strategies. For example, a species that produces substantial amounts of both forward and sideways walking can still end up with a strongly positive or negative index, and therefore be classified as a pure "type," even though the underlying behavior is mixed. In that context, rejecting a single peak in the across-species distribution does not, by itself, justify the stronger claim that intermediates are rare or absent.

      Related to this, a key methodological choice is the use of 60 degrees as the cutoff between forward and sideways bouts. This boundary may be reasonable as a convention, but the paper does not explain why it is the right place to draw the line, and there is a plausible biological concern that a fixed angular cutoff does not mean the same thing across taxa.
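      To make the concern about binning concrete, here is a minimal Python sketch of a binned forward-versus-sideways index. It is not the authors' Forward-Sideways Index: the angle convention (heading measured from the anterior-posterior body axis and folded into 0 to 90 degrees), the normalized-difference formula (S - F)/(S + F), and the names `classify_bout` and `forward_sideways_index` are all hypothetical choices made for illustration. It shows the two effects raised above: a species with a substantial share of forward bouts can still land clearly on the "sideways" side, and bouts clustered near the cutoff can flip category when the cutoff moves.

```python
"""Hypothetical sketch of a binned forward-vs-sideways index (not the study's code)."""

from typing import Iterable

# Assumed default cutoff, matching the 60 degrees discussed in the review.
CUTOFF_DEG = 60.0


def classify_bout(heading_deg: float, cutoff_deg: float = CUTOFF_DEG) -> str:
    """Label one bout as 'forward' or 'sideways'.

    Assumption: heading_deg is the angle between the direction of travel and
    the anterior-posterior body axis. It is folded into [0, 90] so that
    backward travel counts as 'forward' and leftward equals rightward.
    """
    a = abs(heading_deg) % 180.0
    if a > 90.0:
        a = 180.0 - a
    return "sideways" if a >= cutoff_deg else "forward"


def forward_sideways_index(headings: Iterable[float], cutoff_deg: float = CUTOFF_DEG) -> float:
    """Assumed normalized difference: +1 = all sideways, -1 = all forward."""
    labels = [classify_bout(h, cutoff_deg) for h in headings]
    if not labels:
        raise ValueError("need at least one bout")
    s = labels.count("sideways")
    f = len(labels) - s
    return (s - f) / (s + f)


if __name__ == "__main__":
    pure = [90.0] * 100                      # every bout sideways
    mixed = [10.0] * 20 + [85.0] * 80        # 20% forward, 80% sideways
    near_cutoff = [55.0] * 50 + [65.0] * 50  # bouts straddling 60 degrees

    print(forward_sideways_index(pure))   # 1.0
    print(forward_sideways_index(mixed))  # 0.6, still on the "sideways" side
    for c in (50.0, 60.0, 70.0):
        print(c, forward_sideways_index(near_cutoff, cutoff_deg=c))  # 1.0, 0.0, -1.0
```

      Under these assumptions, the mixed species falls on the same side of zero as the pure one, and the near-cutoff species moves from +1 to -1 as the cutoff shifts from 50 to 70 degrees, even though the underlying bouts are unchanged.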

      Crabs vary in body shape and in how the legs are arranged around the body. In my own comparative work, for example, some species show an elliptical stance pattern elongated along the preferred direction of travel, while others show a more circular leg arrangement, and the latter can express more mixed forward and sideways behavior. When limb arrangement and body geometry differ across species, the same measured angle can correspond to different underlying mechanics and different functional "degree of sidewaysness." The practical implication is that the reported binary separation may partly reflect the imposed classification rule, rather than a sharp biological divide.

      Another limitation that affects interpretation is the decision to use one individual per species. I understand the logistics, and for some questions, a single representative individual can be a reasonable first pass. But it is not strong support for negative claims about intermediates, especially in a group where individuals can change substantially with growth and allometry. Crabs can grow dramatically, often with pronounced allometric shifts in limb proportions that can alter the center of mass location. Size alone can alter the kinematics and choice of locomotor behaviors in crustaceans. In species where appendage proportions change with size, or where certain legs become disproportionately large (or calcified), it is plausible that locomotor direction and the distribution of movement angles shift across ontogeny. That makes it hard to treat a single individual as a complete description of a species-level strategy, particularly for species that fall closer to the boundary between categories.

      In sum, this is a valuable and useful behavioral comparative study with a dataset that many in the field will appreciate. The main conclusions about the likely evolutionary placement of sideways walking are plausible, but several of the stronger claims about discrete locomotor types, the absence of intermediates, and the relationship to diversification would be more convincing if the analysis were less dependent on a fixed angular cutoff and on single individuals per species, or if the manuscript framed those points more cautiously so the conclusions track the strength of the evidence.

    3. Reviewer #2 (Public review):

      Summary:

      The current work investigates the evolution of sideways locomotion in Brachyura in light of a single evolutionary origin. To this end, the authors first analysed the mode of locomotion in 50 crab species and observed mutually exclusive presence of sideways vs. forward movement. The phylogenetic analysis confirmed that there is indeed a single evolutionary origin for sideways movement, which was followed by several reversions to forward locomotion. In this way, the authors demonstrate how locomotor modes shape evolutionary diversification in animals by showing that species richness is much higher in sideways-moving crabs than in the nearest groups. This is interesting work that integrates behavioural analysis and phylogenetic relations, capitalising largely on crabs. I have a few suggestions and questions.

      Firstly, I think the paper spends too much time on a straightforward analysis of the mode of locomotion. I was also wondering whether the phylogenetic analysis could be simply achieved by maximising an objective function in which the modes of movement are inversely coded for two putative groups, with all values calculated at all possible nodes.

      Unfortunately, I find that the authors did not sufficiently discuss differences in the ecological niches of species with forward vs. sideways locomotion modes (including challenges of locomotion and substrate).

      Likewise, what are the anatomic correlates of forward vs. sideways locomotion? For instance, how are the advantages assumed for sideways movement associated with a flattened body? Is it possible that the mode of motion is secondary to flattened/narrow body structure, which basically limits the distance between legs and thus makes the forward movement difficult - under this logic, the mode of movement would be a secondary phenomenon to body shape traits. How can one differentiate between this alternative and the one that puts the mode of movement in the centre of the story? On a related note, how do different modes of movement relate to the ability to fit into tight spaces - how does it relate to differences in leg joints?

      Is it possible that the sideways movement maximises the scanned visual field per unit time/displacement, which may be beneficial for mostly forward-moving predators?

      It is really difficult to decipher the information contained in the nodes (circles) in the printed black-and-white version of the manuscript.

      Briefly, although I find the study interesting, the presented complexity may not be necessary given the endpoints, which could be reached much more simply. Furthermore, the conceptual analysis of the different modes of locomotion was limited. The general approach may serve as a good model for the evolutionary analysis of other traits. The demonstration of traceability of the relations in question is a major contribution of the work.

      Strengths:

      The research question and the novel combination of different data types.

      Weaknesses:

      The complexity of the methods used, along with a limited discussion of the potential dynamics that may underlie the evolution of the sideways movement mode.

    1. eLife Assessment

      This important study shows that orientation tuning of V1 neurons is suppressed during a continuous flash suppression paradigm, especially in neurons with binocular receptive fields. These findings, made using cutting-edge imaging techniques, convincingly implicate early visual processing in continuous flash suppression, in agreement with previous studies suggesting reduced effective contrast of such stimuli in V1.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have submitted a second revision, largely to address a comment from Reviewer 2, which was "The failure to model the neural data with an explicit model is a missed opportunity." The authors have now included a computational model.]

      This study makes a fundamental contribution to our understanding of interocular suppression, particularly continuous flash suppression (CFS). Using neuroimaging data from two macaque monkeys, the study provides compelling evidence that CFS suppresses orientation responses in neurons within V1. These findings enrich the CFS literature by demonstrating that neural activity under CFS may prevent high-level visual and cognitive processing.

      Comments on previous revisions:

      The authors have addressed all my previous comments.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this study was to investigate the degree to which low-level stimulus features (i.e., grating orientation) are processed in V1 when stimuli are not consciously perceived under conditions of continuous flash suppression (CFS). The authors measured the activity of a population of V1 neurons at single neuron resolution in awake fixating monkeys while they viewed dichoptic stimuli that consisted of an oriented grating presented to one eye and a noise stimulus to the other eye. Under such conditions, the mask stimulus can prevent conscious perception of the grating stimulus. By measuring the activity of neurons (with Ca2+ imaging) that preferred one or the other eye, the authors tested the degree of orientation processing that occurs during CFS.

      Strengths:

      The greatest strength of this study is the spatial resolution of the measurement and the ability to quantify stimulus representations during CFS in populations of neurons preferring the eye stimulated by either the grating or the mask. There have been a number of prominent fMRI studies of CFS, but all of them have had the limitation of pooling responses across neurons preferring either eye, effectively measuring the summed response across ocular dominance columns. The ability to isolate separate populations offers an exciting opportunity to study the precise neural mechanisms that give rise to CFS, and potentially provide insights into nonconscious stimulus processing.

      Weaknesses:

      (The authors have now included a computational model in the second revision.)

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Tang, Yu & colleagues investigate the impact of continuous flash suppression (CFS) on the responses of V1 neurons using 2-photon calcium imaging. They report that CFS substantially suppressed V1 orientation responses. This suppression occurs in a graded fashion depending on the binocular preference of the neuron: neurons preferring the eye that was presented with the masker stimuli were most suppressed, while neurons preferring the eye to which the grating stimuli were presented were least suppressed. Binocular neurons exhibited an intermediate level of suppression.

      Strengths:

      The imaging techniques are cutting-edge.

      Weaknesses:

      The strength of CFS suppression varies across animals, but the authors attribute this to comparable heterogeneity in the human psychophysics literature.

      Comments on previous revisions:

      The authors have addressed my comments from the previous round of review, and I have no further comments.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study makes a fundamental contribution to our understanding of interocular suppression, particularly continuous flash suppression (CFS). Using neuroimaging data from two macaque monkeys, the study provides compelling evidence that CFS suppresses orientation responses in neurons within V1. These findings enrich the CFS literature by demonstrating that neural activity under CFS may prevent high-level visual and cognitive processing.

      Comments on revisions:

      The authors have addressed all my previous comments.

      Thanks for the very warm comments!

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to investigate the degree to which low-level stimulus features (i.e., grating orientation) are processed in V1 when stimuli are not consciously perceived under conditions of continuous flash suppression (CFS). The authors measured the activity of a population of V1 neurons at single neuron resolution in awake fixating monkeys while they viewed dichoptic stimuli that consisted of an oriented grating presented to one eye and a noise stimulus to the other eye. Under such conditions, the mask stimulus can prevent conscious perception of the grating stimulus. By measuring the activity of neurons (with Ca2+ imaging) that preferred one or the other eye, the authors tested the degree of orientation processing that occurs during CFS.

      Strengths:

      The greatest strength of this study is the spatial resolution of the measurement and the ability to quantify stimulus representations during CFS in populations of neurons preferring the eye stimulated by either the grating or the mask. There have been a number of prominent fMRI studies of CFS, but all of them have had the limitation of pooling responses across neurons preferring either eye, effectively measuring the summed response across ocular dominance columns. The ability to isolate separate populations offers an exciting opportunity to study the precise neural mechanisms that give rise to CFS, and potentially provide insights into nonconscious stimulus processing.

      Weaknesses:

      However, while this is an impressive experimental setup, the major weakness of this study is that the experiments don't advance any theoretical account of why CFS occurs or what CFS implies for conscious visual perception. There are two broad camps of thinking with regard to CFS. On the one hand, Watanabe et al. (2011) reported that V1 activity remained intact during CFS, implying that CFS interrupts stimulus processing downstream of V1. On the other hand, Yuval-Greenberg and Heeger (2013) showed that V1 activity is in fact reduced during CFS. By using a parametric experimental design, they measured the impact of the mask on the stimulus response as a function of contrast, and concluded that the mask reduces the gain of neural responses to the grating stimulus. They presented a theoretical model in which the mask effectively reduced the SNR of the grating, making it invisible in the same way that reducing contrast makes a stimulus invisible.

      In the first submission of the manuscript, the authors incorrectly described the Yuval-Greenberg & Heeger (2013) and Watanabe et al. (2011) papers, suggesting that they had observed the same or similar effects of CFS on V1 activity, when in fact they had described opposite results. Reviewer 1 also observed that the authors appeared to be confused in their reading of these highly relevant papers. In the revision, the authors have reworked this paragraph, now correctly describing these sets of opposing results. However, I still do not understand what the authors are trying to argue: "...these studies were not designed to quantify the pure effect of CFS on stimulus-evoked V1 responses." I do not understand what is meant by "pure" in this case.

      This is clarified as: “Nevertheless, these studies contrasted monocular and dichoptic masking conditions to equate stimulus input while manipulating perceptual visibility, which were not designed to quantify the pure effect of CFS on stimulus-evoked V1 responses, that is, the difference of BOLD signals between binocular masking and stimulus alone conditions.” (line 63)

      Regardless, it is clear that the measurements in the present study strongly support the interpretation of Yuval-Greenberg & Heeger (i.e., that V1 activity is degraded by CFS, 'akin' to a loss in the contrast-to-noise ratio of neural activity). It would be appropriate for the authors to communicate this clearly.

      We agree and added the following sentence in the text: “These results support the conclusion of Yuval-Greenberg and Heeger (2013) that V1 activity is degraded by CFS, ‘akin’ to a loss in the contrast-to-noise ratio of neural activity” (line 122)

      I continue to be of the opinion that this study is lacking an adequate model of interocular interactions that might explain the Ca2+ imaging. The machine learning results are not terribly surprising - multivariate methods, such as SVMs, are more sensitive than univariate approaches. So it is plausible that an SVM can support decoding of the coarse orientation information, even when no tuning is evident in the univariate analyses. However, the link between this result and the underlying neurophysiology is opaque. The failure to model the neural data with an explicit model is a missed opportunity.

      We agree and put “An ocular-dominance-dependent gain control model” back into the text (line 167). Fig. 2D now shows the results of model fitting.

      An ocular-dominance-dependent gain control model

      We developed an ocular-dominance-dependent gain control model to account for the impact of CFS on V1 population orientation tuning. The model development followed two steps.

      Step I. Population orientation tuning functions before CFS

      The population orientation tuning functions due to monocular stimulation exhibited different amplitudes among OD groups (Fig. 2D, red curves), which could be simulated with Equation 1, an OD-weighted Gaussian basis function:

      where parameters A, σ, and B corresponded to the amplitude, standard deviation, and minimal response of the Gaussian basis function, respectively, and θ represented the preferred orientation of a bin of neurons relative to the actual orientation of the grating stimulus. The weight parameter w was the mean of the linearly transformed ODIs of the neurons in a neuronal group, which equated to (ODI + 1)/2 or 1 - (ODI + 1)/2, depending on whether the grating was presented to the contralateral or the ipsilateral eye, and ranged from 0 to 1. Thus, a smaller w indicated a stronger preference for the eye seeing the grating, and a larger w indicated a stronger preference for the unstimulated eye (or the eye seeing the flashing masker under CFS). The value of w was 0.33, 0.50, and 0.67 in Monkey A, and 0.32, 0.50, and 0.68 in Monkey B, for the grating-eye-preferring group, the binocular group, and the masker-eye-preferring group, respectively. The exponent s represented a nonlinear transformation.
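      As a minimal sketch of the weight definition above (not the authors' code), the following Python assumes that ODI = -1 denotes a neuron driven entirely by the contralateral eye and that (ODI + 1)/2 applies when the grating is shown to the contralateral eye; under that convention a small w marks a preference for the grating eye in both branches. The function names `od_weight` and `group_weight` are hypothetical. Because the transform is linear, the group weight also equals the transform of the group's mean ODI.

```python
from statistics import mean
from typing import Iterable


def od_weight(odi: float, grating_eye: str) -> float:
    """Map one neuron's ocular dominance index (ODI, in [-1, 1]) to a weight in [0, 1].

    ASSUMPTION: ODI = -1 means fully contralateral-eye dominant, so a small
    weight marks a preference for the eye seeing the grating in both branches.
    """
    if not -1.0 <= odi <= 1.0:
        raise ValueError("ODI must lie in [-1, 1]")
    shifted = (odi + 1.0) / 2.0  # linear rescaling from [-1, 1] to [0, 1]
    if grating_eye == "contralateral":
        return shifted
    if grating_eye == "ipsilateral":
        return 1.0 - shifted
    raise ValueError("grating_eye must be 'contralateral' or 'ipsilateral'")


def group_weight(odis: Iterable[float], grating_eye: str) -> float:
    """Weight w of a neuronal group: the mean of its neurons' transformed ODIs."""
    return mean(od_weight(odi, grating_eye) for odi in odis)
```

      For example, a group with ODIs of -0.5, 0.0, and 0.5 under contralateral grating stimulation gets w = 0.5, the value reported for the binocular groups.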

      Equation 1 fitted the baseline data well (Fig. 2D, red curves), resulting in goodness-of-fit (R²) values of 0.94 and 0.95 for the two monkeys, respectively. This indicated that the equation captured the OD-dependent population orientation tuning characteristics of V1 neurons with monocular stimulation before CFS.
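      The response reports R² without stating how it was computed. A common definition is R² = 1 - SS_res/SS_tot, the fraction of variance in the observed tuning values explained by the fitted curve. The sketch below assumes that definition; the authors' procedure may differ (for example, in whether values are pooled across OD groups before computing it).

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot
```

      A perfect fit gives 1.0, and a flat prediction at the mean of the data gives 0.0.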

      Step II. The impacts of CFS

      In step II, the model introduced several binocular combination factors to account for population orientation tuning functions under CFS.

      To account for the OD-dependent changes of orientation tuning bandwidths under CFS, a w-dependent inhibition factor, w^t, was introduced, which scaled the σ of the tuning functions, changing the monocular tuning functions R into R’:

      This allowed different groups of neurons to exhibit various degrees of orientation tuning function broadening, capturing the pattern in which neurons preferring the eye seeing the grating displayed a sharper population orientation tuning curve under CFS than those preferring the eye seeing the masker.

      Previous studies have shown that binocular neuronal responses can be modeled by incorporating interocular suppression and summation processes (Kato et al., 1981; Dougherty, Cox, Westerberg, & Maier, 2019; Zhang et al., 2024). Therefore, R’ was further normalized by the neural response to the flashing masker to simulate interocular suppression, which formed the first component of Equation 3. Additionally, the neural response to the flashing masker was added to simulate binocular summation, which formed the second component of Equation 3. Summed, these two components determined the final neural responses under CFS:

      where N was the empirical neural response to the monocularly presented flashing masker stimulation, a and b were scaling parameters, and k and m were nonlinearity parameters. The interocular normalization by masker response led to amplitude reduction of population orientation tuning functions for different groups of neurons, while the binocular summation with masker response elevated the minimal responses of tuning functions to their corresponding heights.

      During the Step II model fitting, the parameters A, σ, and s were inherited from the monocular tuning fits derived from Equation 1 and served as inputs, while the parameters a, k, b, m, and t were optimized. The model captured the CFS modulation of population orientation tuning curves well, with R² = 0.99 and 0.98 for Monkeys A and B, respectively (Fig. 2D, red curves).

      Reviewer #3 (Public review):

      Summary:

      In this study, Tang, Yu & colleagues investigate the impact of continuous flash suppression (CFS) on the responses of V1 neurons using 2-photon calcium imaging. They report that CFS substantially suppressed V1 orientation responses. This suppression occurs in a graded fashion depending on the binocular preference of the neuron: neurons preferring the eye that was presented with the masker stimuli were most suppressed, while neurons preferring the eye to which the grating stimuli were presented were least suppressed. Binocular neurons exhibited an intermediate level of suppression.

      Strengths:

      The imaging techniques are cutting-edge.

      Weaknesses:

      The strength of CFS suppression varies across animals, but the authors attribute this to comparable heterogeneity in the human psychophysics literature.

      Comments on revisions:

      The authors have addressed my comments from the previous round of review, and I have no further comments.

      Thanks!

    1. eLife Assessment

      The authors adapt sequencing of nascent DNA (DNA linked to an RNA primer, "SNS-Seq") to map DNA replication origins in Trypanosoma brucei. The main impact of this work is reporting a new set of putative origins, which do not overlap with previously reported origins but appear to overlap with previously mapped DNA-RNA hybrids (R-loops). Thus, these valuable findings open up new avenues for further investigation into the mechanistic basis of replication origin firing in this organism. However, the supporting evidence remains incomplete and would benefit from orthogonal validation. This work will be of interest to those studying DNA replication and the epigenetic regulation of replication origins.

    2. Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and which they suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data. Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Recognition Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (i) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (ii) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping, across the whole genome, to ensure full understanding and clarity.

      In the revised manuscript, the authors have improved the presentation and analysis of their data, expanding the description of SNS-seq mapping across the genome and more clearly assessing the extent of correlation between SNS-seq signal and previous approaches used to predict origins (MFA-seq and ChIP-chip of ORC1/CDC6). With regard to the correlation between SNS-seq and ORC1/CDC6 ChIP-chip, it should be noted that the two datasets were generated in distinct strains of T. brucei (Lister 427 and TREU927, respectively), and it is unclear whether the latter dataset can be accurately mapped to the strain used here. Notwithstanding this concern, these improvements clarify a number of aspects of the SNS-seq mapping: (1) the signal is more prevalent in the transcribed core of the genome than in the largely transcriptionally silent subtelomeres; and (2) whereas previous work revealed strong correlation between ORC1/CDC6 localisation and MFA-seq peaks at the ends of multigene transcription units, neither of these datasets shows significant overlap with SNS-seq signal, which is not seen at transcription start or stop sites ('SSRs'; supplementary Fig.8D) and shows marked depletion at predicted ORC1/CDC6 sites (supplementary Fig.8C). To the authors' credit, they acknowledge this lack of correlation in the discussion.

      The authors have not provided any new data to substantiate their assertion that SNS-seq accurately detects origins in T. brucei, and therefore the work rests on a single experimental approach, without validation. As a result, the suggestion of abundant, previously undetected origins in the intergenic regions of multigene transcription units remains a prediction. One key untested limitation of the work lies in the observation that the very large majority of SNS-seq signal overlaps with previously mapped RNA-DNA hybrids; without an experimental test, the suggestion that the authors have 'disclosed for the first time a strong link between RNA:DNA hybrid formation and DNA replication initiation' remains conjecture.

    3. Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using these data, they elucidate the structure of origins of replication. In particular, they find various features at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, and R-loops, and they find that origins are mostly located in intergenic regions. Combining their population-level SNS-seq data and their single-molecule DNA combing data, they estimate the total number of origins as well as the number of origins active in a single cell.

      Between the initial submission and this revision, the raised major concerns have not been resolved, and no additional validation has been provided.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript is concluded with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) There are substantial discrepancies between the origins identified here and those reported in previous studies. Given that the other studies precede this manuscript, it is the authors' duty to investigate these differences. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      (2) I am concerned that up to 96% of all SNS-seq peaks are filtered out. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Upon request, the authors have performed a control, where randomly placed peaks were run through the same filtering process. Only approximately twice as many experimental peaks passed filtering compared to random peaks. While the authors emphasize reproducibility between replicates, technical artifacts from the protocol would also be reproducible. Moreover, in other SNS-seq studies, for example, Pratto et al. Cell 2021, Fig. 1B, + and − strand peaks always appear closely paired. This pattern contrasts strongly with Fig. 2A in this manuscript.
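      The random-peak control mentioned here can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: `example_filter` stands in for whatever filtering was actually applied (here it keeps a plus-strand peak only when a minus-strand peak ends within `max_gap` bp upstream of it without overlapping), and peaks are shuffled uniformly along a single chromosome while preserving their lengths.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def random_like(peaks: List[Interval], chrom_len: int, rng: random.Random) -> List[Interval]:
    """Place intervals with the same lengths as `peaks` uniformly at random."""
    shuffled = []
    for start, end in peaks:
        length = end - start
        new_start = rng.randrange(0, chrom_len - length + 1)
        shuffled.append((new_start, new_start + length))
    return sorted(shuffled)


def example_filter(plus: List[Interval], minus: List[Interval], max_gap: int) -> List[Interval]:
    """HYPOTHETICAL filter: keep a plus-strand peak if some minus-strand peak
    ends 0..max_gap bp before it starts (adjacent but not overlapping)."""
    return [(ps, pe) for ps, pe in plus
            if any(0 <= ps - me <= max_gap for _, me in minus)]


def random_control(plus, minus, chrom_len, max_gap, n_shuffles=100, seed=0):
    """Return (peaks passing the filter, mean number of shuffled peaks passing it)."""
    rng = random.Random(seed)
    observed = len(example_filter(plus, minus, max_gap))
    passing = []
    for _ in range(n_shuffles):
        shuffled_plus = random_like(plus, chrom_len, rng)
        shuffled_minus = random_like(minus, chrom_len, rng)
        passing.append(len(example_filter(shuffled_plus, shuffled_minus, max_gap)))
    return observed, sum(passing) / n_shuffles
```

      The ratio of the two returned numbers corresponds to the experimental-versus-random comparison described above; the closer it is to 1, the less the filter by itself distinguishes real peaks from chance placements.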

      Further, I have some minor concerns that do not affect the main conclusions of the manuscript:

      - Fig 2C: The regions shown in the heatmap have different sizes, and I presume that the regions are ordered by size on the y-axis? If so, does the cone-shaped pattern, which is origin-less for genic regions and origin-enriched for intergenic regions, arise from the size of the regions? (I.e., for each genic region, the region itself is origin-less and the flanking intergenic regions contain origins.) If this is the case, then the peaks/valleys, centered exactly on the center of the regions on the mean frequency plots, arise from the different sizes of the analyzed regions, not from the fact that origins are mostly found at the center of intergenic regions. This data would be better presented with all regions stretched to the same size. This has not been addressed in the revision.
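      Stretching all regions to the same size, as suggested here, is the usual "scaled" metaprofile. A minimal sketch, assuming origins are represented by their center coordinates and regions by half-open [start, end) intervals on one chromosome (all names hypothetical): each region is divided into the same number of equal-fraction bins, so regions of different lengths align at their starts, centers, and ends before averaging.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int]  # half-open [start, end)


def scaled_profile(region: Region, origin_centers: Sequence[int], n_bins: int = 100) -> List[int]:
    """Count origin centers in each of `n_bins` bins covering equal fractions of `region`."""
    start, end = region
    length = end - start
    if length <= 0 or n_bins <= 0:
        raise ValueError("region must be non-empty and n_bins positive")
    counts = [0] * n_bins
    for center in origin_centers:
        if start <= center < end:
            # relative position in [0, 1) mapped to a bin index in [0, n_bins)
            counts[(center - start) * n_bins // length] += 1
    return counts


def metaprofile(regions: Sequence[Region], origin_centers: Sequence[int], n_bins: int = 100) -> List[float]:
    """Average the length-scaled profiles of all regions, bin by bin."""
    profiles = [scaled_profile(r, origin_centers, n_bins) for r in regions]
    return [sum(column) / len(profiles) for column in zip(*profiles)]
```

      Because bins in long regions span more base pairs, one may divide each count by the bin width to compare densities; fixed-width flanks can also be appended on either side so that signal in the neighboring intergenic regions stays visible without being folded into the scaled body.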

      - Line 123, "and the average length of origins was found to be approximately 150 bp.": To determine origins, the authors filter away overlapping peaks and peaks that are too far from each other. Both restrict the minimal and maximal length of origins that can be observed, and this, in turn, affects the average length. This has not been addressed in the revision.

      Are claims well substantiated?: The identification of origins via SNS-seq appears to me to be incompletely supported. All downstream analyses depend on the reliability of origin identification.

      Impact: This study has the potential to be valuable for two fields: research focused on T. brucei as a disease agent, where essential processes that function differently than in mammals are excellent drug targets; and basic research analyzing DNA replication across the evolutionary tree, where T. brucei can serve as an early-diverging eukaryotic model organism.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and which they suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data. Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Recognition Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (i) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (ii) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping, across the whole genome, to ensure full understanding and clarity.

      In the revised manuscript, the authors have improved the presentation and analysis of their data, expanding the description of SNS-seq mapping across the genome and more clearly assessing the extent of correlation between SNS-seq signal and previous approaches used to predict origins (MFA-seq and ChIP-chip of ORC1/CDC6). With regard to the correlation between SNS-seq and ORC1/CDC6 ChIP-chip, it should be noted that the two datasets were generated in distinct strains of T. brucei (Lister 427 and TREU927, respectively), and it is unclear whether the latter dataset can be accurately mapped to the strain used here. Notwithstanding this concern, these improvements clarify a number of aspects of the SNS-seq mapping: (1) the signal is more prevalent in the transcribed core of the genome than in the largely transcriptionally silent subtelomeres; and (2) whereas previous work revealed strong correlation between ORC1/CDC6 localisation and MFA-seq peaks at the ends of multigene transcription units, neither of these datasets shows significant overlap with SNS-seq signal, which is not seen at transcription start or stop sites ('SSRs'; supplementary Fig.8D) and shows marked depletion at predicted ORC1/CDC6 sites (supplementary Fig.8C). To the authors' credit, they acknowledge this lack of correlation in the discussion.

      The authors have not provided any new data to substantiate their assertion that SNS-seq accurately detects origins in T. brucei, and therefore the work rests on a single experimental approach, without validation. As a result, the suggestion of abundant, previously undetected origins in the intergenic regions of multigene transcription units remains a prediction. One key untested limitation of the work lies in the observation that the very large majority of SNS-seq signal overlaps with previously mapped RNA-DNA hybrids; without an experimental test, the suggestion that the authors have 'disclosed for the first time a strong link between RNA:DNA hybrid formation and DNA replication initiation' remains conjecture.

      Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using these data, they elucidate the structure of origins of replication. In particular, they find various features at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, and R-loops, and they find that origins are mostly located in intergenic regions. Combining their population-level SNS-seq data and their single-molecule DNA combing data, they estimate the total number of origins as well as the number of origins active in a single cell.

      Between the initial submission and this revision, the raised major concerns have not been resolved, and no additional validation has been provided.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript is concluded with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) There are substantial discrepancies between the origins identified here and those reported in previous studies. Given that the other studies precede this manuscript, it is the authors' duty to investigate these differences. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      We agree that orthogonal validation of origins detected by stranded SNS-seq is necessary, and we are working on it.

      (2) I am concerned that up to 96% of all SNS-seq peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Upon request, the authors have performed a control, where randomly placed peaks were run through the same filtering process. Only approximately twice as many experimental peaks passed filtering compared to random peaks. While the authors emphasize reproducibility between replicates, technical artifacts from the protocol would also be reproducible. Moreover, in other SNS-seq studies, for example, Pratto et al. Cell 2021, Fig. 1B, + and − strand peaks always appear closely paired. This pattern contrasts strongly with Fig. 2A in this manuscript.

      The size and overlap of peaks depend on the length of the SNS. In our study, the width of the peaks corresponds to the size of the short nascent strands (0.5–2.5 kb) chosen as the starting material, whereas the peaks in Pratto et al., Cell, 2021 are much wider (a few kb). This could be due to the longer SNS used in the Pratto et al. study. Consequently, the overlap between longer SNS is more pronounced, since SNS fibres elongate in both directions: at the 3′ end by DNA polymerase and at the 5′ end by ligation of Okazaki fragments. Additionally, the genomic regions displayed in our Figure 2A and in Pratto et al., Figure 1B are shown at substantially different resolutions, with a roughly ten-fold difference in scale.

      Further, I have some minor concerns that do not affect the main conclusions of the manuscript:

      - Fig 2C: The regions shown in the heatmap have different sizes, and I presume that the regions are ordered by size on the y-axis? If so, does the cone-shaped pattern, which is origin-less for genic regions and origin-enriched for intergenic regions, arise from the size of the regions? (I.e., for each genic region, the region itself is origin-less and the flanking intergenic regions contain origins.) If this is the case, then the peaks/valleys, centered exactly on the center of the regions on the mean frequency plots, arise from the different sizes of the analyzed regions, not from the fact that origins are mostly found at the center of intergenic regions. This data would be better presented with all regions stretched to the same size. This has not been addressed in the revision.

      As the reviewer suggested, we have produced scaled plots of the stranded SNS-seq origins over genic and intergenic regions (see Figure 3, which is attached along with the Reviewer #2 (Recommendations for the authors)). However, we would prefer to keep the unscaled versions in the manuscript and add a note in the text as part of the Version of Record, explaining that the origins are evenly distributed throughout intergenic regions rather than being centred within them.
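
      To illustrate what a scaled version of the Fig. 2C analysis involves, here is a minimal Python sketch (not the authors' pipeline). It stretches every region to the same number of relative-position bins, so long and short regions contribute equally to each bin and the length-ordering effect the reviewer describes cannot arise. The function name, the input layout (regions as half-open (chrom, start, end) tuples, origin midpoints listed per chromosome) and the default bin count are illustrative assumptions.

```python
def scaled_profile(regions, origin_centers, n_bins=20):
    """Fraction of regions with >= 1 origin centre in each relative-position bin.

    Hypothetical helper for a scaled metaplot (assumed input layout):
    regions:        list of (chrom, start, end), half-open, e.g. intergenic regions
    origin_centers: dict mapping chrom -> list of origin midpoints
    Each region is stretched to n_bins equal bins, whatever its length in bp.
    """
    if not regions:
        raise ValueError("need at least one region")
    totals = [0] * n_bins
    for chrom, start, end in regions:
        length = end - start
        hits = [0] * n_bins
        for c in origin_centers.get(chrom, []):
            if start <= c < end:
                # integer arithmetic maps [start, end) onto bins 0..n_bins-1 exactly
                hits[(c - start) * n_bins // length] = 1
        for i, h in enumerate(hits):
            totals[i] += h
    # mean over regions: each region counts at most once per bin, regardless of length
    return [t / len(regions) for t in totals]
```

      Recording presence or absence per bin, rather than raw counts, keeps a region with many origins from dominating the average; either choice is defensible for a metaplot.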

      - Line 123, "and the average length of origins was found to be approximately 150 bp.": To determine origins, the authors filter away overlapping peaks and peaks that are too far from each other. Both restrict the minimal and maximal length of origins that can be observed, and this, in turn, affects the average length. This has not been addressed in the revision.

      This observation is correct. By applying filtering and setting the maximum distance between the positive and negative peaks, we are most likely affecting the average length by excluding potentially wider origins.

      We'll modify the text as part of the Version of Record.

      Are claims well substantiated?:

      The identification of origins via SNS-seq appears to be incompletely supported to me. All downstream analyses depend on the reliability of origin identification.

      Impact:

      This study has the potential to be valuable for two fields: research focused on T. brucei as a disease agent, where essential processes that function differently than in mammals are excellent drug targets; and basic research analyzing DNA replication across the evolutionary tree, where T. brucei can be used as an early-divergent eukaryotic model organism.


      The following is the authors’ response to the original reviews.

      eLife Assessment

      The authors use sequencing of nascent DNA (DNA linked to an RNA primer, "SNS-Seq") to localise DNA replication origins in Trypanosoma brucei, so this work will be of interest to those studying either Kinetoplastids or DNA replication. The paper presents the SNS-seq results for only part of the genome, and there are significant discrepancies between the SNS-Seq results and those from other, previously-published results obtained using other origin mapping methods. The reasons for the differences are unknown and from the data available, it is not possible to assess which origin-mapping method is most suitable for origin mapping in T. brucei. Thus at present, the evidence that origins are distributed as the authors claim - and not where previously mapped - is inadequate.

      We would like to clarify a few points regarding our study. Our primary objective was to characterise the topology and genome-wide distribution of short nascent-strand (SNS) enrichments. The stranded SNS-seq approach provides the high strand-specific resolution required to analyse origins. The observation that SNS-seq peaks (potential origins) are most frequently found in intergenic regions is not an artefact of analysing only part of the genome; rather, it is a result of analysing the entire genome.

      We agree that orthogonal validation is necessary. However, neither MFA-seq nor TbORC1/CDC6 ChIP-on-chip has yet been experimentally validated as definitive markers of origin activity in T. brucei, nor do they validate each other.

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and which, they suggest, display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data.

      Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Recognition Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (1) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (2) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping across the whole genome to ensure full understanding and clarity.

      Regarding comparisons with previous work:

      - Two other attempts to identify origins in T. brucei - ORC1/CDC6 binding sites (ChIP-on-chip, PMID: 22840408) and MFA-seq (PMID: 22840408, 27228154) - were both produced by the McCulloch group. These methods do not validate each other; in fact, MFA-seq origins overlap with only 4.4% of the 953 ORC1/CDC6 sites (PMID: 29491738). Therefore, low overlap between SNS-seq peaks and ORC1/CDC6 sites cannot disqualify our findings. Similar low overlaps are observed in other parasites (PMID: 38441981, PMID: 38038269, PMID: 36808528) and in human cells (PMID: 38567819).

      - We also would like to emphasize that the ORC1/CDC6 dataset originally published (PMID: 22840408) is no longer available; only a re-analysis by TritrypDB exists, which differs significantly from the published version (personal communication from Richard McCulloch). While the McCulloch group reported a predominant localization of ORC1/CDC6 sites within SSRs at transcription start and termination regions, our re-analysis indicates that only 10.3% of TbORC1/CDC6-12Myc sites overlapped with 41.8% of SSRs.

      - MFA-seq does not map individual origins; rather, it detects replicated genomic regions by comparing DNA copy number between S- and G1-phases of the cell cycle (PMID: 36640769; PMID: 37469113; PMID: 36455525). The broad replicated regions (0.1–0.5 Mbp) identified by MFA-seq in T. brucei are likely to contain multiple origins, rather than just one. In that sense, we disagree with McCulloch's group, who claimed that there is a single origin per broad peak. Our analysis shows that up to 50% of the origins detected by stranded SNS-seq are located within broad MFA-seq regions. Neither the methodology used by McCulloch’s group to infer single origins from MFA-seq regions nor the precise positions of these regions have been published or made available, making direct comparison difficult.
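
      The copy-number comparison that MFA-seq relies on can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the published pipeline (which, as noted above, has not been released): it assumes S- and G1-phase reads have already been counted in identical fixed-size bins, and the function name, the pseudocount and the median-centring step are illustrative choices.

```python
import math
import statistics


def mfa_profile(s_counts, g1_counts, pseudocount=1.0):
    """Median-centred log2(S/G1) copy-number ratio per genomic bin.

    Hypothetical helper. s_counts and g1_counts are read counts in the same
    fixed-size bins (bin size is an assumption) for S- and G1-phase libraries.
    """
    if len(s_counts) != len(g1_counts) or not s_counts:
        raise ValueError("need equal-length, non-empty count vectors")
    # library-size normalisation, with the pseudocount added to every bin
    s_total = sum(s_counts) + pseudocount * len(s_counts)
    g1_total = sum(g1_counts) + pseudocount * len(g1_counts)
    ratios = [
        math.log2(((s + pseudocount) / s_total) / ((g + pseudocount) / g1_total))
        for s, g in zip(s_counts, g1_counts)
    ]
    # centre on the genome-wide median so bins with above-typical copy number are > 0
    mid = statistics.median(ratios)
    return [r - mid for r in ratios]
```

      Each output value summarises a whole bin, and a run of elevated bins marks a replicated region; nothing in the ratio itself indicates how many initiation sites lie inside that run, which is why MFA-seq alone cannot resolve individual origins.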

      Finally, the genomic features we describe—poly(dA/dT) stretches, G4 structures and nucleosome occupancy patterns—are consistent with origin topology described in other organisms.

      On the concern that SNS-seq may map RNA-DNA hybrids rather than replication origins: Isolation and sequencing of short nascent strands (SNS) is a well-established and widely used technique for high-resolution origin mapping. This technique has been employed for decades in various laboratories, with numerous publications documenting its use. We followed the published protocol for SNS isolation (Cayrou et al., Methods, 2012, PMID: 22796403). RNA-DNA hybrids cannot persist through the multiple denaturation steps in our workflow, as they melt at 95°C (Roberts and Crothers, Science, 1992; PMID: 1279808). Even in the unlikely event that some hybrids remained, they would not be incorporated into libraries prepared using a single-stranded DNA protocol and therefore would not be sequenced (see Figure 1B and Methods).

      Furthermore, our analysis shows that only a small proportion (1.7%) of previously reported RNA-DNA hybrids overlap with SNS-seq origins. It is important to note that RNA-primed nascent strands naturally form RNA-DNA hybrids during replication initiation, meaning the enrichment of RNA-DNA hybrids near origins is both expected and biologically relevant.

      On the claim that our analysis focuses narrowly on inter-CDS regions and ignores other genomic compartments: this is incorrect. We mapped and analyzed stranded SNS-seq data across the entire genome of T. brucei 427 wild-type strain (Müller et al., Nature, 2018; PMID: 30333624), including both core and subtelomeric regions. Our findings indicate that most origins are located in intergenic regions, but all analyses were performed using the full set of detected origins, regardless of location.

      We did not ignore transcription start and stop sites (TSS/TTS). The manuscript already includes origin distribution across genomic compartments as defined by TriTrypDB (Fig. 2C) and addresses overlap with TSS, TTS and HT in the section “Spatial coordination between the activity of the origin and transcription”. While this overlap is minimal, we have included metaplots in the revised manuscript for clarity.

      Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of the origins of replication. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript concludes with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      We sincerely thank you for this positive feedback.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      Thank you very much for this remark.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Thank you for appreciating our discussion.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) I do not understand why SNS-seq would create peaks. Replication should originate in one locus, then move outward in both directions until the replication fork moving outward from another origin is encountered. Hence, in an asynchronous population average measurement, I would expect SNS data to be broad regions of + and -, which, taken together, cover the whole genome. Why are there so many regions not covered at all by reads, and why are there such narrow peaks?

      Thank you for asking these questions. As you correctly point out, replication forks progress in both directions from their origins and ultimately converge at termination sites. However, the SNS-seq method specifically isolates short nascent strands (SNSs) of 0.5–2.5 kb using a sucrose gradient. These short fragments are generated immediately after origin firing and mark the sites of replication initiation, rather than the entire replicated regions. Consequently: (i) SNS-seq does not capture long replication forks or termination regions, only the immediate vicinity of origins. (ii) The narrow peaks indicate the size of selected SNSs (0.5–2.5 kb) and the fact that many cells initiate replication at the same genomic sites, leading to localized enrichment. (iii) Regions without coverage refer to genomic areas that do not serve as efficient origins in the analyzed cell population. Thus, SNS-seq is designed to map origin positions, but not the entire replicated regions.

      (2) I am concerned that up to 96% of all peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Specifically, if the authors placed the same number of peaks as was measured randomly in intergenic regions, would 4% of these peaks pass the filtering process by chance?

      Maintaining the strandedness of the sequenced DNA fibres enabled us to filter the peaks, thereby increasing the probability that the retained peak pairs correspond to origins. The two SNS peaks must be oriented in a way that reflects the topology of the SNS strands within an active origin: the upstream peak must be on the minus strand, followed by the downstream peak on the plus strand.
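
      The orientation rule can be expressed as a short filter. The Python below is a minimal sketch under assumptions: the `Peak` record, the half-open coordinates and the `max_gap` threshold are hypothetical, and it illustrates the stated rule (minus-strand peak upstream, non-overlapping plus-strand peak downstream, origin taken as the region between them) rather than reproducing the authors' workflow.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    chrom: str
    start: int   # 0-based, half-open coordinates (assumption)
    end: int
    strand: str  # "+" or "-"


def pair_divergent_peaks(peaks, max_gap):
    """Return candidate origins as (chrom, start, end) gaps between divergent pairs.

    A pair is kept only if a minus-strand peak is followed downstream by a
    non-overlapping plus-strand peak within max_gap bp (hypothetical threshold).
    """
    origins = []
    for chrom in sorted({p.chrom for p in peaks}):
        minus = sorted((p for p in peaks if p.chrom == chrom and p.strand == "-"),
                       key=lambda p: p.start)
        plus = sorted((p for p in peaks if p.chrom == chrom and p.strand == "+"),
                      key=lambda p: p.start)
        plus_starts = [p.start for p in plus]
        for m in minus:
            # first plus-strand peak starting at or after the minus peak's end,
            # i.e. downstream of it and not overlapping it
            i = bisect_left(plus_starts, m.end)
            if i == len(plus):
                continue
            gap = plus[i].start - m.end
            if gap <= max_gap:
                # candidate origin: the region between the two divergent peaks
                origins.append((chrom, m.end, plus[i].start))
    return origins
```

      Because `max_gap` caps the distance between paired peaks, it also caps the length of any origin such a filter can report, which is the effect on average origin length discussed in the responses above. A fuller implementation would probably also prevent one plus-strand peak from being paired with two minus-strand peaks.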

      As suggested by the reviewer, we tested whether randomly placed plus and minus peaks could reproduce the number of filter-passing peaks using the same bioinformatics workflow. Only 1–6% of random peaks passed the filters, compared with 4–12% in our experimental data, resulting in about 50% fewer selected regions (origins). Moreover, the “origins” from random peaks showed 0% reproducibility across replicates, whereas the experimental data showed 7–64% reproducibility. These results indicate that the retained peaks are highly unlikely to arise by chance and support the specificity of our approach. Thank you for this suggestion.
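
      A random-placement control of this kind can be sketched as follows. This is a minimal Python illustration, assuming peaks are summarised by width and strand and that the allowed regions (for example intergenic intervals, as the reviewer proposed) are given as (chrom, start, end) tuples; the function name and seed handling are hypothetical. The shuffled peaks would then go through the same pairing and filtering steps as the experimental peaks, ideally over many seeds, to compare the fraction retained.

```python
import random


def shuffle_peaks(peaks, regions, seed=0):
    """Place each peak uniformly at random inside the allowed regions.

    Hypothetical helper (assumed input layout):
    peaks:   list of (width, strand) tuples from the experimental call set
    regions: list of (chrom, start, end) tuples, half-open
    Widths and strands are preserved; only positions are randomised.
    Returns a list of (chrom, start, end, strand) tuples.
    """
    rng = random.Random(seed)
    shuffled = []
    for width, strand in peaks:
        # only regions long enough to hold the whole peak are eligible
        eligible = [(c, s, e) for c, s, e in regions if e - s >= width]
        if not eligible:
            raise ValueError(f"no region can hold a peak of width {width}")
        # weight each region by its number of valid start positions, so every
        # admissible placement across all regions is equally likely
        weights = [e - s - width + 1 for _, s, e in eligible]
        chrom, start, end = rng.choices(eligible, weights=weights, k=1)[0]
        pos = rng.randint(start, end - width)  # inclusive bounds keep the peak inside
        shuffled.append((chrom, pos, pos + width, strand))
    return shuffled
```

      Preserving widths and strands matters here, because the pairing filter depends on both.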

      (3) There are 3 previous studies that map origins of replication in T. brucei. Devlin et al. 2016, Tiengwe et al. 2012, and Krasiļņikova et al. 2025 (https://doi.org/10.1038/s41467-025-56087-3), all with a different technique: MFA-seq. All three previous studies mostly agree on the locations and number of origins. The authors compared their results to the first two, but not the last study; they found that their results are vastly different from the previous studies (see Supplementary Figure 8A). In their discussion, the authors defend this discrepancy mostly by stating that the discrepancy between these methods has been observed in other organisms. I believe that, given the situation that the other studies precede this manuscript, it is the authors' duty to investigate the differences more than by merely pointing to other organisms. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      The MFA-seq data for T. brucei were published in two studies by McCulloch’s group: Tiengwe et al. (2012) using TREU927 PCF cells, and Devlin et al. (2016) using PCF and BSF Lister427 cells. In Krasilnikova et al. (2025), previously published MFA-seq data from Devlin et al. were remapped to a new genome assembly without generating new MFA-seq data, which explains why we did not include that comparison.

      Clarifying the differences between MFA-seq and our stranded SNS-seq data is essential. MFA-seq and SNS-seq interrogate different aspects of replication. SNS-seq is a widely used, high-resolution method for mapping individual replication origins, whereas MFA-seq detects replicated regions by comparing DNA copy number between S and G1 phases. MFA-seq identified broad replicated regions (0.1–0.5 Mb) that were interpreted by McCulloch’s group as containing a single origin. We disagree with this interpretation and consider that each broad peak contains multiple origins; theoretical considerations of replication timing indicate that far more origins are required for complete genome duplication during the short S-phase. Once this assumption is reconsidered, MFA-seq and SNS-seq results become complementary: MFA-seq identifies replicated regions, while SNS-seq pinpoints individual origins within those regions. Our analysis revealed that up to 50% of the origins detected by stranded SNS-seq were located within the broad MFA-seq peaks. This pattern, in which broad MFA-seq regions contain multiple initiation sites, has also recently been found in Leishmania by McCulloch’s team using nanopore sequencing (PMID: 26481451). Nanopore sequencing revealed numerous initiation sites within MFA-seq regions, as well as numerous additional sites outside these regions in asynchronous cells, consistent with what we observed using stranded SNS-seq in T. brucei. We will expand our discussion and conclude that the discrepancy arises from differences in methodology and interpretation. The two approaches provide complementary insights into replication dynamics, rather than ‘vastly different’ results.

      We recognize the importance of validating our results in future using an alternative mapping method and functional assays. However, it is important to emphasize that stranded SNS-seq is an origin mapping technique with a very high level of resolution. This technique can detect regions between two divergent SNS peaks, which should represent regions of DNA replication initiation. At present, no alternative technique has been developed that can match this level of resolution.

      (4) Some patterns that were identified to be associated with origins of replication, such as G-quadruplexes and nucleosomes phasing, are known to be biases of SNS-seq (see Foulk et al. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res. 2015;25(5):725-735. doi:10.1101/gr.183848.114).

      It is important to note that the conditions used in our study differ significantly from those applied by Foulk et al. (Genome Res. 2015). We used SNS isolation and enzymatic treatments as described in previous reports (Cayrou, C. et al. Genome Res, 2015 and Cayrou, C. et al. Methods, 2012). Here, we enriched the SNS by size on a sucrose gradient and then subjected this SNS-enriched fraction to repeated treatments with high amounts of λ-exonuclease (100 U for 16 h at 37 °C; see Methods). In contrast, Foulk et al. used sonicated total genomic DNA for origin mapping, without enriching SNS on a sucrose gradient as we did, and then performed a λ-exonuclease treatment. A previous study (Cayrou, C. et al. Genome Res, 2015, Figure S2, which can be found at https://genome.cshlp.org/content/25/12/1873/suppl/DC1) has shown that complete digestion of G4-rich DNA sequences is achieved under the conditions we used.

      Furthermore, the SNS-depleted control (without RNA) was included in our experimental approach. This control represents all molecules that are difficult to digest with λ-exonuclease, including G4 structures. Peak calling was performed against this background control, with the aim of removing false-positive peaks resulting from undigested DNA structures. We have explained this step more clearly in the revised manuscript.

      The key benefit of our study is that the orientation of the enrichments (peaks) remains consistent throughout the sequencing process. We identified an enrichment of two divergent strands synthesised on complementary strands containing G4s. These two divergent strands themselves do not, however, contain G4s (see Fig. 8 for the model). Therefore, the enriched molecules detected in our study do not contain G4s; they are complementary to the strands enriched with G4s. This means that the observed enrichment of G4s cannot be an artefact of the enzymatic treatments used in this study. We have added this point to the Discussion of the revised manuscript.

      We also performed an additional control which is not mentioned in the manuscript. In parallel with replicating cells, we isolated the DNA from the stationary phase of growth, which primarily contains non-replicating cells. Following the three λ-exonuclease treatments, there was insufficient DNA remaining from the stationary phase cells to prepare the libraries for sequencing. This control strongly indicated that there was little to no contaminating DNA present with the SNS molecules after λ-exonuclease enrichment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Four broad issues need to be addressed.

      (1) The authors have attempted to test the overlap between ORC1/CDC6 (an ORC subunit) binding in the genome and SNS-seq. If there were an overlap, this would provide evidence that the SNS-seq signals represent origins. However, the analysis provided is inadequate: merely a statement that "we obtained an overlap of 4.2% between origins and ORC1/CDC6 binding sites within a window of ±2 kb and 6.2% in the window of ±3 kb". Nowhere are these data shown or properly discussed:

      a) The authors need to provide a diagram showing where in the genome the very small amount of overlapping SNS-seq and ORC1/CDC6 binding occurs, and to clearly show and state how many of the intergenic SNS-seq peaks are sites of ORC1/CDC6 binding. In the absence of such analysis, a key question is unanswered: is there any evidence of ORC1/CDC6 (or ORC more broadly) binding at the SNS-seq signals within the polycistronic transcription units?

      In the original version of the manuscript, these data were already presented as percentages in the text and as a metaplot (Supplementary Fig. 8C).

      We based our analysis on the set of 350 TbORC1/CDC6 binding sites available on TriTrypDB at the time of analysis. This dataset was a filtered subset of the originally reported TbORC1/CDC6 ChIP‑on‑chip peaks (personal communication, TriTrypDB). Since then, the unfiltered dataset has been made available. We therefore re‑analyzed the overlap using this dataset, applying a filter that yielded 990 binding sites, closely matching the 953 sites reported by the McCulloch group. We need to stress here that the original 953 sites reported by the McCulloch group (Tiengwe et al., 2012 PMID: 22840408) are no longer available and that the authors:

      - do not provide genomic coordinates for the 953 binding sites and

      - do not release any scripts or methodology that would allow independent reproduction of the 953 sites.

      A similar remark also applies to the MFA-seq data (see below).

      To address the reviewer’s request, we have now:

      (1) Recalculated the overlap using the updated TbORC1/CDC6 dataset (990 binding sites) from TriTrypDB.

      (2) Added the absolute number of overlapping SNS‑seq origins and TbORC1/CDC6 binding sites in the Results section for clarity.

      (3) Included the TbORC1/CDC6 binding sites in the chromosomal overview (newly added to Supplementary Fig. 8A), so that their genomic localization relative to SNS‑seq peaks is visually accessible.

      (4) Revised the metaplots of TbORC1/CDC6 distribution around SNS‑seq origins using the updated dataset (Supplementary Fig. 8C).

      With these improvements, we now find that:

      - Within ±2 kb, 12.9% (253) of SNS‑seq origins overlap with 25.6% of TbORC1/CDC6 binding sites.

      - Within ±3 kb, 18.8% (370) of SNS‑seq origins overlap with 37.4% of TbORC1/CDC6 binding sites.
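
      Two-way percentages of this kind can be computed with a simple window search. The Python below is a minimal sketch, assuming that both origins and binding sites are reduced to centre coordinates and that "overlap within ±2 kb" means a centre-to-centre distance of at most the window; the exact overlap criterion is not stated here, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def window_overlap(origins, sites, window):
    """Two-way overlap between origin centres and binding-site centres.

    Hypothetical helper (assumed input layout):
    origins, sites: dict mapping chrom -> list of centre coordinates
    window:         half-width in bp (e.g. 2000 for +/-2 kb)
    Returns (n_origins_hit, frac_origins_hit, n_sites_hit, frac_sites_hit).
    """
    def count_hits(query, target):
        n = 0
        for chrom, positions in query.items():
            ref = sorted(target.get(chrom, []))
            for p in positions:
                # any target centre within [p - window, p + window] counts as a hit
                lo = bisect_left(ref, p - window)
                hi = bisect_right(ref, p + window)
                if hi > lo:
                    n += 1
        return n

    n_origins = sum(len(v) for v in origins.values())
    n_sites = sum(len(v) for v in sites.values())
    if n_origins == 0 or n_sites == 0:
        raise ValueError("both sets must be non-empty")
    o_hit = count_hits(origins, sites)
    s_hit = count_hits(sites, origins)
    return o_hit, o_hit / n_origins, s_hit, s_hit / n_sites
```

      The two fractions use different denominators (all origins versus all binding sites) and count matches from each side separately, so they need not agree, just as the 12.9% and 25.6% figures above differ.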

      The updated metaplot shows a clear depletion of TbORC1/CDC6 signal at the origin center, with modest enrichment ~5 kb upstream and downstream. The underlying reason for this pattern remains unknown, and we agree that additional studies will be needed to understand it.

      b) Equally, the authors need to explain what they conclude from this analysis. They make a comparison with T. cruzi ORC1/CDC6 and SNS-seq overlap, which does not illuminate what the data tell us. For instance, if there is no or minimal overlap between ORC1/CDC6 binding and SNS-seq peaks within the polycistronic transcription units, do they conclude that the major SNS-seq signal they detail is evidence for ORC-independent DNA replication? If there is no overlap, what further evidence can they provide that these signals truly are origins?

      First, we would like to clarify that, to date, there is no evidence supporting ORC‑independent DNA replication in T. brucei, and—importantly—no published data demonstrating that TbORC1/CDC6 is universally required for DNA replication initiation. Because of this, we consider that it would be inappropriate to conclude that regions lacking detectable TbORC1/CDC6 signal undergo ORC‑independent initiation. We would prefer not to speculate in the absence of supporting evidence and would gratefully consider any reference the reviewer wishes to provide on this subject.

      Second, the low overlap between TbORC1/CDC6 binding sites and SNS‑seq origins does not, in our view, invalidate our mapping of replication initiation sites. Multiple factors contribute to this:

      (1) Low overlap between ORC1/CDC6 and origin‑mapping techniques has been repeatedly reported across kinetoplastids. For instance, in T. cruzi, 88.2% of origins detected by DNAscent nanopore sequencing showed no overlap with TcORC1/CDC6–Ty1 ChIP signal within ±3 kb, and only 11.7% co‑localized. This is strikingly similar to our observations in T. brucei. Thus, our data are consistent with the broader pattern in trypanosomatids rather than an exception.

      (2) The origin topology detected by stranded SNS‑seq is supported by several genomic characteristics frequently found in other eukaryotes, including:

      - A highly specific and polarized poly(dA)/poly(dT) sequence environment.

      - Strand‑specific G4 structures positioned around origin centers.

      - A conserved nucleosome‑depleted region flanked by well‑positioned nucleosomes.

      These features are absent from shuffled controls, appear at high significance, and recapitulate hallmark signatures of replication origins in other eukaryotes.

      Together, these findings give us confidence that the SNS‑seq peaks represent genuine origins - despite the incomplete overlap with TbORC1/CDC6 binding.

      Third, we fully agree with the reviewer that a definitive conclusion would require an additional, independent validation method.

      Given the lack of complete ORC subunit datasets and the unusual biology of trypanosomatid replication complexes, we believe that the cautious interpretation above is the most appropriate.

      c) The authors state (Discussion): "Validation of origins is generally a difficult task, particularly in trypanosomatids, where proteins involved in the initiation of DNA replication are difficult to determine. Few proteins have been described as potential ORC subunits (reviewed in 61), and none of them have been shown to be a specific marker that indicates the origins." There are two problems with the statement. First, most of the subunits of ORC have now been described in T. brucei; the authors should make this clear. Second, mapping of ORC1/CDC6 localisation, contrary to what the authors state here, shows precise correlation with the peaks of every MFA-seq signal described (see Tiengwe et al, Cell Reports, 2012); thus, ORC1/CDC6 binding provides evidence that MFA-seq is detecting origins, something that cannot be said for SNS-seq. The authors need to correct this misleading paragraph.

      As suggested, we have removed the paragraph from the Discussion to avoid confusion. However, we disagree with the reviewer's assessment and clarify below our position regarding the issues raised.

      First, we agree that five candidate ORC subunits have now been identified in T. brucei. Our intention was not to suggest the contrary, but rather to emphasize that, although candidate ORC components have been described, direct functional evidence for their roles in replication initiation is still limited. For this reason, we were cautious in referring to any ORC component as a definitive marker of replication origins.

      Second, regarding the reviewer’s statement that TbORC1/CDC6 binding “shows precise correlation with the peaks of every MFA‑seq signal”, we respectfully disagree based on several observations:

      (1) MFA‑seq does not identify individual origin centers, but rather broad replicated regions that often span hundreds of kilobases. By design, this method cannot define the number or position of discrete origins within each peak. For that reason, MFA-seq regions do not have the resolution required to validate TbORC1/CDC6 binding sites as individual origins.

      (2) In the published datasets (Tiengwe et al., Devlin et al.), no metaplots or locus‑wide quantification of the overlap between MFA‑seq peaks and TbORC1/CDC6 binding were provided. Neither the coordinates of the discrete regions defined as origins within the broad MFA‑seq peaks nor the approach used to define them has been described or made available, making it difficult to evaluate the claimed correspondence.

      (3) Notably, McCulloch’s group later reported that only 4.4% of the 953 TbORC1/CDC6 sites overlapped with their 42 MFA‑seq “origins”, underscoring that the degree of correspondence is in fact limited (PMID: 29491738).

      (4) Finally, as noted in our response to point (1b), low overlap between ORC1/CDC6 binding sites and origin‑mapping techniques is a consistent observation across kinetoplastids, including T. cruzi, where DNAscent‑mapped origins show only ~12% overlap with TcORC1/CDC6 ChIP signals. This suggests that the limited overlap we observe is not unique to our dataset.

      For these reasons, we are not convinced that the TbORC1/CDC6 binding sites have been shown to align precisely with MFA‑seq peaks, nor that these datasets definitively validate origin mapping in T. brucei. Nevertheless, to avoid over‑interpretation and potential confusion, we have removed the paragraph from the Discussion as requested. We hope this clarifies our position and improves the accuracy and neutrality of the manuscript.

      (2) Like for ORC1/CDC6 localisation, the authors' evaluation of the relationship between MFA-seq and SNS-seq mapping is inadequate, and the depth of the analysis and discussion needs to be improved:

      a) The authors state: "We found 28-42% stranded SNS-seq origins overlapped with early and 43-55% overlapped with late S-phase MFA-seq replicated regions (Supplementary Figure 8B)." This seems important and provides (limited) validation of both datasets, but cannot be discerned from the supplied figure. Please provide a metaplot of the two datasets centred on the MFA-seq loci, including the SNS-seq peak amplitude.

      We would like to emphasize that MFA‑seq is not a method designed to map individual origins, and this fundamentally limits the interpretability of metaplots centered on MFA-seq regions. MFA‑seq identifies broad replication‑enriched domains, typically spanning 100–500 kb, within which multiple origins may fire asynchronously across the cell population.

      This concern is reinforced by the original MFA‑seq publications (Tiengwe et al., 2012; Devlin et al., 2016), which:

      - do not provide positional data for the 42-47 MFA‑inferred origins,

      - do not describe the computational method used to derive individual origin coordinates from the broad peaks, and

      - do not release any scripts or methodology that would allow independent reproduction of the claimed origin positions.

      Because of this, it is not possible to reconstruct or validate how the 42 MFA‑seq “origin” sites were defined, nor to use those coordinates as anchors for metaplot analyses.

      Most importantly, we disagree with the underlying assumption that each MFA‑seq peak corresponds to exactly one origin. This assumption runs counter to the principle of the technique, which identifies regions of higher DNA content in replicating cells than in non-replicating cells; it is also contradicted by our stranded SNS‑seq data and by DNA combing measurements:

      - SNS‑seq detects multiple discrete origins within the same genomic regions that produce a single broad MFA‑seq peak.

      - DNA combing reveals inter‑origin distances of ~36–422 kb (median ~150 kb) (PMID: 26976742), which are far shorter than the ~400–600 kb replication domains identified by MFA‑seq.

      - Furthermore, with only 42 origins detected by MFA-seq, complete genome replication during S phase is not possible in T. brucei. DNA combing has shown that the average replication fork speed in procyclic forms is 1.9 kb/min (PMID: 26976742). Dividing the size of the Trypanosoma brucei brucei TREU927 genome (26.1 Mb) by 42 origins (PMID: 22840408) indicates that each origin would have to replicate about 621 kb during S phase. At an average fork speed of 1.9 kb/min, replicating 621 kb would take about 327 min (5.45 hours) (621 kb / 1.9 kb/min = 327 min). This exceeds the estimated length of S phase in T. brucei procyclic forms, which is 2.31 hours (138.6 minutes) (PMID: 32397111, 31811174, 28258618) or even less, 1.36 hours (PMID: 2190996, 10574712). Therefore, more than 42 origins are necessary to complete replication within this short S phase.

      This makes it unlikely that MFA-seq regions represent single functional origins. For these reasons, a metaplot centered on MFA‑seq “loci” may lead to misinterpretations and would not provide biologically meaningful information.
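      The S-phase estimate in the last point above can be reproduced in a few lines of Python. This is a sketch that uses only the figures quoted in the text (26.1 Mb genome, 42 origins, 1.9 kb/min fork speed, 138.6 min S phase); the function names are hypothetical. Like the text, it splits the genome evenly among origins that all fire at S-phase onset. The text's calculation assumes one fork per origin; because origins generally fire bidirectionally, the sketch also shows the two-fork case and the minimum origin number implied by each assumption.

```python
import math

GENOME_KB = 26_100      # T. brucei TREU927 genome, 26.1 Mb (from the text)
MFA_ORIGINS = 42        # origins inferred from MFA-seq (from the text)
FORK_KB_PER_MIN = 1.9   # mean fork speed in procyclic forms (from the text)
S_PHASE_MIN = 138.6     # longer of the quoted S-phase estimates (from the text)


def kb_per_origin(genome_kb: float, n_origins: int) -> float:
    """DNA each origin must copy if the genome is divided evenly among origins."""
    return genome_kb / n_origins


def replication_minutes(genome_kb: float, n_origins: int,
                        fork_speed: float, forks_per_origin: int = 1) -> float:
    """Time to copy one origin's share, assuming all origins fire at S-phase onset."""
    return kb_per_origin(genome_kb, n_origins) / (fork_speed * forks_per_origin)


def min_origins(genome_kb: float, fork_speed: float,
                s_phase_min: float, forks_per_origin: int = 2) -> int:
    """Fewest origins that could finish within S phase under the same idealised assumptions."""
    return math.ceil(genome_kb / (fork_speed * forks_per_origin * s_phase_min))


if __name__ == "__main__":
    print(f"{kb_per_origin(GENOME_KB, MFA_ORIGINS):.0f} kb per origin")
    for forks in (1, 2):
        t = replication_minutes(GENOME_KB, MFA_ORIGINS, FORK_KB_PER_MIN, forks)
        n_min = min_origins(GENOME_KB, FORK_KB_PER_MIN, S_PHASE_MIN, forks)
        print(f"{forks} fork(s)/origin: {t:.0f} min (S phase {S_PHASE_MIN} min); "
              f"at least {n_min} origins needed")
```

      Under these idealised assumptions the sketch gives about 327 min with one fork per origin and about 164 min with two, both longer than 138.6 min, and lower bounds of 100 and 50 origins respectively; asynchronous firing would only raise these bounds.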

      We hope that the expanded explanation clarifies our interpretation of the relationship between these two complementary, but fundamentally different, methods.

      b) The authors state that "Our results showed that the origins are predominantly located in the intergenic regions within the PTUs (Figure 2C)". This finding cannot be discerned from this figure, which does not show 'strand switch regions' (SSRs; transcription start/stop sites), where MFA-seq predicts all origins to localise. The authors need to acknowledge this difference and must show a comparison of SNS-seq data, including peak amplitude, around all SSRs (whether predicted by MFA-seq to act as origins or not, since all appear to bind ORC1/CDC6).

      We have now provided the metaplots showing the overlap between stranded SNS-seq origins and SSRs (see Supplementary Figure 8D). This difference has been acknowledged and discussed in the revised manuscript.

      c) Finally, the authors' interpretation that around 30-55% of SNS-seq peaks overlap with MFA-seq 'origins' is highly questionable. MFA-seq peaks are regions of increased DNA content in replicating cells relative to non-replicating cells, and so the entire region under the MFA-seq peak is not necessarily an origin, but is likely to be a more discrete locus (eg, the SSR, where ORC1/CDC6 mainly localises). They should correct the wording and discuss what significance they see in this overlap; for instance, do they think SNS-seq 'clusters' are more pronounced within the MFA-seq peaks and, if so, what might this mean, and why does it not correlate with ORC1/CDC6 localisation?

      As the reviewer notes, ‘MFA‑seq peaks are regions of increased DNA content, and so the entire region under the MFA-seq peak is not necessarily an origin but is likely to be a more discrete locus’. This is exactly why MFA‑seq is inappropriate for identifying discrete/individual origins: within these replicated domains, multiple origins can fire, as revealed by stranded SNS‑seq mapping.

      Regarding the overlap between SNS‑seq origins and MFA‑seq peaks, we agree with the reviewer that this overlap should not be interpreted as validating MFA‑seq “origin positions.” Instead, we now describe it more accurately as the proportion of discrete SNS‑seq origins that fall within broader MFA‑seq replication domains. This is expected, because SNS‑seq identifies individual initiation events, whereas MFA‑seq identifies S‑phase replication domains averaged across a population. Our stranded SNS‑seq data do not show enhanced origin accumulation within MFA-seq regions, and we find no correlation with TbORC1/CDC6 positions. This is now discussed.

      Regarding SSRs, we do not share the view that they should be considered privileged initiation sites. After remapping the TbORC1/CDC6 ChIP‑on‑chip dataset (see above) to the T. brucei Lister 427–2018 genome (Supplementary Fig. 8A), we observed that TbORC1/CDC6 binding is distributed throughout the chromosomes, not restricted to SSRs. To quantify this, we analyzed the overlap between TbORC1/CDC6 sites and all annotated SSR classes (dSSRs, cSSRs, and head‑to‑tail regions, as defined in Kim et al. 2009). The results show that:

      Only 10% of TbORC1/CDC6 binding sites fall within SSRs, and these sites overlap 40% of all SSRs.

      At the level of individual SSR types:

      - TTS: 3.3% of TTS overlap with 0.3% of TbORC1/CDC6 sites.

      - TSS: 67% of TSS overlap with 6.1% of TbORC1/CDC6 sites.

      - Head‑to‑tail regions: 54.2% overlap with 3.6% of TbORC1/CDC6 sites.

      These analyses demonstrate that most TbORC1/CDC6 sites are not located at SSRs, contradicting the idea that SSRs represent primary or exclusive origin sites.

      Author response image 1.

      Overlap between TbORC1/CDC6-12Myc binding sites (Tiengwe 2012, Cell Reports) and strand‑switch regions (SSRs). Venn diagram showing the overlap between 990 TbORC1/CDC6-12Myc binding sites (retrieved from TriTrypDB and filtered at a score of 22 to obtain a number of binding sites similar to the 953 published in Tiengwe 2012, Cell Reports) and SSRs in the genome (Kim 2018, NAR). The intersection shows that 10.3% of ORC1/CDC6 binding sites overlap with 41.8% of SSRs. The intersection is subdivided into TSS (orange), TTS (blue), and HT (green).
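      The overlap percentages above are computed in two directions (the fraction of binding sites that fall in an SSR, and the fraction of SSRs that contain a binding site), so the two numbers need not match. The minimal Python sketch below computes both, assuming half-open BED-style intervals given as (chrom, start, end); the function names and example coordinates are hypothetical, and at genome scale a tool such as `bedtools intersect -u` would be the usual choice.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(query, targets):
    """Fraction of query intervals that overlap at least one target interval."""
    hits = sum(1 for q in query if any(overlaps(q, t) for t in targets))
    return hits / len(query)


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    sites = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chr2", 50, 60), ("chr2", 500, 600)]
    ssrs = [("chr1", 150, 400), ("chr2", 0, 40), ("chr3", 0, 100)]
    print(f"{fraction_overlapping(sites, ssrs):.0%} of sites fall in an SSR")
    print(f"{fraction_overlapping(ssrs, sites):.0%} of SSRs contain a site")
```

      In this toy example one overlapping pair gives 25% of sites but 33% of SSRs, which is why the two percentages are reported separately.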

      (3) A key objection to the data presentation is the decision to limit SNS-seq mapping to the intergenic regions. In addition to overlooking the SSRs (see above, 2), so-called subtelomeres, which account for nearly 50% of the T. brucei genome and are largely untranscribed, are not shown or discussed at all. Providing this data will improve clarity and also provide a key test of one of the predictions that the authors make: "most origins are localized in actively transcribed regions, which could lead to collisions between DNA replication and the transcription machinery. This spatial coincidence implies that transcription and replication must occur in a highly ordered and cooperative manner in T. brucei."

      We do not understand why this reviewer concluded that we took 'the decision to limit the mapping of SNS-seq to intergenic regions'. This is a factual error.

      To make this clearer:

      (1) We now explicitly present the distribution of SNS‑seq origins across core and subtelomeric regions in the revised Figure 2D, making clear that origin mapping was performed genome‑wide.

      (2) We also show that SNS‑seq origins are present in subtelomeric regions. We have revised the manuscript to avoid any implication that origin firing is restricted to actively transcribed regions. Our data show that most SNS‑seq origins lie within intergenic regions of PTUs, but a minority are found outside these regions, including in subtelomeres and SSRs. The revised text reflects this nuance and highlights that the spatial relationship between transcription and replication is strong but not exclusive.

      These additions make the genome-wide nature of the SNS-seq analysis transparent to the reader and should therefore resolve this reviewer's “key objection”.

      a) The authors must show SNS-seq mapping to the subtelomeres (in addition to around the SSRs; see comment (2)). If no SNS-seq peaks are detected in the subtelomeres, what do the authors conclude about how the genome is duplicated? If SNS-seq peaks are detected in the subtelomeres, do they correspond with the ordered nucleosomes in this part of the genome described by Maree et al (PMID: 28344657); if so, might SNS-seq signal localisation not be directed by transcription but chromatin?

      We have now presented the proportion of origins in subtelomeric regions (see Figure 2B).

      As illustrated in the metaplots in Author response image 2, the distribution of nucleosomes around the subtelomeric origins is similar to the distribution shown for all origins in the manuscript. We do not see the pattern of nucleosomes as described by Maree et al (PMID: 28344657) over ORC1/CDC6 binding sites in this part of the genome.

      Author response image 2.

      Metaplots showing the mean nucleosome signal centred on SNS-seq origins in subtelomeric regions. Two replicates from Maree et al 2019 (PMID: 28344657).

      We never claimed that transcription directs the localisation of the SNS-seq signal, and we did not conduct experiments to address this issue. Rather, we consider that chromatin organisation exerts a significant influence on the selection of active origins.

      (4) The major conclusion of the manuscript is that the SNS-seq signal corresponds very precisely to the locations of RNA-DNA hybrids (R-loops). Given all the limitations discussed above, can the authors rule out the possibility that SNS-seq is merely mapping RNA-DNA hybrids and is not, in fact, detecting origins?

      a) It is legitimate to speculate about the possibility that the very extensive overlap between SNS-seq and DRIP-seq signals within polycistronic transcription units (between ORFs) might suggest that DRIP-seq data detects nascent strands at replication origins, rather than R-loops at sites of pre-mRNA processing, as previously suggested by Briggs et al (PMID: 30304482). (eg, 'we disclosed for the first time a strong link between R-loop formation and DNA replication initiation'; 'The RNA:DNA hybrids are formed at initiation sites by RNA priming of SNS and Okazaki fragments'). However, the authors should acknowledge that alternative explanations for the localisation and potential functions of inter-CDS R-loops have been suggested,

      We do not find extensive overlap between stranded SNS-seq and DRIP-seq signals. Only a minor proportion (1.7%) of the previously reported DRIP-seq signal overlaps with the origins detected by stranded SNS-seq. Because RNA-primed SNS necessarily form RNA:DNA hybrids during the initiation of DNA replication, an enrichment of these hybrids around origins is expected. We therefore legitimately speculated that this minor proportion of RNA:DNA hybrids enriched around origin centres could be due to origin activation.

      We agree that some of the DRIP-seq signals detected around the origins may be sites of pre-mRNA processing, as previously suggested by Briggs et al. (PMID: 30304482). Since there are no data implicating pre-mRNA processing in DNA replication initiation, we prefer not to speculate about it.

      b) More importantly, the authors should provide experimental evidence that tests such a mechanistic prediction of R-loops and origins: for instance, have they attempted to remove R-loops, eg, by treatment with RNase H, and checked that the SNS-seq signal is unaltered? In the absence of such data, they cannot exclude the possibility that their work has revealed an overlooked problem with SNS-seq (which may not be limited to T. brucei; are matched DRIP-seq and SNS-seq datasets available to correlate these signals in a range of organisms?).

      We have not attempted RNase H treatment for a fundamental methodological reason: it seems highly improbable that RNA:DNA hybrids would persist through the multiple denaturation steps inherent to the SNS‑seq enrichment protocol. Published biophysical measurements show that RNA:DNA hybrids melt at ~95 °C (Roberts & Crothers, Science, 1992; PMID: 1279808), which is the temperature repeatedly applied during SNS isolation. Under these conditions, persistent RNA:DNA hybrids cannot remain intact and therefore cannot be responsible for the SNS‑seq peaks detected.

      We do not interpret our findings as revealing an “overlooked problem with SNS‑seq.” Instead, we consider that the enrichment of RNA:DNA hybrids around origins observed in DRIP‑seq is biologically meaningful and expected, given that replication initiation involves RNA‑primed nascent strands and that DRIP‑seq detects such structures.

      Reviewer #2 (Recommendations for the authors):

      I have some minor concerns that do not affect the main conclusions of the manuscript:

      (1) Figure 2B: The regions shown in the heatmap have different sizes, and I presume that the regions are ordered by size on the y-axis? If so, does the cone-shaped pattern, which is origin-less for genic regions and origin-enriched for intergenic regions, arise from the size of the regions? (I.e., for each genic region, the region itself is origin-less and the flanking intergenic regions contain origins.) If this is the case, then the peaks/valleys, centered exactly on the center of the regions on the mean frequency plots, arise from the different sizes of the analyzed regions, not from the fact that origins are mostly found at the center of intergenic regions.

      That is correct. The regions displayed in the heatmaps are genic and intergenic regions sorted by size. We did not intend this metaplot to convey that origins accumulate at the centres of intergenic regions, but rather that genic regions are mostly devoid of origins whereas intergenic regions are enriched in them.

      (2) Line 123, "and the average length of origins was found to be approximately 150 bp.": To determine origins, the authors filter away overlapping peaks and peaks that are too far from each other. Both restrict the minimal and maximal length of origins that can be observed, and this, in turn, affects the average length.

      This observation is correct. By filtering and setting a maximum distance between the positive and negative peaks, we are most likely affecting the average length by excluding origins that are potentially wider. Nevertheless, the violin plot shows that the majority of origins are shorter than 500 nt. Ultimately, the size of the region detected as the origin is not critical: the resolution of stranded SNS-seq comes from its ability to locate the origin centre between the minus- and plus-strand peaks.
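      The point that resolution comes from the gap between strand-specific peaks, and that filtering bounds the observable origin length, can be illustrated with a small pairing routine. This is a hypothetical sketch, not the pipeline used in the manuscript. It assumes peaks on one chromosome given as half-open (start, end) intervals, that at an origin the minus-strand peak lies just left of the plus-strand peak (leftward and rightward nascent strands), that minus peaks overlapping any plus peak are dropped, and that pairs further apart than a hypothetical `max_gap` are rejected.

```python
from bisect import bisect_left


def call_origins(minus_peaks, plus_peaks, max_gap):
    """Pair each minus-strand peak with the nearest plus-strand peak to its right.

    Returns (origin_start, origin_end, centre) tuples, where the origin is taken
    to be the gap between the two peaks (an assumption of this sketch).
    """
    plus = sorted(plus_peaks)
    plus_starts = [s for s, _ in plus]
    origins = []
    for m_start, m_end in sorted(minus_peaks):
        # Filter 1: skip minus peaks that overlap any plus peak (ambiguous centre).
        if any(s < m_end and m_start < e for s, e in plus):
            continue
        # First plus peak starting at or after the end of this minus peak.
        i = bisect_left(plus_starts, m_end)
        if i == len(plus):
            continue
        # Filter 2: reject pairs whose peaks are too far apart.
        gap = plus_starts[i] - m_end
        if gap > max_gap:
            continue
        origins.append((m_end, plus_starts[i], (m_end + plus_starts[i]) / 2))
    return origins


if __name__ == "__main__":
    minus = [(100, 180), (1000, 1080), (5000, 5050)]
    plus = [(250, 330), (1040, 1120), (9000, 9080)]
    print(call_origins(minus, plus, max_gap=500))
```

      In this sketch the two filters confine every reported origin length to the range 0 to `max_gap`, mirroring the reviewer's observation, while the centre estimate depends only on where the two peaks meet.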

      (3) Data in the manuscript were sometimes not presented in an easy-to-read manner. In some cases, this was due to benign things, such as missing labels for the mean frequency plots (e.g., Figure 2B, blue and green) or very small fonts for axes (Figure 2B). Sometimes, due to the plot types that were chosen, such as pie-charts (Figure 2C, see https://medium.com/analytics-vidhya/dont-use-pie-charts-in-data-analysis-6c005723e657), stacked bar plots (Figure 6B), or showing cumulative distributions (Figure 5C, and Figure 2D) it makes it difficult to judge the actual distribution.

      Wherever possible, we increased small font sizes as far as the layout allowed, including the axis labels of the frequency plots. Missing labels were added to the mean frequency plots.

      However, we found the cumulative distributions useful. If you have a more specific proposal for replacing them, we would be very grateful to hear it. We also hope that providing the figures as higher-resolution TIFF files will improve visibility.

      (4) Figure 2B: This data would be better presented with all regions stretched to the same size (the reason is explained in the public review).

      We performed the scaled plots for the stranded SNS-seq origins over the genic and intergenic regions as the reviewer suggested (see Author response image 3), but we prefer to keep the unscaled versions in the manuscript.

      Author response image 3.

      Distribution of mapped origins in scaled genic and intergenic regions. Scaled heatmaps present the distribution of the mapped origins and shuffled controls within scaled genic and intergenic regions (± 2 kb).

      (5) Line 149: "The number of origins in both cells was compared using normalised mapped reads": Supplementary Figure 2D mentions that conditions were subsampled to the same amount. I would mention that explicitly in the main text ("compared using normalized, subsampled mapped reads"), as 'normalizing' would not include 'subsampling' for me. Also, I could not find the methods section that the authors refer to here.

      Thanks for the suggestion. We changed the text to make this point clearer. In the Methods section, the subsampling process was referred to as 'PCF down-sampling'; for consistency, we have now renamed it 'Read sub-sampling' in the revised manuscript.

      (6) Figure 2C: I struggled to understand what gDNA stands for. Maybe it could be replaced with something like distribution in genome?

      Thanks for this suggestion. It has been changed to ‘distribution in genomic sequence’.

      (7) Figure 5C: I cannot see how a G4 30 kb from an origin could be relevant. This also does not fit the scale of the author's own model at all (Figure 8).

      The main goal of Figure 5C was to demonstrate the differences between origins and the nearest G4s compared to the shuffled controls. The graph shows that 50% of the origins have a G4 within 2010 bp, whereas the median for the shuffled control is 4154 bp in the case of non-stabilised G4s. Our model is based on Figure 5D, which illustrates the enrichment of G4s and poly(dA) around the centre of origins.
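      Because Figure 5C plots, for each origin, the distance to its nearest G4, the 50% point of the cumulative curve is simply the median of those distances. A minimal single-chromosome sketch in Python, assuming origin centres and G4 positions are integer coordinates; the uniform random control is a simplification that may differ from the shuffling used for the figure, and all names and values here are hypothetical.

```python
import random
from bisect import bisect_left
from statistics import median


def nearest_distances(points, features):
    """Distance (bp) from each point to the closest feature; features must be non-empty."""
    fs = sorted(features)
    dists = []
    for p in points:
        i = bisect_left(fs, p)
        candidates = []
        if i < len(fs):
            candidates.append(fs[i] - p)      # closest feature at or to the right
        if i > 0:
            candidates.append(p - fs[i - 1])  # closest feature to the left
        dists.append(min(candidates))
    return dists


def shuffled_positions(n, chrom_len, seed=0):
    """Uniform random positions as a simple control (an assumption of this sketch)."""
    rng = random.Random(seed)
    return [rng.randrange(chrom_len) for _ in range(n)]


if __name__ == "__main__":
    origins = [1000, 5000, 9000]
    g4s = [1200, 4000, 12000]
    print("origins:", median(nearest_distances(origins, g4s)))
    control = shuffled_positions(len(origins), chrom_len=15000)
    print("control:", median(nearest_distances(control, g4s)))
```

      Comparing the two medians is the single-number counterpart of comparing the origin and shuffled-control curves at the 50% level.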

      (8) Figure 6B: could be made supplementary in my opinion. All relevant data is repeated in panel D.

      It is true that Figures 6B and 6C contain some repetition. However, we would prefer to keep Figure 6B because it provides a quantification of the six indicated categories, along with the statistical tests. Figure 6C only presents the three categories that changed significantly. Figure 6D shows distribution but does not contain quantified data.

      (9) Figure 6D: This plot is repeating a lot, within single figures (Figure 6A, top) but also between figures (e.g., Figure 5D, Figure 4B). I'd prefer it if the initial plots of each figure were expanded a bit (here Figure 6A, top) to include some information from the previous figures. Then all these summary plots could be combined into a single figure at the very end (maybe still as different panels to reduce the number of lines in a single plot). Otherwise, each summary plot repeats the tracks of the previous, which becomes very repetitive.

      Our model is based on these summary plots, and we calculated the relative distances between the different elements from them. Two elements were repeated in each plot: the positions of poly(dA) and G4s. These two elements served as reference points to determine the relative positions of the other elements. Following this suggestion would again produce repetitive summary plots at the end, because a single combined summary plot would be overloaded with lines and difficult to interpret.

      (10) Figure 6D & Figure 7C: Both show predicted G4s; however, on the plus strand, one prediction has a two-peaked shape, the other only a single peak. Is this a mistake?

      The predicted G4 profiles differ in shape between the two plots because they were generated using different T. brucei reference genomes. Figure 6C uses the 427 reference genome, because the MNase-seq dataset was analysed in this genome; we therefore repeated the SNS-seq analysis and the G4 prediction in this genome so that the datasets could be compared directly. In Figure 7C, which compares origins, DRIP-seq, and predicted G4s, all datasets could be analysed in the 427-2018 reference genome.

    1. eLife Assessment

      This important work provides a new method to extract cfDNA from residual plasma from heparin separators for molecular testing. The evidence supporting the authors' claims is convincing, although some further metrics should also be evaluated. This finding will be interesting to people working in epigenomics and infectious disease diagnostics.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript "Adapting Clinical Chemistry Plasma as a Source for Liquid Biopsies" addresses a timely and practical question: whether residual plasma from heparin separator tubes can serve as a source of cfDNA for molecular profiling. This idea is attractive, since such samples are routinely generated in clinical chemistry labs and would represent a vast and accessible resource for liquid biopsy applications. The preliminary results are encouraging, and likely to benefit the research community.

      Comments on revisions:

      The concerns raised have been addressed. The heparin separator-based cfDNA method described in this study is likely to benefit the research community. I have no further scientific concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The authors propose that leftover heparin plasma can serve as a source for cfDNA extraction, which could then be used for downstream genomic analyses such as methylation profiling, CNV detection, metagenomics, and fragmentomics. While the study is potentially of interest, several major limitations reduce its impact; for example, the study does not adequately address key methodological concerns, particularly cfDNA degradation, sequencing depth limitations, statistical rigor, and the breadth of relevant applications.

      Strengths:

      The paper provides a cheap method to extract cfDNA, which has broad application if the method is solid.

      Weaknesses:

      (1) The introduction lacks a sufficient review of prior work. The authors do not adequately summarize existing studies on cfDNA extraction, particularly those comparing heparin plasma and EDTA plasma. This omission weakens the rationale for their study and overlooks important context.

      (2) The evaluation of cfDNA degradation from heparin plasma is incomplete. The authors did not compare cfDNA integrity with that extracted from EDTA plasma under realistic sample handling conditions. Their analysis (lines 90-93) focuses only on immediate extraction, which is not representative of clinical workflows where delays are common. This is in direct conflict with findings from Barra et al. (2025, LabMed), who showed that cfDNA from heparin plasma is substantially more degraded than that from EDTA plasma. A systematic comparison of cfDNA yields and fragment sizes under delayed extraction conditions would be necessary to validate the feasibility of their proposed approach.

      (3) The comparison of methylation profiles suffers from the same limitation. The authors do not account for cfDNA degradation and the resulting reduced input material, which in turn affects sequencing depth and data quality. As shown by Barra et al., quantifying cfDNA yield and displaying these data in a figure would strengthen the analysis. Moreover, the statistical method applied is inappropriate: the authors use Pearson correlation when Spearman correlation would be more robust to outliers and thus more suitable for methylation and other genomic comparisons.

      (4) The CNV analysis also raises concerns. With low-coverage WGS (~5X) from heparin-derived cfDNA, only large CNVs (>100 kb) are reliably detectable. The authors used a 500 kb bin size for CNV calling, but they did not acknowledge this as a limitation. Evaluating CNV detection at multiple bin sizes (e.g., 1 kb, 10 kb, 50 kb, 100 kb, 250 kb) would provide a more complete picture. In addition, Figure 3 presents CNV results from only one sample, which risks bias. Similar bias would exist for illustrations of CNVs from other samples in the supplementary figures provided by the authors. Again, Spearman correlation should be applied in Figure 3c, where clear outliers are visible.

      (5) It is important to point out that depth-based CNV calling is just one of several CNV calling approaches. Other CNV calling software that uses SNVs, read pairs, split reads, and coverage depth, such as Conserting, would be severely affected by low-quality WGS data. The authors need to evaluate at least two software tools with distinct CNV-calling algorithms on the current WGS data.

      (6) The authors omit an important application of cfDNA: somatic mutation detection. Degraded cfDNA and reduced sequencing depth could substantially impact SNV calling accuracy in terms of both recall and precision. Assessing this aspect with their current dataset would provide a more comprehensive evaluation of heparin plasma-derived cfDNA for genomic analyses.

      Comments on revisions:

      As suggested previously, the Pearson correlation results tend to be overstated; please replace them with Spearman correlation throughout the manuscript. Currently, the authors report both in the abstract, methods, results, and figures, all of which should be updated to report only Spearman correlation results.
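      The reviewer's concern about outliers can be seen in a toy example: a single extreme point can dominate Pearson's r, whereas Spearman's ρ (Pearson's r computed on ranks) is barely affected. A self-contained Python sketch using only the standard library, with average ranks for ties; the data are invented, and in practice `scipy.stats.spearmanr` would typically be used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 50]   # five unrelated points plus one shared outlier
    y = [5, 3, 4, 1, 2, 50]
    print(f"Pearson r = {pearson(x, y):.2f}, Spearman rho = {spearman(x, y):.2f}")
```

      With these values Pearson's r is about 0.99, driven almost entirely by the shared outlier, while Spearman's ρ is close to zero (about -0.03), reflecting the lack of association among the remaining points.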

      I don't have other concerns about the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Adapting Clinical Chemistry Plasma as a Source for Liquid Biopsies" addresses a timely and practical question: whether residual plasma from heparin separator tubes can serve as a source of cfDNA for molecular profiling. This idea is attractive, since such samples are routinely generated in clinical chemistry labs and would represent a vast and accessible resource for liquid biopsy applications. The preliminary results are encouraging, but in its current form, the study feels incomplete and requires additional work.

      We thank the reviewer for the encouragement and for recognizing the potential of clinical chemistry plasma as an accessible source for cfDNA-based analyses. To address concerns about incompleteness, we conducted additional controlled experiments and a more thorough literature review.

      My major concerns/suggestions are as follows:

      (1) Context and literature

      The introduction provides only limited background on prior attempts to use heparinized plasma for cfDNA work. It is well known that heparin can inhibit PCR and sequencing library preparation, which has historically discouraged its use. The authors should summarize the relevant literature more comprehensively and explain clearly why this approach has not been widely adopted until now, and how their work differs from or overcomes these earlier challenges.

      Thank you; we agree that the review of prior work requires expansion. In the revised manuscript, we expanded the introduction to cover prior studies and their gaps (lines 53-80).

      (2) Genome-wide coverage

      The analyses focus on correlations in methylation patterns and fragmentation metrics, but there is no evaluation of sequencing coverage across the genome. For both WGS and WMS, it would be important to demonstrate whether cfDNA from heparin plasma provides unbiased coverage, or whether certain genomic regions are systematically under-represented. A comparison against coverage profiles from cell-derived DNA (e.g., PBMC genomic DNA) would help to put the results in context and assess whether the material is suitable for whole-genome applications.

      Thank you for raising this point. We agree that genome-wide coverage distributions should be evaluated alongside correlations in methylation and fragmentation metrics when assessing the effects of sample tube types.

      To address this, we pooled the five healthy subjects in the Tube Comparison Study by tube type to generate two high-depth reference BAMs (EDTA vs. heparin separator). We calculated the mean depth in 1 Mb bins across Chr1-22 and applied z-score normalization. Overall, the heparin separator samples showed coverage profiles comparable to those of the matched EDTA samples (Pearson’s r = 0.9988, Spearman’s ρ = 0.9994). This comparison has been added as Supplementary Figure 1.
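      The coverage comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes per-bin mean depths have already been computed for each pooled BAM, and the depth values are hypothetical placeholders rather than data from the study. Each profile is z-score normalized, then the two profiles are compared with Pearson's r and with Spearman's ρ (computed as Pearson's r on ranks, with tied values given their average rank).

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize per-bin depths to mean 0 and (population) SD 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Hypothetical mean depth per 1 Mb bin for the two pooled reference BAMs.
edta_depth = [5.1, 4.9, 5.3, 4.2, 5.0, 6.1, 4.8]
heparin_depth = [5.0, 4.8, 5.4, 4.3, 5.1, 5.9, 4.7]

edta_z, heparin_z = zscore(edta_depth), zscore(heparin_depth)
print(f"Pearson r = {pearson(edta_z, heparin_z):.4f}")
print(f"Spearman rho = {spearman(edta_z, heparin_z):.4f}")
```

      Because the z-score is a positive linear transform, it changes neither coefficient; it mainly puts the two profiles on a common scale for plotting. The rank-based coefficient is the one that stays stable when a few bins are extreme, which is the robustness property raised by Reviewer #2 below.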

      We also appreciate the suggestion to compare against gDNA. However, cfDNA and gDNA are expected to exhibit different coverage patterns because cfDNA undergoes non-random fragmentation during its generation and degradation; a direct cfDNA–gDNA comparison would therefore be difficult to interpret in terms of tube-related bias.

      (3) Viral detection sensitivity

      The study shows strong concordance in viral detection between EDTA and heparin samples, but the sensitivity analysis is lacking. For clinical relevance, it is critical to demonstrate how well heparin-derived plasma performs in low viral load cases. A quantitative comparison of viral read counts and genome coverage across tube types would strengthen the conclusions.

      We agree that evaluating low viral loads is important for test development. Although our goal is to evaluate the repurposing of residual plasma from heparin separator tubes rather than to establish analytical sensitivity, we recruited additional paired cases (n=4) and combined them with existing cases with viral reads below 10 RPM (n=12) to examine the correlation of viral read counts between EDTA and heparin separator tubes in this subset. As shown in Author response image 1, viral RPM is strongly correlated between tube types (Pearson’s r = 0.93, P < 0.0001), supporting the conclusion that heparin-derived plasma yields viral read counts quantitatively consistent with those from EDTA samples. We have updated the sample sheet in Supplementary Table 1 and Fig. 3 accordingly.

      Author response image 1.

      Viral load correlation in cases below 10 RPM
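      The RPM values behind the low-load comparison above normalize viral read counts by sequencing depth (RPM = viral reads / total reads × 10^6). A minimal sketch follows, with a hypothetical `Case` record and hypothetical read counts. Two details are assumptions, because the response does not specify them: which reads form the denominator (before or after host filtering), and that the <10 RPM subset is defined on the EDTA tube.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical per-case record: viral and total read counts for each tube."""
    case_id: str
    edta_viral: int
    edta_total: int
    heparin_viral: int
    heparin_total: int


def rpm(viral_reads: int, total_reads: int) -> float:
    """Reads per million: viral reads normalized to 10^6 total reads."""
    return viral_reads / total_reads * 1_000_000


def low_load_pairs(cases, threshold=10.0):
    """Return (EDTA RPM, heparin RPM) pairs for cases below the RPM threshold."""
    pairs = []
    for c in cases:
        e = rpm(c.edta_viral, c.edta_total)
        h = rpm(c.heparin_viral, c.heparin_total)
        if e < threshold:  # assumption: subset defined on the EDTA tube
            pairs.append((e, h))
    return pairs


cases = [
    Case("A", 40, 20_000_000, 35, 18_000_000),
    Case("B", 300, 15_000_000, 280, 16_000_000),
]
print(low_load_pairs(cases))
```

      The resulting pairs are what a between-tube correlation (such as the Pearson's r reported above) would be computed on.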

      Reviewer #2 (Public review):

      Summary:

      The authors propose that leftover heparin plasma can serve as a source for cfDNA extraction, which could then be used for downstream genomic analyses such as methylation profiling, CNV detection, metagenomics, and fragmentomics. While the study is potentially of interest, several major limitations reduce its impact; for example, the study does not adequately address key methodological concerns, particularly cfDNA degradation, sequencing depth limitations, statistical rigor, and the breadth of relevant applications.

      We thank the reviewer for the insightful comments. In the revised manuscript, we added controlled experiments specifically designed to address the concerns regarding cfDNA degradation. We have also addressed other concerns in the responses below.

      Strengths:

      The paper provides a cheap method to extract cfDNA, which has broad application if the method is solid.

      We thank the reviewer for the encouraging comment.

      Weaknesses:

      (1) The introduction lacks a sufficient review of prior work. The authors do not adequately summarize existing studies on cfDNA extraction, particularly those comparing heparin plasma and EDTA plasma. This omission weakens the rationale for their study and overlooks important context.

      Thank you for this important point. We have expanded the introduction to include a thorough review of relevant prior studies (lines 53-80).

      (2) The evaluation of cfDNA degradation from heparin plasma is incomplete. The authors did not compare cfDNA integrity with that extracted from EDTA plasma under realistic sample handling conditions. Their analysis (lines 90-93) focuses only on immediate extraction, which is not representative of clinical workflows where delays are common. This is in direct conflict with findings from Barra et al. (2025, LabMed), who showed that cfDNA from heparin plasma is substantially more degraded than that from EDTA plasma. A systematic comparison of cfDNA yields and fragment sizes under delayed extraction conditions would be necessary to validate the feasibility of their proposed approach.

      The concern about degradation is very reasonable based on the literature. In the revised manuscript, we added a controlled experiment that mimics real-world clinical specimens left unprocessed at room temperature.

      In this controlled experiment with delayed processing, paired EDTA and heparin separator tubes from the same blood draw of 6 volunteers were held at room temperature or 4°C for 0, 1, 3, or 24 hours before the first soft spin (1600g, 10 min), simulating delayed processing in the inpatient hospital setting. The original tubes were then kept at 4°C for a week before the second spin (16000g, 10 min), simulating delayed processing in the research laboratory (Fig. 2). This simulation does not capture outpatient or remote clinic settings that require transportation; we have therefore noted this caveat in the Discussion and Abstract.

      In our results, EDTA samples remained largely stable across all test settings (Author response image 2). In contrast, heparin separator tubes held at room temperature showed a clear time-dependent shift in fragmentation, with the most pronounced degradation at 24 hours. Importantly, heparin separator samples processed within a short pre-centrifugation window (for example, within 3 hours) and kept refrigerated thereafter showed only minimal changes relative to the time 0 controls (Author response image 3). We have updated the Discussion to emphasize this combination of a short pre-centrifugation window and refrigeration as a practical boundary for fragmentomics in heparin separator tubes.

      We have addressed the work of Barra et al. (2025, LabMed) in the introduction. In that study, whole blood in heparin tubes was first soft spun and then incubated at 37°C for 24 hours, leading to severe DNA fragmentation. Our data agree: two matched sample pairs incubated at 37°C for 24 hours showed similarly severe fragmentation in heparinized blood (Author response image 4). However, these conditions are not representative of routine clinical transport and processing (at Stanford/UCSF). We revised the manuscript to emphasize that heparin separator tubes are most suitable for downstream cfDNA fragmentomic analyses when the pre-centrifugation interval is minimized and samples are kept refrigerated before processing whenever feasible.

      Author response image 2.

      Size distribution and end motif rank concordance in EDTA tubes across conditions. Left panels show fragment size distributions. The right panels show the corresponding scatter plots comparing end-motif abundance rankings between conditions. E0, EDTA processed immediately; E4T24, EDTA incubated at 4°C for 24 h; ERT24, EDTA incubated at room temperature for 24 h.

      Author response image 3.

      Size distribution and end motif rank concordance in Heparin separators across conditions. Left panels show fragment size distributions. The right panels show scatter plots comparing end-motif abundance rankings between conditions. H0, heparin processed immediately; H4T1/H4T3/H4T24, heparin incubated at 4°C for 1, 3, or 24 h; HRT1/HRT2/HRT3/HRT24, heparin incubated at room temperature for 1, 2, 3, or 24 h.

      Author response image 4.

      Size distribution and end motif rank concordance in extreme incubation conditions. Left panels show fragment size distributions. The right panels show scatter plots comparing end-motif abundance rankings between conditions. H0, heparin processed immediately; H37T24, heparin incubated at 37°C for 24 h.
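      The end-motif rankings shown in the images above could be computed roughly as follows. This is a minimal sketch under assumptions the response does not state: motifs are the first 4 bases read from each fragment's 5' end, ends containing N are discarded, and ties are broken alphabetically. The fragment sequences are hypothetical.

```python
from collections import Counter


def end_motif_counts(fragment_5p_seqs, k=4):
    """Count the first k bases of each fragment end, skipping short or N-containing ends."""
    counts = Counter()
    for seq in fragment_5p_seqs:
        motif = seq[:k].upper()
        if len(motif) == k and set(motif) <= set("ACGT"):
            counts[motif] += 1
    return counts


def motif_ranks(counts):
    """Rank motifs by descending count (1 = most abundant); ties broken alphabetically."""
    ordered = sorted(counts, key=lambda m: (-counts[m], m))
    return {m: i + 1 for i, m in enumerate(ordered)}


# Hypothetical 5' fragment-end sequences from one condition.
condition_a = ["CCCAGT", "CCCAGG", "CCTGAA", "AAAGTC", "CCCATT"]
ranks_a = motif_ranks(end_motif_counts(condition_a))
print(ranks_a)
```

      To compare two conditions (for example, H0 vs. HRT24), the rank dictionaries would be aligned over the union of motifs and compared with a rank correlation such as Spearman's ρ, which is what the scatter plots of abundance rankings visualize.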

      (3) The comparison of methylation profiles suffers from the same limitation. The authors do not account for cfDNA degradation and the resulting reduced input material, which in turn affects sequencing depth and data quality. As shown by Barra et al., quantifying cfDNA yield and displaying these data in a figure would strengthen the analysis. Moreover, the statistical method applied is inappropriate: the authors use Pearson correlation when Spearman correlation would be more robust to outliers and thus more suitable for methylation and other genomic comparisons.

      We appreciate the reasonable concerns regarding cfDNA degradation and agree that the methylation profile is not a metric for degradation. We have addressed the measurement of degradation with new experiments, as described in our response to comment (2) above. We also appreciate the suggestion to use Spearman correlation and have now incorporated Spearman’s ρ into the updated figures.

      (4) The CNV analysis also raises concerns. With low-coverage WGS (~5X) from heparin-derived cfDNA, only large CNVs (>100 kb) are reliably detectable. The authors used a 500 kb bin size for CNV calling, but they did not acknowledge this as a limitation. Evaluating CNV detection at multiple bin sizes (e.g., 1 kb, 10 kb, 50 kb, 100 kb, 250 kb) would provide a more complete picture. In addition, Figure 3 presents CNV results from only one sample, which risks bias. Similar bias would exist for illustrations of CNVs from other samples in the supplementary figures provided by the authors. Again, Spearman correlation should be applied in Figure 3c, where clear outliers are visible.

      We appreciate the reviewer’s constructive comments regarding the CNV analysis. We added an analysis using a 50 kb bin size (data uploaded to Zenodo). Across matched CNV-positive samples, CNV patterns remained consistent between tube types, albeit with the expected increase in noise. We did not reduce the bin size to 1-10 kb because, at ~5x coverage, bins at that resolution would be dominated by noise, rendering the results uninterpretable for CNV calling.

      We agree that illustrative examples alone are insufficient and that quantitative measures are required. To address this concern, we evaluated concordance across all paired cases by measuring the copy ratio and calculating the Spearman correlation (Fig. 4b). CNV-positive samples showed high concordance between tube types (n = 6, Spearman’s ρ = 0.72-0.96) and were used primarily for interpretation. Low correlations in CNV-negative samples are expected and were not used for interpretation: in these samples, log2 ratios across all bins cluster tightly around zero in both tube types, so correlation coefficients are driven by minor fluctuations and are not informative of biological concordance.
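      The bin-size argument above can be made concrete with a back-of-the-envelope calculation. This sketch is illustrative and rests on two assumptions not stated in the response: 150 bp reads and pure Poisson counting noise. Real data also carry GC and mappability bias, so the sketch understates the noise. The expected number of reads per bin is coverage × bin size / read length, and the Poisson coefficient of variation (SD / mean) is 1 / sqrt(expected reads).

```python
import math


def expected_reads_per_bin(coverage, bin_size_bp, read_length_bp=150):
    """Expected reads landing in a bin: coverage * bin size / read length."""
    return coverage * bin_size_bp / read_length_bp


def poisson_cv(expected_reads):
    """Coefficient of variation of a Poisson count: SD / mean = 1 / sqrt(mean)."""
    return 1 / math.sqrt(expected_reads)


for bin_kb in (1, 10, 50, 500):
    n = expected_reads_per_bin(coverage=5, bin_size_bp=bin_kb * 1000)
    cv = poisson_cv(n)
    print(f"{bin_kb:>4} kb bin: ~{n:,.0f} reads, Poisson CV ~{cv:.3f}")
```

      For scale, a single-copy gain present at a hypothetical 10% tumor fraction raises the expected depth of a diploid region by only about 5% (a ratio of 1 + f/2). Per bin, before any segmentation, that shift is well below the ~17% Poisson CV at 1 kb but far above the ~0.8% CV at 500 kb.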

      (5) It is important to point out that depth-based CNV calling is just one of several CNV calling approaches. Other CNV calling software that uses SNVs, read pairs, split reads, and coverage depth, such as Conserting, would be severely affected by low-quality WGS data. The authors need to evaluate at least two different software tools based on different CNV calling algorithms with the current WGS data.

      We appreciate this suggestion. We used another popular and independent CNV caller, CNVkit, in addition to ichorCNA. Although both methods use sequencing depth, they differ in their segmentation algorithm. ichorCNA uses a hidden Markov model-based segmentation optimized for low-pass cfDNA WGS, whereas CNVkit uses circular binary segmentation by default and works well with targeted panels. The CNVkit results are also consistent across different tube types. We have added the CNVkit results to Supplementary Fig. 3.

      (6) The authors omit an important application of cfDNA: somatic mutation detection. Degraded cfDNA and reduced sequencing depth could substantially impact SNV calling accuracy in terms of both recall and precision. Assessing this aspect with their current dataset would provide a more comprehensive evaluation of heparin plasma-derived cfDNA for genomic analyses.

      We thank the reviewer for highlighting somatic SNV detection as an important cfDNA application. Robust SNV benchmarking typically requires larger plasma input and substantially deeper, targeted sequencing than is feasible with remnant chemistry specimens. In routine workflows, chemistry testing leaves only ~0.5–2 mL residual plasma per tube, which limits the achievable depth for sensitive SNV calling. We have added this limitation to the Abstract and the Discussion (lines 281-285) and clarified that our goal is to repurpose heparin separator residual plasma as a complementary resource to expand biobanking, rather than to replace collection protocols optimized for mutation testing.

      Reviewer #2 (Recommendations for the authors):

      The manuscript does not seem to have been edited thoroughly prior to submission. For example, lines 94-97 are double-spaced, unlike the surrounding lines. In addition, Figure 5a contains an erroneous label, "|y=x", at its top. Figure 5b strongly suggests that Spearman rather than Pearson correlation is appropriate for the analysis.

      We thank the reviewer for carefully noting these formatting and labeling issues. Corrections for all points are made in the revised version.

    1. eLife Assessment

      Ge et al. here report a structural study of the native tripartite multidrug efflux pump complexes from Escherichia coli that identifies a novel accessory subunit, YbjP, and presents the structure of the native TolC-YbjP-AcrABZ complex as well as structures of the AcrB protein in L, T, and O conformations. The strength of the structural data is compelling, and the importance of the findings is potentially fundamental. In the revised manuscript, the authors have included additional analysis and made comparisons with pre-existing data that have helped place the data and their impact in the proper context.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates the biological mechanism underlying the assembly and transport of the AcrAB-TolC efflux pump complex. By combining endogenous protein purification with cryo-EM analysis, the authors show that the AcrB trimer adopts three distinct conformations simultaneously and identify a previously uncharacterized lipoprotein, YbjP, as a potential additional component of the complex. The work aims to advance our understanding of the AcrAB-TolC efflux system in near-native conditions and may have broader implications for elucidating its physiological mechanism.

      Strengths:

      Overall, the manuscript is clearly presented, and several of the datasets are of high quality. The use of natively isolated complexes is a major strength, as it minimizes artifacts associated with reconstituted systems and enables the discovery of a novel subunit. The authors also distinguish two major assemblies, the TolC-YbjP sub-complex and the complete pump, which appear to correspond to the closed and open channel states, respectively. The conceptual advance is potentially meaningful, and the findings could be of broad interest to the field.

      Weaknesses:

      (1) As the identification of YbjP is a key contribution of this work, a deeper comparison with functional "anchor" proteins in other efflux pumps is needed. Including an additional supplementary figure illustrating these structural comparisons would be valuable.

      (2) The observation of the LTO states in the presence of TolC represents an important extension of previous findings. A more detailed discussion comparing these LTO states to those reported in earlier structural and biochemical studies would improve the clarity and significance of this point.

      Comments on revisions:

      In the revision, the authors have addressed the above concerns, which has improved this study.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript reports the high-resolution cryo-EM structures of the endogenous TolC-YbjP-AcrABZ complex and a TolC-YbjP subcomplex from E. coli, identifying a novel accessory subunit. This work is an impressive effort that provides valuable structural insights into this native complex.

      Strengths:

      (1) The study successfully determines the structure of the complete, endogenously purified complex, marking a significant achievement.

      (2) The identification of a previously unknown accessory subunit is an important finding.

      (3) The use of cryo-EM to resolve the complex, including potential post-translational modifications such as N-palmitoyl and S-diacylglycerol, is a notable highlight.

      Weaknesses:

      (1) Clarity and Interpretation: Several points need clarification. Additionally, the description of the sample preparation method, which is a key strength, is currently misplaced and should be introduced earlier.

      (2) Data Presentation: The manuscript would benefit significantly from improved figures.

      (3) Supporting Evidence: The inclusion of the protein purification profile as a supplementary figure is essential. Furthermore, a discussion comparing the endogenous AcrB structure to those obtained in other systems (e.g., liposomes) and commenting on observed lipid densities would strengthen the overall analysis.

      Comments on revisions:

      In the revision, all my concerns have been addressed.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates the biological mechanism underlying the assembly and transport of the AcrAB-TolC efflux pump complex. By combining endogenous protein purification with cryo-EM analysis, the authors show that the AcrB trimer adopts three distinct conformations simultaneously and identify a previously uncharacterized lipoprotein, YbjP, as a potential additional component of the complex. The work aims to advance our understanding of the AcrAB-TolC efflux system in near-native conditions and may have broader implications for elucidating its physiological mechanism.

      Strengths:

      Overall, the manuscript is clearly presented, and several of the datasets are of high quality. The use of natively isolated complexes is a major strength, as it minimizes artifacts associated with reconstituted systems and enables the discovery of a novel subunit. The authors also distinguish two major assemblies, the TolC-YbjP sub-complex and the complete pump, which appear to correspond to the closed and open channel states, respectively. The conceptual advance is potentially meaningful, and the findings could be of broad interest to the field.

      Weaknesses:

      (1) As the identification of YbjP is a key contribution of this work, a deeper comparison with functional "anchor" proteins in other efflux pumps is needed. Including an additional Supplementary Figure illustrating these structural comparisons would be valuable.

      We have expanded the comparative analysis between YbjP and established anchoring or accessory components in other efflux pumps, and we have added Supplementary Figure S3 to illustrate these structural relationships.

      (2) The observation of the LTO states in the presence of TolC represents an important extension of previous findings. A more detailed discussion comparing these LTO states to those reported in earlier structural and biochemical studies would improve the clarity and significance of this point.

      In the revised manuscript we have expanded our discussion of the LTO conformations, including a direct comparison with previously reported structural and biochemical observations, to better contextualize the significance of our findings.

      Reviewer #2 (Public review):

      Summary:

      This manuscript reports the high-resolution cryo-EM structures of the endogenous TolC-YbjP-AcrABZ complex and a TolC-YbjP subcomplex from E. coli, identifying a novel accessory subunit. This work is an impressive effort that provides valuable structural insights into this native complex.

      Strengths:

      (1) The study successfully determines the structure of the complete, endogenously purified complex, marking a significant achievement.

      (2) The identification of a previously unknown accessory subunit is an important finding.

      (3) The use of cryo-EM to resolve the complex, including potential post-translational modifications such as N-palmitoyl and S-diacylglycerol, is a notable highlight.

      Weaknesses:

      (1) Clarity and Interpretation: Several points need clarification. Additionally, the description of the sample preparation method, which is a key strength, is currently misplaced and should be introduced earlier.

      We have reorganized the text to introduce the sample preparation strategy earlier and have clarified points that may have caused ambiguity.

      (2) Data Presentation: The manuscript would benefit significantly from improved figures.

      We agree and have revised the figures to improve clarity, consistency, and readability. Additional schematic illustrations have been included.

      (3) Supporting Evidence: The inclusion of the protein purification profile as a supplementary figure is essential. Furthermore, a discussion comparing the endogenous AcrB structure to those obtained in other systems (e.g., liposomes) and commenting on observed lipid densities would strengthen the overall analysis.

      We appreciate these suggestions. We added the purification profile to Supplementary Figure S1 and expanded the comparison between our endogenous AcrB structure and previously reported structures from reconstituted systems, including a more detailed discussion of lipid densities.

      Reviewer #3 (Public review):

      Summary:

      The manuscript "Structural mechanisms of pump assembly and drug transport in the AcrAB-TolC efflux system" by Ge et al. describes the identification of a previously uncharacterized lipoprotein, YbjP, as a novel partner of the well-studied Enterobacterial tripartite efflux pump AcrAB-TolC. The authors present cryo-electron microscopy structures of the TolC-YbjP subcomplex and the complete AcrABZ-TolC-YbjP assembly. While the identification and structural characterization of YbjP are potentially novel, the stated focus of the manuscript (mechanisms of pump assembly and drug transport) is not sufficiently addressed. The manuscript requires reframing to emphasize the principal novelty associated with YbjP and significant development of the other aspects, especially the claimed novelty of the AcrB drug-efflux cycle.

      Strengths:

      The reported association of YbjP with AcrAB-TolC is novel; however, a recent deposition of a preceding and much more detailed manuscript to the BioRxiv server (Horne et al., https://doi.org/10.1101/2025.03.19.644130) removes much of the immediate novelty.

      Weaknesses:

      While the identification of YbjP is novel, the authors do not appear to acknowledge the precedence of another work (Horne et al., 2025), and it is not cited within the correct context in the manuscript.

      We thank the reviewer for raising this important point regarding the independent nature of our work.

      Our study indeed progressed independently. The process began with our purification of an endogenous protein sample containing the AcrAB-TolC efflux pump. During our cryo-EM analysis, we observed an unassigned density in the map, for which we built a preliminary main-chain model. A subsequent search of structural databases, including AlphaFold predictions, allowed us to identify this density as the protein YbjP. Only after this identification did we become aware of the related preprint by Horne et al. on BioRxiv (posted March 19, 2025).

      Therefore, our structural determination of YbjP was conducted entirely independently. We fully acknowledge and respect the work by Horne et al. and have already cited their preprint in our manuscript. While their detailed structural data, maps, and coordinates were not publicly available as of March 13, 2026, we have described their findings appropriately. We agree that our manuscript can better reflect this context and will carefully check for any missing citations to ensure that their contribution is properly and clearly acknowledged.

      We also believe that the two studies are mutually complementary and collectively reinforce the emerging understanding of YbjP.

      Several results presented in the TolC-YbjP section do not represent new findings regarding TolC structure itself.

      We agree that the TolC features we describe are consistent with previously reported structural characteristics. However, these observations could only be confirmed in the context of the newly determined TolC–YbjP subcomplex, which was not available prior to this study. We have clarified this point in the revision to avoid overstating novelty.

      The structure and gating behaviour of TolC should be more thoroughly introduced in the Introduction, including prior work describing channel opening and conformational transitions.

      We appreciate this suggestion and agree that a more comprehensive overview of TolC gating and conformational transitions will strengthen the Introduction. We have revised the text to incorporate relevant prior structural and functional studies.

      The current manuscript does not discuss the mechanistic role of helices H3/H4 and H7/H8 in channel dilation, despite implying that YbjP binding may influence these features.

      Thank you for this comment. The primary novel contributions of this manuscript are the identification of YbjP and the structural characterization of AcrB in three distinct states. The discussion of the dilation mechanism, while included because we observed the closed TolC-YbjP state, is a secondary point. In the revised manuscript, we have expanded this discussion as suggested.

      Only the original closed TolC structure is cited, and the manuscript does not address prior mutational studies involving the D396 region, though this residue is specifically highlighted in the presented structures.

      We appreciate the reviewer drawing attention to this oversight. We have added citations to the relevant mutational and mechanistic studies, including those involving the D396 region, and now discuss these findings more clearly in relation to our structural observations.

      The manuscript provides only a general structural alignment between the closed TolC-YbjP subcomplex and the open TolC observed in the full pump assembly. However, multiple open, closed, and intermediate conformations of AcrAB-TolC have already been reported. Thus, YbjP alone cannot be assumed to account for TolC channel gating. A systematic comparison with existing structures is necessary to determine whether YbjP contributes any distinct allosteric modulation.

      We agree with the reviewer’s assessment and appreciate the constructive suggestion. In our revised manuscript, we have expanded the structural comparison to include previously reported open, closed, and intermediate AcrAB–TolC conformations. This expanded analysis more clearly positions our findings within the existing structural framework.

      The analysis of AcrB peristaltic action is superficial, poorly substantiated and, importantly, not novel. Several references to the ATP-synthase cycle have been provided, but this was already well established some 20 years ago, e.g., https://www.science.org/doi/10.1126/science.1131542.

      We thank the reviewer for this comment. We fully acknowledge the foundational studies that established the AcrB functional cycle and its analogy to the ATP-synthase mechanism. While previous work indeed defined the LTO (Loose, Tight, Open) cycle of AcrB, those structures were obtained using AcrB in isolation. In contrast, our endogenous sample, which includes the native constraints imposed by AcrA from above and the presence of AcrZ, reveals conformational changes in the transmembrane and porter domains that differ from those previously reported. We interpret these differences as reflecting a more physiologically relevant mechanism. In our revision, we provide a detailed discussion to contextualize these distinctions within the existing literature.

      The most significant limitation of the study is the absence of functional characterization of YbjP in vivo or in vitro. While the structural association between YbjP and TolC is interesting, the biological role of YbjP remains unclear.

      To explore the potential physiological role of YbjP, we compared the viability of a ΔybjP mutant in the E. coli C600 background with that of the wild-type C600 strain under ciprofloxacin (CIP) stress. However, we did not observe a detectable difference in survival between the two strains under the tested conditions. This result is consistent with the assay reported in the preprint mentioned by the reviewer, although the stress conditions used in that study differ from ours.

      Author response image 1.

      To further address this point, we have added a new Supplementary Figure S3 comparing outer membrane proteins with structural and functional similarities to TolC. As shown in this analysis, many such proteins contain an extracellular loop that appears to help anchor or stabilize them within the outer membrane. Notably, TolC lacks such a loop, whereas YbjP contains a corresponding loop region, suggesting that YbjP may potentially play a role in stabilizing or positioning TolC in the outer membrane.

      While our current experiments did not reveal a clear phenotype under CIP stress, the structural observations still suggest that YbjP may have a physiological role. We have therefore expanded the Discussion to more carefully consider possible functional implications of YbjP and to explicitly acknowledge the limitations of the present study regarding its physiological characterization.

      Moreover, the manuscript does not examine structural differences between the presented complex and previously solved AcrAB-TolC or MexAB-OprM assemblies that might support a mechanistic model.

      We thank the reviewer for this suggestion. We now provide a more detailed comparative analysis with previously reported AcrAB–TolC and MexAB–OprM structures, highlighting both similarities and key differences.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) To address the probable role of YbjP, performing 3D variability analysis on the sub-complex and the complete complex would help clarify whether YbjP participates in channel opening and closing.

      YbjP does not participate in the opening or closing of the TolC channel. Indeed, the structure of TolC shows no conformational changes upon YbjP binding when compared to the free, closed form of TolC. The structural transition between the closed and open states of TolC has been thoroughly reviewed by Alav et al. (Chem. Rev. 2021).

      Although the particles for the two reconstructions were obtained from the same dataset, inspection of the raw micrographs and the corresponding 2D class averages clearly shows that the particles fall into two distinct populations: one containing only the TolC–YbjP sub-complex and the other containing the full AcrABZ–TolC–YbjP assembly. In other words, the particles correspond to two different complexes, distinguished by the absence or presence of the AcrABZ components, rather than representing two conformational states of a single complex.

      Three-dimensional variability analysis (3DVA) is most appropriate for analyzing structural heterogeneity arising from continuous or discrete conformational changes within the same macromolecular assembly. Because the heterogeneity in our dataset primarily reflects compositional differences between two assemblies rather than conformational variability within a single complex, we believe that applying 3DVA would not be appropriate for this dataset.

      (2) In addition to the above points, a few minor revisions would improve clarity and readability. Some of the representative density maps in the supplementary figures could be refined for clarity. Adjusting formatting elements (e.g., dashed line thickness) may improve visual presentation.

      Supplementary Figures S2, S5, and S6 have been redrawn to reduce the excessive thickness of the density map representations for better visualization.

      Reviewer #2 (Recommendations for the authors):

      In this manuscript, Xiaofei and colleagues report the high-resolution cryo-EM structure of the TolC-YbjP-AcrABZ complex, as well as the structure of a subcomplex containing only TolC and YbjP. Additionally, they identify a previously unidentified accessory subunit that plays a role in the function of this complex. Overall, this represents an impressive effort in determining the complete endogenous complex from E. coli and performing systematic analyses. I have a few questions regarding the manuscript:

      (1) The authors use the term "native" several times (e.g., lines 24, 73, 157, 256) to refer to the complex reported here. This may cause confusion, given the use of detergent to extract endogenous complexes from E. coli. They should consider excluding the possibility that the subcomplex was formed during the purification process. The term "endogenous" should suffice in this context.

      We have replaced “native” with “endogenous”.

      (2) Lines 26-28: The phrase "its protomers" may lead to ambiguity, as it could refer to either YbjP or TolC.

      The sentence has been updated to “…bridging the TolC protomers at their equatorial domain.”

      (3) Lines 50-51: The text suggests that the assembly of AcrA and AcrB triggers TolC's transition from a closed to an open conformation. Please clarify this point.

      The introduction (lines 50-51) has been expanded to describe the assembly of TolC and AcrAB, as well as the gating transition between the closed and open states of TolC.

      (4) Lines 57-59: Cryo-EM can yield a low-to-medium resolution map, but the method itself is not "low-to-medium resolution cryo-EM."

      The sentence has been changed to "…prior studies using crystallography and cryo-EM have revealed low-to-medium resolution snapshots of the assembled pump."

      (5) Line 73: The authors should consider briefly introducing how they prepared the samples for cryo-EM structural studies, as this is a highlight of the manuscript.

      A detailed, multi-step purification protocol has been added as Supplementary Figure S1A to illustrate the sample preparation procedure.

      (6) Lines 77-82: The authors should label these structural features in the corresponding figures for easier reference, particularly clarifying which part refers to the "equatorial domain."

      We have labeled these structural features in the corresponding figures for clarity, and specifically indicated which region corresponds to the equatorial domain.

      (7) Lines 92-93: The first α-helix of TolC is unclear; the authors should indicate the corresponding residues of this helix in the main text. Additionally, it would be beneficial to illustrate the interface in a figure for easier access.

      We have specified the residues corresponding to the participating α-helix of TolC in the main text and illustrated the interaction interface in a figure (Figure 1F) for better visualization.

      (8) Lines 99-100: Did the authors observe additional density for N-palmitoyl and S-diacylglycerol modifications in their cryo-EM density map? If so, they should highlight this in a figure to demonstrate the importance of these modifications.

      The N-palmitoyl and S-diacylglycerol modifications are embedded in the outer membrane but lack a consistent location within it. As a result, they were averaged out during cryo-EM reconstruction and are not visible in our final map.

      (9) Line 122: Please indicate the 33 nm height in the figure.

      The 33 nm height is composed of a 14 nm TolC channel, a 14 nm periplasmic portion of AcrAB, and a 5 nm transmembrane portion of AcrB, which has been added to the right side of Figure 2B.

      (10) Lines 123-124: This sentence feels out of place. It would be more appropriate to move it to another location, such as the beginning of the Results section, to introduce how the samples were prepared.

      This sentence has been moved to the section “Structure of a TolC–YbjP closed-state complex” to describe the sample preparation.

      (11) Lines 127-128: This section needs to be rewritten for improved clarity.

      This sentence has been rewritten as “This tripartite architecture is stabilized by three distinct sets of interfaces: (i) contacts between the AcrB trimer and the basal regions of AcrA, (ii) extensive AcrA–AcrA lateral interactions within the hexameric ring, and (iii) tip-to-tip junctions formed between the upper AcrA α-helical hairpin and the periplasmic entrance of TolC (Figure 2D).”

      (12) Line 141: Please define terms like DN, DC, PN, and PC upon their first use.

      DN and DC (the N- and C-terminal subdomains of the docking domain) and PN and PC (the N- and C-terminal subdomains of the periplasmic (porter) domain) are now defined at their first appearance in the text.

      (13) The lα helix of AcrB is at least partially buried in the membrane (Liu H. et al, PNAS 2025). The authors should consider including this information in their figures, particularly Figure 2B and Figure 5. As the complex is endogenously purified, are there any differences in AcrB compared to those observed in liposomes, SMALP, or vesicles? Did the authors observe significant lipid densities?

      A structural comparison of the AcrB holocomplex with an AcrB structure determined in the native membrane environment (PDB: 9DXN) has been added as Supplementary Figure S8D. In the transmembrane region of AcrB, some sausage-like densities were observed; however, lipid molecules were not modelled in the study.

      (14) The protein purification profile should be included, at least as a supplementary figure.

      The protein purification profile has been added to Supplementary Figure S1A.

      Reviewer #3 (Recommendations for the authors):

      (1) The identification and structural characterization of YbjP as a novel TolC-associated lipoprotein is potentially interesting, and the cryo-EM structures of the TolC-YbjP subcomplex and the complete pump assembly represent a solid starting point. However, the manuscript currently does not sufficiently support the broader mechanistic conclusions implied by the title regarding pump assembly and drug transport. To strengthen the work, the manuscript would benefit from being refocused to highlight the novelty of YbjP, while also providing a clearer mechanistic rationale for its functional role.

      We thank the reviewer for this helpful comment. We have revised the manuscript to better highlight the novel features of YbjP and provide a clearer mechanistic explanation for its function.

      Most Gram-negative TolC homologs, including P. aeruginosa OprM and E. coli CusC, carry native lipid anchors that attach them to the outer membrane. However, E. coli TolC lacks this N-terminal lipidation site. We propose that YbjP, a dually lipidated protein modified with N-palmitoyl and S-diacylglycerol groups, tethers TolC to the outer membrane and functionally replaces the intrinsic lipid anchors found in other outer membrane factors.

      To support this mechanism, we have added Supplementary Figure S3, which compares the anchoring domains of six representative outer membrane components of efflux pumps.

      (2) The structural features and gating dynamics of TolC should be more thoroughly introduced, including prior work describing channel dilation and helix movements (e.g., PMID: 18406332; PMID: 21245342), and the manuscript should discuss how YbjP may influence these known conformational transitions. The relevance of the D396 region should also be considered in the context of previous mutational analyses (e.g., PMID: 32850959).

      All citations mentioned have been added. Indeed, the structure of TolC shows no conformational changes upon YbjP binding when compared to the free, closed form of TolC.

      (3) Structural interpretation of the YbjP-containing complexes needs to be strengthened by comparison with the extensive library of available AcrAB-TolC structures in open, closed, and intermediate states (e.g., PMID: 28355133; PMID: 24747401; PMID: 34506732). Such analysis is necessary to determine whether YbjP contributes any distinct allosteric or conformational effects.

      YbjP binds to the equatorial domain of TolC, distant from the tip of its coiled-coil helices. This binding therefore does not interfere with TolC’s functional role, but rather helps anchor TolC within the outer membrane in the correct orientation.

      (4) The speculations regarding the peristaltic nature of AcrB cycling, as currently presented in the text and Figure 4, lack novelty and reiterate the well-established AcrB L/T/O states without offering insight into how YbjP might influence long-range communication within the complex.

      We thank the reviewer for this valuable comment. We agree that the functional rotation mechanism of AcrB with loose, tight and open states has been well documented in previous work.

      In our endogenous intact complex, however, we identified substantial conformational changes in both the porter and transmembrane domains of AcrB that were not observed in earlier isolated structures. To highlight these differences, we have added Supplementary Figure S8 to compare our AcrB structure with all previously reported conformational states.

      On the basis of these structural observations, we have proposed a distinct drug efflux mechanism, which is now described in detail in the revised manuscript.

      (5) Specific clarification is needed regarding the proposed pathway by which YbjP could modulate AcrA or AcrB, given the spatial separation observed in the structures.

      YbjP binds to the equatorial domain of TolC, which has no effect on AcrA or AcrB.

      (6) The manuscript currently lacks functional validation of YbjP, either in vivo or in vitro. Incorporating even basic assays to test YbjP's contribution to efflux function, pump assembly, or antibiotic resistance would significantly enhance the conclusions.

      To explore the potential physiological role of YbjP, we compared the viability of a ΔybjP mutant in the E. coli C600 background with that of the wild-type C600 strain under ciprofloxacin (CIP) stress. However, we did not observe a detectable difference in survival between the two strains under the tested conditions. This result is consistent with the assay reported in the preprint mentioned by the reviewer, although the stress conditions used in that study differ from ours (see Author response image 1).

      To further address this point, we have added a new Supplementary Figure (Fig. S3) comparing outer membrane proteins with structural and functional similarities to TolC. As shown in this analysis, many such proteins contain an extracellular N-terminal loop that appears to help anchor or stabilize them within the outer membrane. Notably, TolC lacks such a loop, whereas YbjP contains a corresponding loop region, suggesting that YbjP may potentially play a role in stabilizing or positioning TolC in the outer membrane.

      While our current experiments did not reveal a clear phenotype under CIP stress, the structural observations still suggest that YbjP may have a physiological role. We have therefore expanded the Discussion to more carefully consider possible functional implications of YbjP and to explicitly acknowledge the limitations of the present study regarding its physiological characterization.

      (7) The relationship to the prior BioRxiv work by Horne et al. (March 19, 2025) should be discussed more directly, particularly because it reports the same YbjP-TolC association across two different efflux systems and includes higher-resolution structures and functional evidence. The current citation should be revised to accurately acknowledge the precedence and overlap in findings.

      We thank the reviewer for this important suggestion. We have moved the citation to an earlier point in the manuscript to properly acknowledge the work by Horne et al.

      We fully agree that a direct comparison between our structures and those reported by Horne et al. would be highly valuable. However, although nearly a year has passed since the preprint was posted, their atomic coordinates have not been released in the Protein Data Bank. No detailed structural coordinates or models are provided in the preprint itself, which prevents us from performing a meaningful, structure-based comparison with our own data at this stage.

      (8) The references used to support statements on allosteric pump activation (e.g., lines 182-183) should be updated to include more relevant full-complex studies (e.g., PMID: 28355133; PMID: 33009415; PMID: 33909410), and the manuscript should more clearly articulate any proposed mechanism for signal transmission involving YbjP.

      The citations have been added.

      YbjP does not participate in the opening or closing of the TolC channel. Indeed, the structure of TolC shows no conformational changes upon YbjP binding when compared to the free, closed form of TolC.

      (9) Overall, while the structural identification of YbjP is noteworthy, additional functional data and more rigorous structural comparison are needed to substantiate the proposed model of pump assembly and drug transport. Reframing the manuscript to emphasize the novelty of YbjP and clarifying its potential mechanistic role would strengthen the work significantly.

      We refer the reviewer to our earlier response for additional functional data. We have added Supplementary Figure S8 to compare our AcrB structure with all previously reported conformational states.

    1. eLife Assessment

      This important study examined age-related changes in cerebellar function by testing a large sample of younger and older adults, including 30 over 80 years old, on motor and cognitive tasks linked to the cerebellum and conducting structural imaging. Their findings show that cerebellar-dependent functions are mostly maintained or even enhanced across the lifespan, with cerebellar-mediated motor abilities remaining intact despite degeneration, in contrast to non-cerebellar measures. Overall, the authors provide compelling evidence in support of preserved cerebellar function with age. These results highlight the resilience and redundancy of cerebellar circuits and offer key insights into aging and motor behavior.

    2. Reviewer #1 (Public review):

      Summary:

      Witte et al. examined whether canonical behavioral functions attributed to the cerebellum decline with age. To test this, they recruited younger, old, and older-old adults and tested them on a comprehensive battery of tasks previously identified as cerebellar-dependent in the literature. Remarkably, they found that cerebellar function is largely preserved across the lifespan, and in some cases even enhanced. Structural imaging confirmed that their older adult cohort was representative in terms of both cerebellar gray- and white-matter volume. Overall, this is an important study with strong theoretical implications and compelling evidence supporting the motor reserve hypothesis, demonstrating that cerebellar-dependent measures remain largely intact with aging.

      Strengths:

      (1) Relatively large sample size.

      (2) Most comprehensive behavioral battery to date assessing cerebellar-dependent behavior.

      (3) Structural MRI confirmation of age-related decline in cerebellar gray and white matter, ensuring representativeness of the sample.

      Weaknesses:

      The absence of a voxel-based morphometry (VBM) analysis limits the anatomical and functional specificity of the conclusions. Such an analysis would help identify which functions are truly cerebellar-dependent, rather than relying primarily on inferences drawn from prior neuropsychological literature. Notably, the authors have undertaken this analysis in a separate manuscript.

      As acknowledged in the Discussion, the classification of tasks as "cerebellar-dependent" versus "general" remains somewhat ambiguous. Some measures labeled as "general" may still engage cerebellar processes. Moreover, analyses in the authors' forthcoming manuscript show weak structure-behavior correlations, casting further doubt on how clearly cerebellar-specific functions can be distinguished from more general processes.

    3. Reviewer #2 (Public review):

      Summary:

      The authors are investigating cerebellar-mediated motor behaviors in a large sample of adults, including 30 individuals over the age of 80 (a great strength of this work). They employed a large battery of motor tasks that are tied to cerebellar function, in addition to a cognitive task and motor tasks that are more general. They also evaluated cerebellar structure. Across their behavioral metrics, they found that even with cerebellar degeneration, cerebellar-mediated motor behavior remained intact relative to young adults. However, this was not the case for measures not directly tied to cerebellar function. The authors suggest that these functions are preserved and speak to the resiliency and redundancy of function in the cerebellum. They also speculate that cerebellar circuits may be especially good for preserving function in the face of structural change. The tasks are described very well, and their implementation is also well-done with consideration for rigor in the data collection and processing. The inclusion of Bayesian estimates is also particularly useful, given the theoretically important lack of age differences reported. This work is methodologically rigorous with respect to the behavior, and certainly thought-provoking.

      Strengths:

      The methodological rigor, inclusion of Bayesian statistics, and the larger sample of individuals over the age of 80 in particular are all great strengths of this work. Further, as noted in the text, the fact that all participants completed the full testing battery is of great benefit. Please note that, upon my second review, the strengths remain. This is a really wonderful investigation and amazingly comprehensive from a behavioral perspective, given the numerous tasks and domains that were considered.

      Weaknesses:

      The suggestion of cerebellar reserve, given that at the group level there is a lack of difference for cerebellar-specific behavioral components, could be more robustly tested. That is, the authors suggest that this is a reserve given that the volume of cerebellar gray matter is smaller in the two older groups, though behavior is preserved. This implies volume and behavior are seemingly dissociated. However, there is seemingly a great deal of behavioral variability within each group and likewise with respect to cerebellar volume. Is poorer behavior associated with smaller volume? If so, this would still suggest that volume and behavior are linked, but rather than being age that is critical, it is volume. On the flip side, a lack of associations between behavior and volume would be quite compelling with respect to reserve. More generally, as explicated in the recommendations, there are analyses that could be conducted that, in my opinion, would more robustly support their arguments given the data that they have available.

      The authors have done wonderful work to address the comments from the initial feedback/reviews. While I may ultimately disagree with the approach of including the imaging data in another manuscript, that is, at the same time, a reasonable decision. This does not, however, change my impression that the paper would be stronger with the inclusion of the volumetric imaging data. I can understand why it may be published separately; it would be a very long paper to include both. At the same time, including those analyses, which nicely support most of the assertions made here in the preprint, would ultimately strengthen this work. The behavior certainly stands on its own as an excellent and needed investigation; together, both pieces make for a truly excellent contribution to the literature.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Witte et al. examined whether canonical behavioral functions attributed to the cerebellum decline with age. To test this, they recruited younger, old, and older-old adults and tested them on a comprehensive battery of tasks previously identified as cerebellar-dependent in the literature. Remarkably, they found that cerebellar function is largely preserved across the lifespan, and in some cases even enhanced. Structural imaging confirmed that their older adult cohort was representative in terms of both cerebellar gray- and white-matter volume. Overall, this is an important study with strong theoretical implications and convincing evidence supporting the motor reserve hypothesis, demonstrating that cerebellar-dependent measures remain largely intact with aging.

      Strengths:

      (1) Relatively large sample size.

      (2) Most comprehensive behavioral battery to date assessing cerebellar-dependent behavior.

      (3) Structural MRI confirmation of age-related decline in cerebellar gray and white matter, ensuring representativeness of the sample.

      Weaknesses:

      (1) Although the authors note this was outside the study's scope, the absence of a voxel-based morphometry (VBM) analysis limits anatomical and functional specificity. Such an analysis would clarify which functions are cerebellar-dependent rather than solely inferring this from prior neuropsychological literature.

      (2) As acknowledged in the Discussion, task classification (cerebellar-dependent vs. general measures) remains somewhat ambiguous. Some "general" measures may still rely on cerebellar processes based on the paper's own criteria - for example, tasks in which individuals with cerebellar degeneration show impairments.

      (3) Cerebellar-dependent and general measures may inherently differ in measurement noise, potentially biasing results toward detecting effects in general measures but not in cerebellar-dependent ones.

      We appreciate Reviewer #1's positive assessment of the study, including the acknowledgment of our large sample size, comprehensive behavioral battery, and verification of cerebellar atrophy using MRI. We address the concerns raised as follows:

      (1) Voxel-based morphometry (VBM) and anatomical specificity

      We agree that VBM would strengthen anatomical specificity. As noted in our response to private comments, we have carried out these analyses as part of a separate dedicated study, now available as a preprint (“Aging is associated with uniform structural decline across cerebellar regions while preserving topological organization and showing no relation with sensorimotor function”, https://doi.org/10.64898/2026.02.13.705695). This work investigates region-level cerebellar aging and its relationship with behavior in detail, including both anatomical and functional parcellations. In short, the preprint demonstrates the absence of a structure-function relationship between cerebellar regions (from either anatomical or functional atlases) and cerebellar function. Given the scope of the present manuscript, which focuses primarily on behavioral evidence for cerebellar preservation, we chose not to expand this paper further with VBM results.

      (2) Task classification and cerebellar involvement

      We clarified in the revised manuscript that even “general” measures likely involve cerebellar processing to some extent. We have strengthened the discussion explaining that these measures do not primarily depend on cerebellar function, in contrast to the cerebellar-specific metrics derived from established models (e.g., clock variance in rhythmic tapping). We now explicitly caution against interpreting these general measures as cerebellar-independent.

      (3) Measurement noise and differential sensitivity

      To address the reviewer’s concern that measurement noise may differ between task categories, we now report split-half reliabilities for all measures in the Supplement. These data demonstrate no systematic reliability disadvantage for cerebellar-specific tasks that could explain the pattern of results.

      Reviewer #2 (Public review):

      Summary:

      The authors are investigating cerebellar-mediated motor behaviors in a large sample of adults, including 30 individuals over the age of 80 (a great strength of this work). They employed a large battery of motor tasks that are tied to cerebellar function, in addition to a cognitive task and motor tasks that are more general. They also evaluated cerebellar structure. Across their behavioral metrics, they found that even with cerebellar degeneration, cerebellar-mediated motor behavior remained intact relative to young adults. However, this was not the case for measures not directly tied to cerebellar function. The authors suggest that these functions are preserved and speak to the resiliency and redundancy of function in the cerebellum. They also speculate that cerebellar circuits may be especially good for preserving function in the face of structural change. The tasks are described very well, and their implementation is also well-done with consideration for rigor in the data collection and processing. The inclusion of Bayesian estimates is also particularly useful, given the theoretically important lack of age differences reported. This work is methodologically rigorous with respect to the behavior, and certainly thought-provoking.

      Strengths:

      The methodological rigor, inclusion of Bayesian statistics, and the larger sample of individuals over the age of 80 in particular are all great strengths of this work. Further, as noted in the text, the fact that all participants completed the full testing battery is of great benefit.

      Weaknesses:

      The suggestion of cerebellar reserve, given that at the group level there is a lack of difference for cerebellar-specific behavioral components, could be more robustly tested. That is, the authors suggest that this is a reserve given that the volume of cerebellar gray matter is smaller in the two older groups, though behavior is preserved. This implies volume and behavior are seemingly dissociated. However, there is seemingly a great deal of behavioral variability within each group and likewise with respect to cerebellar volume. Is poorer behavior associated with smaller volume? If so, this would still suggest that volume and behavior are linked, but rather than being age that is critical, it is volume. On the flip side, a lack of associations between behavior and volume would be quite compelling with respect to reserve. More generally, as explicated in the recommendations, there are analyses that could be conducted that, in my opinion, would more robustly support their arguments given the data that they have available. This is a well-executed and thought-provoking investigation, but there is also room for a bit more discussion.

      We appreciate the reviewer's recognition of the methodological rigor of the study. The public review focuses on the structure-function relationship in the cerebellum. Given that cerebellar volume is smaller in older adults while the identified cerebellar functions are maintained, we conclude that there is no structure-function relationship. We agree with the reviewer that this could be tested further by examining different parcellations of the cerebellum and demonstrating the absence of association between smaller cerebellar regions and the investigated cerebellar functions. While this is interesting, we believe it goes beyond the scope of this already extensive paper. For this reason, detailed analyses of the structure-function relationship are available in the preprint of a separate paper entitled “Aging is associated with uniform structural decline across cerebellar regions while preserving topological organization and showing no relation with sensorimotor function” (https://doi.org/10.64898/2026.02.13.705695). In this preprint, across multiple anatomical and functional parcellations, we found no meaningful association between cerebellar structure and cerebellar-specific behavioral measures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Prefacing these suggestions, I want to commend the authors for undertaking this Herculean effort, recruiting such a large sample and administering an extensive battery of tasks. This is an impressively comprehensive study!

      (1) Lesion-symptom mapping. The authors state that lesion-symptom mapping was beyond the scope of the study, but it is unclear why such an analysis could not be performed. Including it would strengthen inferences linking cerebellar structure to behavioral outcomes and help differentiate cerebellar-specific from general performance measures.

      (2) Inter-measure correlations. For cerebellar-dependent tasks, did the authors examine correlations among behavioral measures? If cerebellar aging effects are relatively uniform across the cerebellar cortex, performance across tasks engaging distinct cerebellar regions should, in theory, covary. Similar pairwise correlations for general measures could provide a useful comparison.

      1 + 2: We fully agree with these two points; however, we decided to address these analyses in a separate paper. In the current manuscript, our primary focus was on the behavioral aspects, as these are already quite extensive on their own. In our subsequent work, we conducted an in-depth investigation into the relationship between cerebellar-specific measures and cerebellar structure across distinct cerebellar regions (including anatomical regions and functionally defined regions according to the atlas of Nettekoven et al., 2024). We found that aging does not affect the cerebellum uniformly, in that some anatomical regions exhibit stronger age effects than others. For the functionally defined regions, however, the age effects were uniform. There was no relation between behavioral cerebellar-specific measures and regional gray matter structure.

      In this second paper we also analyzed inter-measure correlations between behavioral cerebellar-specific measures. We did not find any correlations between cerebellar outcomes of different tasks, which indeed could indicate that the different tasks engage distinct cerebellar regions. In addition, we did not find any relation between cerebellar outcomes and anatomically or functionally defined cerebellar regions.

      You can find a preprint of the second manuscript entitled “Aging is associated with uniform structural decline across cerebellar regions while preserving topological organization and showing no relation with sensorimotor function” here: https://doi.org/10.64898/2026.02.13.705695

      (3) Measurement sensitivity. Could differences in age effects reflect varying measurement noise between cerebellar-specific and general measures? For instance, even among younger participants, cerebellar-related measures (e.g., slope in mental rotation) might exhibit greater variability - given that they depend on more conditions, each with its own noise - than general metrics (e.g., baseline motor variability or choice reaction time estimated from a single condition). This could affect sensitivity to detect age-related change and bias results toward finding effects in general rather than cerebellar-specific measures.

      To address this concern, we computed split-half reliability for both cerebellar-specific and general sensorimotor measures and added these estimates to the supplementary materials. As can be seen from Author response table 1, there is no consistent pattern of lower reliability for cerebellar-specific measures that could plausibly account for the absence of age-related effects.

      Author response table 1.

      Split-half reliabilities
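      For readers less familiar with the procedure, one common way to compute split-half reliability is sketched here. This is an assumption, not necessarily the exact scheme behind Author response table 1: trials are split into odd and even halves in presentation order, each half is reduced to one score per participant, the two half-scores are correlated across participants, and the Spearman-Brown formula steps the half-length correlation up to the full test length. The function names are hypothetical.

```python
"""Odd/even split-half reliability with Spearman-Brown correction (illustrative sketch)."""
from math import sqrt
from statistics import mean
from typing import Callable, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(
    trials_per_participant: Sequence[Sequence],
    summarize: Callable[[Sequence], float] = mean,
) -> float:
    """Split each participant's trials into odd/even halves and correlate them.

    trials_per_participant: one sequence of trials per participant, in
    presentation order. summarize: reduces one half of the trials to a single
    score (the mean by default).
    """
    odd = [summarize(t[0::2]) for t in trials_per_participant]
    even = [summarize(t[1::2]) for t in trials_per_participant]
    r_half = pearson_r(odd, even)
    # Spearman-Brown: estimated reliability of the full-length measure.
    return 2 * r_half / (1 + r_half)
```

      A derived measure such as a slope (e.g., in mental rotation) would pass a custom `summarize` that fits the slope within each half, which is one reason multi-condition measures can end up noisier than single-condition ones.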

      (4) Task dependence on the cerebellum. It is difficult to argue that measures such as reach accuracy, choice reaction time, or rhythm deviation are non-cerebellar. Ataxia certainly impacts reach accuracy. Patient evidence is mixed, but even when there is a lack of dissociation (e.g., prolonged choice reaction times in both cerebellar and PD groups), this does not preclude cerebellar involvement in these measures. Indeed, as the authors stated, claims of cerebellar independence should be made cautiously (this could be addressed by VBM; see comment 1).

      In the paper, we tried to emphasize that the general sensorimotor measures still involve cerebellar function, as is the case for many movement-related measures. However, we theorized that they do not depend primarily on cerebellar function. For example, rhythm deviation in the finger-tapping task is influenced by cerebellar timing mechanisms as well as by motor execution noise, attention, and other factors. In contrast, the cerebellar-specific measure from this task, the clock variance, has been shown to isolate the contribution of cerebellar-dependent timing mechanisms to task performance (Ivry & Keele, 1989).
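      To make the distinction between rhythm deviation and clock variance concrete, a minimal sketch follows, assuming the Wing-Kristofferson two-level timing model on which the Ivry and Keele clock-variance measure is based. Each inter-tap interval from the unpaced continuation phase is modeled as a central clock interval plus the difference of two successive motor delays (I_n = C_n + M_{n+1} - M_n). The model implies that the lag-1 autocovariance of the intervals equals minus the motor variance and that the interval variance equals the clock variance plus twice the motor variance. The function names and the simulation helper are hypothetical, and the exact estimator used in the manuscript is not stated here.

```python
"""Wing-Kristofferson decomposition of tapping variability (illustrative sketch)."""
import random
from typing import List, Sequence, Tuple


def wing_kristofferson(intervals: Sequence[float]) -> Tuple[float, float]:
    """Return (clock_variance, motor_variance) from continuation-phase intervals.

    Assumed model: I_n = C_n + M_{n+1} - M_n with independent clock intervals C
    and motor delays M, so Var(I) = var_C + 2 * var_M and the lag-1
    autocovariance of I equals -var_M.
    """
    n = len(intervals)
    if n < 3:
        raise ValueError("need at least 3 intervals")
    m = sum(intervals) / n
    dev = [x - m for x in intervals]
    var_i = sum(d * d for d in dev) / n
    acov1 = sum(dev[k] * dev[k + 1] for k in range(n - 1)) / n
    motor_var = -acov1
    clock_var = var_i - 2 * motor_var
    # No truncation: with real data, acov1 can fall outside the range the model allows.
    return clock_var, motor_var


def simulate_intervals(n: int, period: float, clock_sd: float,
                       motor_sd: float, seed: int = 0) -> List[float]:
    """Generate n intervals from the assumed model (hypothetical helper for checking)."""
    rng = random.Random(seed)
    clock = [rng.gauss(period, clock_sd) for _ in range(n)]
    motor = [rng.gauss(0.0, motor_sd) for _ in range(n + 1)]
    return [clock[k] + motor[k + 1] - motor[k] for k in range(n)]


if __name__ == "__main__":
    ivs = simulate_intervals(20000, period=500.0, clock_sd=20.0, motor_sd=10.0)
    clock_var, motor_var = wing_kristofferson(ivs)
    print(f"clock variance ~ {clock_var:.0f} ms^2, motor variance ~ {motor_var:.0f} ms^2")
```

      With simulated data (true clock variance 400 ms², true motor variance 100 ms²) the estimates land close to the generating values, whereas the raw interval variance (about 600 ms²) mixes both sources, which is the sense in which rhythm deviation is a less specific measure.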

      On p.37, we added the following paragraph:

      “Similarly, it is important to recognize that general sensorimotor performance is not independent of cerebellar processing. Many broad measures, such as movement accuracy and reaction time, likely reflect contributions from many different brain regions, including the cerebellum. As a result, age‑related differences in general sensorimotor performance may emerge from multiple interacting systems rather than from cerebellar function alone.”

      (5) Interpreting preserved or enhanced function. The finding of preserved - or even enhanced - performance in older adults is compelling. The authors interpret this as evidence for cerebellar reserve or compensation for cortical decline. An alternative explanation is that cerebellar structures simply decline more slowly than cortical ones, as their gray-matter data suggest; so rather than cerebellar activity revving up, it may remain the same: For example, following up on several of the authors' prior papers, Cisneros et al. (2024) reported enhanced implicit recalibration with age, potentially reflecting greater reliance on cerebellar forward models as sensory (especially proprioceptive) signals degrade. However, this may reflect reweighting rather than compensation - where cerebellar contributions are not enhanced, but rather preserved as other systems decline more rapidly. It would be valuable for the authors to clarify whether they view their findings as evidence of reweighting (slower decline) or compensation (increased contribution).

      We completely agree with this additional interpretation and have added a short section about it to the discussion. However, based on the structural cerebellar measures that we have, it is difficult to determine whether the reweighting or the compensation account is more plausible. Either way, both are in line with the cerebellar reserve theory.

      Added to the discussion (p. 35):

      Importantly, the relative preservation of cerebellar structure compared to other systems may itself contribute to the maintained cerebellar function observed in older age. Even if structural decline is present, the fact that it progresses more slowly than in many cortical and subcortical regions suggests that a form of structural reserve remains available in the cerebellum. This structural reserve could underlie the continued efficiency of cerebellar circuits and support their capacity to sustain motor functions across aging.

      (6) Mental rotation and the continuity hypothesis. The age-related decline in mental rotation performance, if cerebellar-dependent (see McDougle et al., 2022; note minor inconsistency in citation format throughout the paper), supports emerging theories that the cerebellum supports continuous mental simulations in both cognition and action, whether it's forward model simulation or interval-based timing in the motor control domain or mental rotation/intuitive physics in the cognitive domain (Tsay & Ivry, 2025). Given that mental rotation showed the strongest age effect, it would be fascinating to examine whether this correlates with structural loss in Crus I/II, regions most implicated in higher-order cognitive functions - related to Comment 1 above. Even on a crude level, without correlating with behaviour, do the authors have a map for which areas show greater degeneration than others?

      This is also something we did in the other paper mentioned before (Figure 5 of the new preprint). At first glance, the mental rotation outcomes show a strong positive correlation with Crus I and a negative correlation with Crus II; however, neither correlation was significant, and their opposite signs suggest that they might be random. In addition, in the preprint we compare age-related changes in gray matter volume across different anatomical and functional cerebellar regions (Figure 1).

      The inconsistencies in citation format have been fixed as well.

      (7) Continuous age analyses. An exploratory analysis correlating age (as a continuous variable) with each dependent measure might provide greater sensitivity than categorical group comparisons, revealing more graded relationships between age and performance.

      Our experiment was not designed for such an analysis. Testing for group differences provides more power than testing for correlations. For this reason, given that our clearly separated age groups did not show any behavioral differences, we do not expect such an analysis to provide substantial additional insight. As the paper is already very extensive, we have not performed this additional analysis.

      Congratulations on this comprehensive piece of work!

      Thank you for your kind words.

      Reviewer #2 (Recommendations for the authors):

      In the introduction, the authors note that the current literature on the cerebellum in aging has evidence from "studies that relied on single-task paradigms", including a citation to an eye-blink conditioning study. They then note "instead of capturing a broader range of specific cerebellar functions". What do they mean by this? Eye-blink conditioning, for example, when administered in a delay paradigm, is tied directly to the cerebellum and is arguably a cerebellar function or learning paradigm. Some clarity about this point is needed.

      The meaning of this is that most previous studies examining cerebellar function in older adults relied on a single task, or on tasks that were functionally very similar, such as balance and gait, to assess performance. In contrast, our study incorporated multiple tasks targeting different sensorimotor skills, allowing us to identify broader patterns in cerebellar sensorimotor performance in older adults.

      To make this clearer, we have rephrased the sentence (p.4):

      “However, much of the evidence supporting this theory comes from studies that narrowly focused on a single task (Boisgontier & Nougier, 2013; Miller et al., 2013; Woodruff-Pak et al., 2001) or on assessments within similar cerebellar domains such as balance and gait (Droby et al., 2021; Rosano et al., 2007), instead of capturing a broader range of specific cerebellar functions.”

      The authors note that many cerebellar tasks that are impaired in patients are preserved in older adults. The authors, however, seem to ignore delay eyeblink conditioning. Gerwig and colleagues (2010, Behav Brain Res) have shown that this is impacted in patients, and it is also robustly impacted in aging. Older adults still learn, but the age effects are highly replicable. A clear discussion of eye-blink conditioning and how it fits into this framework, and with your findings here, would be really helpful. It seems like a notable oversight not to have it discussed, given the age effects in this context, even if it was not included as a measure.

      Eye-blink conditioning is an interesting example that seems to contradict our theory: eye-blink conditioning is both affected by age and dependent on the cerebellum. However, while age-related changes in cerebellar structure evolve continuously with age, eye-blink conditioning performance remains unchanged between 40 and 80 years of age. Therefore, eye-blink conditioning suggests that age-related changes in cerebellar structure are not related to possible age-related changes in function. This discussion was already included in the manuscript on p. 36, which reads:

      “Similarly, no eye-blink conditioning task was included, as it is heavily influenced by cognitive factors such as awareness and arousal, and fear conditioning (LaBar et al., 2004). Previous work has shown that many variables, such as blink reaction time and motor components of the eyeblink reflex, introduce substantial variability in responses at older age (Woodruff-Pak & Jaeger, 1998). In contrast, this study found that only performance on the rhythmic finger-tapping task, similar to what we included in our battery, emerged as a significant predictor of age-related differences in eye-blink conditioning. Furthermore, age-related differences appeared to plateau after early adulthood, with no significant variation in the percentage of correct responses between ages 40 and 80 (Woodruff-Pak & Jaeger, 1998). Practically, the extended duration of the training protocol also makes this task unsuitable for inclusion in a test battery (Winton et al., 2025).”

      This approach also does not consider variability within older adults. That is, on average, they may do better than patients. But, there are also individual differences in cerebellar metrics (structure, for example) within an older adult sample that are a critical consideration here. When looking at the behavioral plots that include the individual data points (which is a great addition and very helpful), it is clear that variability is prevalent. As noted below, it may still be that cerebellar metrics are associated with behavior, given the high degree of variability within the groups across aging.

      We agree with the reviewer that variability is prevalent, as it is in any experiment. In our latest preprint entitled “Aging is associated with uniform structural decline across cerebellar regions while preserving topological organization and showing no relation with sensorimotor function” (https://doi.org/10.64898/2026.02.13.705695), we investigated whether variability in cerebellar structure could predict variability in cerebellar function. Across all our tasks, we did not find such an association, regardless of whether we defined cerebellar regions based on an anatomical atlas or a functional one.

      The use of 23 as the cut-off for MOCA scores is rather low. What was the justification for this within the literature? The authors note wanting to ensure task instructions and those with symptoms of potential MCI, but often 26 is used as a minimum score (with 25 and below being potential MCI).

      In the methods, we refer to the study of Carson et al. (2018) that recommends a cutoff score of 23/30 instead of 26/30 as it shows overall better diagnostic accuracy. We selected this cutoff to emphasize that our sample was not restricted to only the highest‑performing older adults. However, we agree that this is not sufficiently explained in the text, so we briefly clarified this (p.5):

      “We assessed cognitive functioning in both older and older‑old participants using the Montreal Cognitive Assessment (MoCA). A minimum score of 23 out of 30 was required for inclusion, following the recommendation by Carson et al. (2018), who demonstrated that this reduced cutoff yields fewer false positives and provides better overall diagnostic accuracy than the original 26/30 threshold. We adopted this criterion to ensure that our sample was not limited to only the highest‑performing older adults.”

      The authors note that the timing of the visits was adapted based on participant availability. It would be helpful to report the mean length of time between sessions, as well as the range.

      We added this to the method section (p.6):

      “There was no fixed interval between the two behavioral sessions. Ideally, both were scheduled within one week, but in practice, the timing was adapted to participants’ availability. Across all participants, this resulted in a mean inter-session interval of 7.40 days (± 9.03; range = 0-63 days). The average interval between the behavioral sessions and the MRI scanning was 6.86 days (± 8.90; range = 0-83 days).”

      The authors have anatomically defined cerebellar parcellations but have looked solely at total volume measures. What is the rationale for this? If there are differential impacts on cerebellar volume with age (Han et al., 2022; Bernard & Seidler, 2013), there may also be positive associations with behavior in regions whose volume is less negatively impacted by aging. This would be consistent with the idea of reserve. One interesting set of correlations that could be considered is with respect to anterior lobules (I-IV and V) relative to the secondary motor representation in VIIIa and VIIIb, such that the latter may show a more robust association with behavior in the positive direction if volume in these regions is less impacted by aging.

      As mentioned in response to one comment from the other reviewer, we investigated this question in our latest preprint (https://doi.org/10.64898/2026.02.13.705695). In this analysis, we did not find any relation between cerebellar outcomes and anatomical or functional cerebellar regions.

      We consider this to be beyond the scope of the present paper, which focuses on behavioral performance. The total cerebellar volume was added to show that our sample did indeed exhibit cerebellar atrophy, but the purpose of the paper was not to examine the link between structure and function.

      With respect to timing, I recognize that the clock variance is insignificant based on p=.06. However, this is a relatively "close" result. I am very much of the mindset that things are significant or not. Inclusion of Bayesian analyses helps this, but I don't find this particularly convincing. The larger sample of individuals over age 80 is certainly a strength, and I'm not especially concerned about power. But I do wonder about overinterpretation. I would also emphasize the large degree of variability here in the oldest sample. This raises questions about associations with cerebellar metrics. This argument for relative preservation/reserve may be strengthened by looking at individual differences in structure relative to behavior. That is, in areas of the cerebellum where structure is less impacted by aging (as this is not entirely uniform) does this volume predict better behavior in this sample?

      As noted earlier, the relationship between structure and function is examined in our other paper (https://doi.org/10.64898/2026.02.13.705695). Unfortunately, we were unable to include the 80+ group in that analysis because MRI data were available for only 20 older‑old participants, and correlation or regression analyses with 20 participants would be severely underpowered.

      We also want to point out that the near-significant difference between age groups highlighted by the reviewer actually goes in the direction of older participants performing better than young participants.

      The note about the amount of variance in the older-old participants is fair, though.

      The comparison with the Cam-CAN data set seems to be largely qualitative. Why did the authors not make a direct comparison to determine relative similarity in their sample compared to Cam-CAN? This would be a bit more compelling, though I suspect the differences are not statistically reliable (they note the oldest-old in the Leuven sample have a slightly larger volume). I do realize there are sample size differences, but a matched random sub-sample could also be created out of Cam-CAN. Why did they not compute the quadratic model in the Leuven sample as well?

      A quadratic model was not considered very meaningful in the Leuven sample because age was not measured as a continuous variable but categorized into three discrete age groups (which provides more power to look at age-related differences). Our goal was not to determine whether absolute cerebellar volumes matched across datasets, for example, by creating comparable age groups in the Cam‑CAN dataset, but rather to assess whether the pattern of age‑related effects in our sample aligned with those seen in a larger dataset. In our opinion, the current approach sufficiently demonstrates that the age‑related trends we observe are consistent with those reported in Cam‑CAN.

      The analysis of relative cerebellar gray and white matter is quite interesting. However, what about regional patterns to this? It would be particularly interesting to know if some regions are more or less impacted or preserved relative to the cortex. The data are seemingly available based on the processing approach (at least for gray matter). Was a similar analysis also computed in Cam-CAN? Replicating this in an independent sample would also be of interest.

      We agree with the reviewer that this is indeed interesting for further analyses on this dataset. However, it falls beyond the scope of the present paper. Our preprint (https://doi.org/10.64898/2026.02.13.705695) looks at regional patterns for the cerebellum. Other papers have compared age-related decline in different cortical and subcortical regions as discussed on p.35 of our discussion:

      “Given that the cerebellum exhibited a relatively less pronounced structural decline compared to other brain regions as shown here and in another previous study (Taki et al., 2011), it seems more plausible that the cerebellum might compensate for deficits caused by structural changes in other areas rather than vice-versa. Age-related gray and white matter degeneration is usually faster in frontotemporal regions and subcortical regions, including the hippocampus, amygdala and thalamus than in the cerebellum (Fjell et al., 2013; Giorgio et al., 2010; Neufeld et al., 2022). Although this does not directly indicate functional implications, it suggests that cortical regions are less likely to compensate for cerebellar loss when they exhibit more severe degeneration.”

      The authors argue for cerebellar reserve and present compelling behavioral data in support of this with their many tasks. In instances where they look at largely cerebellar-mediated measures, they demonstrate that older adults and the >80 year old group show relatively intact behavior, even though total cerebellar gray matter volume (and white matter volume) in these groups is significantly smaller than in young adults. As noted, the behavioral data are very compelling, and as an individual who looks at aging populations in their research, seeing areas and domains of preservation is always interesting and useful. This pattern certainly may be consistent with cerebellar reserve. However, it would be more compelling if the authors also looked at these behaviors with respect to cerebellar volume. That is, there is still a great deal of variability in behavior in the older and >80 samples (though also in the young adults) that may still be associated with cerebellar volume. Poorer performance may be present in those with smaller volumes. This would also be somewhat consistent with the notion that these tasks are derived from work in cerebellar degeneration samples. Associations between behavior and cerebellar measures would speak to this. If there are no associations with volume, this would be particularly interesting and compelling in the context of reserve. Alternatively, if there are differential impacts on cerebellar volume with age (Han et al., 2022; Bernard & Seidler, 2013), there may also be positive associations with behavior in regions whose volume is less negatively impacted by aging. This would be consistent with the idea of reserve. One interesting set of correlations that could be considered is with respect to anterior lobules (I-IV and V) relative to the secondary motor representation in VIIIa and VIIIb, such that the latter may show a more robust association with behavior in the positive direction if volume in these regions is less impacted by aging. Not all individuals completed the scan (due to safety and comfort considerations), which could limit statistical power, but this could be conducted in the subset of individuals that have both sets of data.

      This point overlaps with the issues raised by the other reviewer in comments 1 and 2, which underscores its importance. Nevertheless, we decided to address this analysis in a separate paper. In the current manuscript, our primary focus was on the behavioral aspects, as these are already quite extensive on their own. In our subsequent work (https://doi.org/10.64898/2026.02.13.705695), we conducted an in-depth investigation into the relationship between cerebellar-specific measures and cerebellar structure across distinct cerebellar regions (including anatomical regions and functionally defined regions according to the atlas of Nettekoven et al., 2024). We found that aging does not affect the cerebellum uniformly: some anatomical regions exhibit stronger age effects than others. For the functionally defined regions, however, the age effects were uniform. There was no relation between behavioral cerebellar-specific measures and anatomical or functional cerebellar regions.

      Some of the assertions the authors make in the discussion about the cerebellum having less pronounced structural decline relative to other brain regions would benefit from being tempered. They used relative measures here, and this is certainly interesting. But how do other regions stack up? What would the hippocampus look like if such a measure were used? And as noted, does this pattern replicate in the Cam-CAN sample? Further, the authors cite Jernigan et al. (2001) in arguing that cerebellar changes are smaller than those in other brain regions, when, in looking at their tables, the gray matter reductions of the cerebellum are in fact comparable to those of the prefrontal cortex and second only to those of the hippocampus.

      We agree with the reviewer that this is an interesting question, but it needs to be addressed in a separate paper. We have also removed the citation to Jernigan et al. (2001).

    1. eLife Assessment

      This valuable article provides a convincing and very detailed model of the process regulating the assembly of the spore coat in the model spore-forming bacterium Bacillus subtilis. It focuses on SafA, a morphogenetic coat protein involved in the assembly of the spore coat inner layer, deciphering the contributions of disulfide bond formation and crosslinking reactions catalyzed by a transglutaminase. The process had been studied with a combination of genetics and microscopy, but this is the first complete assessment incorporating detailed biochemical approaches.

    2. Reviewer #1 (Public review):

      This is an important article, which represents the culmination of 25 years of research on the spore coat protein, SafA. Reading this paper is not necessarily easy because it requires time, patience, and attention to detail, but it is truly rewarding. The attentive reader will certainly appreciate the description of a biochemical tour de force, providing convincing experimental evidence for every aspect of a step-by-step inner coat assembly model. It was previously known that SafA was a coat morphogenetic protein responsible for the assembly of the inner layer of the spore coat in Bacillus subtilis, and SafA was already viewed as a hub that directly or indirectly recruited several dozen coat proteins to the spore envelope. It was also known that there were isoforms of SafA (the most important being the C30 form), and that SafA was a substrate of Tgl, a transglutaminase involved in crosslinking some of the coat proteins, especially those found in the inner coat. Several studies have combined genetics and various types of microscopy, including fluorescence microscopy, to decipher the mechanism of coat assembly, but the current study brings top-notch biochemistry into the picture and is therefore able to go much further in the molecular characterization of this important mechanism. It should be noted that spore coat assembly is a notoriously difficult process to study biochemically. It was also suspected to be a complex mechanism, because coat assembly is a protracted process involving at least 80 different proteins, whose production is controlled both temporally and spatially, but the current paper manages to connect specific chemical reactions to well-known stages of spore formation. The authors did so by generating several constructs with specific substitutions of Cys and Lys residues, interfering with the completion of disulfide bond formation and crosslinking events, thus determining the order of events and the structural consequences when one of these steps is impaired. Importantly, their conclusions are consistent with previous work. In the updated model, self-assembly of SafA is the first step, promoted by disulfide bond formation between C30 complexes. This is followed by recruitment of inner coat proteins and, finally, transglutamination to stabilize the scaffold structure (referred to as a "spotwelding activity").

      The work is extremely thorough. I did not identify any weaknesses and could not think of any experiment that would have been omitted.

    3. Reviewer #2 (Public review):

      Summary:

      The authors assemble a variety of information from biochemical experiments on oligomeric and higher-order assembly of the spore coat protein SafA, which functions as a hub in spore coat development. Together, the data indicate a robust process of assembly, guided initially by an organized process of disulfide bond formation and ultimately leading to cross-linking by the enzyme Tgl. Interestingly, neither process is strictly necessary for the formation of highly assembled oligomeric forms of SafA, but instead, these processes are mutually supportive in creating a strong, intercrosslinked assembly. Given this lead-up, it is somewhat disappointing to find that the cross-linking defective SafA mutants do not exhibit any obvious defects in sporulation in vivo, and one is left with the conclusion that this stage of spore coat assembly is accomplished by multiple independent co-occurring activities. The information is sufficient to support a detailed model for SafA assembly, which is significant in that it helps to explain the process of building a critically important hub-scaffold for spore coat development.

      Strengths:

      The main body of experiments supports a detailed model for the assembly of SafA monomers into spore coat superstructures. This is interesting because it shows how a protein can be used as both a scaffold and a hub in contributing to the assembly of a super-resilient biological material.

      Weaknesses:

      (1) The weak sporulation phenotype of the crosslinking mutants diminishes the significance of the mechanism that is described.

      (2) The narrative flow of the originally submitted manuscript could be improved by removing some unnecessary and confusing figures on peripheral subjects and rearranging some of the latter figures to arrive at a conclusion that focuses more on SafA assembly.

      (3) The original manuscript appears to have a labeling error in the supplementary figures, but a correctly labeled version of the figures would not support one of the manuscript's claims.

    4. Reviewer #3 (Public review):

      The manuscript by Amara et al. provides novel mechanistic insight into how SafA, a spore coat morphogenetic protein, self-assembles and is later crosslinked by the Tgl transglutaminase during spore coat assembly. Through rigorous, carefully executed biochemical analyses of SafA's oligomerization and crosslinking states, the authors demonstrate that SafA forms dimers that promote disulfide bond formation between two cysteine pairs found in its C30 region; this disulfide bond-mediated crosslinking promotes, but is not essential for, Tgl-mediated crosslinking of lysine residues within SafA. Specifically, one pair in the N-terminal part of its C30 region promotes the formation of higher-order oligomers, while the second pair in the C-terminal part of its C30 region promotes its ability to form a tetramer. Mutation of both cysteine pairs prevents higher-order SafA structures and reduces the efficiency of Tgl-mediated crosslinking via lysines in close proximity to the cysteines. Using SAXS and kinetic analyses of Tgl-mediated crosslinking of purified SafA in vitro, they further show that disulfide bond formation promotes, but is not essential for, the self-assembly of SafA into structures of ~1200 kDa.

      Major Comments:

      (1) While the authors' detailed and thorough biochemical analyses advance our understanding of how SafA forms higher-order structures in the presence and absence of Tgl, they could broaden the significance of their findings with additional functional analyses of their mutants in B. subtilis. Figure 8 shows that loss of Tgl and SafA disulfide bond formation renders SafA more extractable (presumably leading to a less resilient spore coat), and FRAP analyses indicate that SafA in ∆tgl sporulating cells is more mobile than in its lysine-crosslinked form. Some ideas that the authors could test to try to identify additional functions for the Cys and Lys residues in SafA:
      - Analyze the Cys mutants in the FRAP assay?
      - Does loss of SafA-mediated crosslinking via the Cys and/or Lys mutations affect its localization to the forespore or the recruitment of its client proteins like GerQ?
      - Have the authors tested higher concentrations of lysozyme? Or chloroform?

      (2) While the authors show in supplementary data that the safA point mutants they generated do not affect spore germination in the single condition tested, the Rudner group previously showed that SafA plays a role in spore germination by affecting CwlJ localization to the forespore. Perhaps the authors might see a more significant phenotype on spore germination with their Cys and Lys mutants if they tried to complement a ∆safA∆sleB double mutant with mutant safA constructs? For the germination assays, it was unclear to me whether the authors used heat activation prior to inducing spore germination.

      (3) Have the authors looked at whether the Cys or Lys mutations affect the sensitivity of spores to oxidative insults, especially since the Cys residues might temper the effects of oxidizing agents?

      (4) Did the authors test the effect of single Cys mutations on disulfide bond formation, since intermolecular disulfide bond formation might still be possible even if one of the Cys residues has been changed?

      (5) Finally, I was unsure how many times each experiment was replicated and how many experiments had been conducted in total.

    1. eLife Assessment

      This valuable study aims to determine mechanisms underlying breast cancer initiation and tumour progression. The manuscript includes a solid set of transcriptomic and proteomic datasets from tumour samples and examines mitochondrial function within the tumours. While the underlying mechanisms linking expression changes to functional effects remain speculative, this paper provides a resource for researchers working on breast cancer and/or HER2-driven bioenergetic changes.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Frangos et al. used a transcriptomic and proteomic approach to characterise changes in HER2-driven mammary tumours compared to healthy mammary tissue in mice. They observed that mitochondrial genes, including OXPHOS regulators, were among the most down-regulated genes and proteins in their datasets. Surprisingly, these were associated with higher mitochondrial respiration, in response to a variety of carbon sources. In addition, there seems to be a reduction in mitochondrial fusion and an increase in fission in tumour tissues compared to healthy tissues.

      Strengths:

      The data are clearly presented and described.

      The authors reported very similar trends in the proteomic and transcriptomic data. Such approaches are essential for a better understanding of the changes in cancer cell metabolism associated with tumorigenesis.

      The authors provided a direct link between HER2 inhibition and OXPHOS, strengthening the mechanistic aspect of the work.

      Weaknesses:

      The manuscript would have benefited from more ex-vivo approaches to further dissect mechanistic links and resolve the contradiction of elevated respiration with reduced expression of most associated proteins (but these points are clearly articulated in the discussion).

      The results presented support the authors' conclusions, and limitations are addressed in the discussion. This work will likely impact the progression of the field, and the provided data will benefit the scientific community.

      Comments on revisions:

      The authors addressed all my concerns.

    3. Reviewer #2 (Public review):

      Frangos et al present a set of studies aiming to determine mechanisms underlying tumour initiation and progression. Overall, this work provides some useful datasets, further establishing mitochondrial dysfunction during the cellular transformation process.

      A key strength is the coordinated analysis of transcriptomics and proteomics from tumour samples derived from a Neu-dependent mouse model of breast cancer. This analysis provides rigorous datasets that show robust patterns, including down-regulation across many components of mitochondrial OXPHOS that is generally consistent at both the mRNA and protein level. Parallel functional analysis of corresponding tumour samples, by contrast, clearly shows the opposite trend of increased mitochondrial function, which is unexpected. As such, this work further establishes altered mitochondrial phenotypes in tumour contexts and illustrates that mitochondrial function is not necessarily tightly correlated with mitochondrial gene expression patterns.

      Several key weaknesses remain. It remains unclear how increased mitochondrial function is sustained despite widespread decreases in mRNA and protein levels of OXPHOS components. In terms of mechanism, the study confirmed that pharmacologic EGFR inhibition decreases OXPHOS in an EGFR-dependent breast cancer line. However, it remains unclear whether the cell culture system recapitulates other key observations of the tumour model (namely, decreased expression with increased function).

      Therefore, the mechanistic basis of increased mitochondrial function in light of decreased mitochondrial content remains speculative, as does the role of these changes for tumour initiation or progression.

      Comments on revisions:

      We agree with the overall findings of the study and appreciate that the claims in text and title have been appropriately toned down.

      As additional suggestions (e.g., for presentation), many of the graphics/labels are still too small to be useful. It would be interesting to see whether this cell line resembles the tumours in terms of all the phenotypes. The lapatinib experiment was good. I wonder how quickly this drug affects the mitochondria. It would also be interesting to see whether these cells have higher OXPHOS than other, non-transformed breast epithelial cells.

      The western blot of OXPHOS components using ab110413 is good, but it appears that many subunits are detected, so this should be made clear.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Comments on revisions: The authors addressed all my concerns.

      We thank you for the positive review and feedback throughout the review process.

      Reviewer #2 (Public review):

      Comments on revisions: We agree with the overall findings of the study and appreciate that the claims in text and title have been appropriately toned down. As additional suggestions (e.g., for presentation), many of the graphics/labels are still too small to be useful. It would be interesting to see whether this cell line resembles the tumours in terms of all the phenotypes. The lapatinib experiment was good. I wonder how quickly this drug affects the mitochondria. It would also be interesting to see whether these cells have higher OXPHOS than other, non-transformed breast epithelial cells. The western blot of OXPHOS components using ab110413 is good, but it appears that many subunits are detected, so this should be made clear.

      Thank you for these suggestions.

      We have clarified in the Methods section (lines 475–476) the specific OXPHOS subunits detected using the Ab110413 antibody cocktail.

      With respect to lapatinib, prior work has shown that lapatinib can alter the phosphoproteome within minutes to hours (PMID:22964224). In our experiments, however, NF639 cells were exposed to lapatinib for 24 hours, a timeframe in which transcriptional and translational remodeling are also expected to occur. Therefore, we cannot distinguish whether the observed suppression of OXPHOS reflects acute signaling effects or downstream changes in gene and protein abundance. Importantly, the purpose of this experiment was proof-of-principle: to determine whether HER2 signaling contributes to respiratory competency in a cell line derived from the same transgenic model as the intact tumor slices used in this study. Thus, while defining the precise kinetics of inhibition or comparing to benign/non-transformed cells would be interesting, these were not the primary objectives of the added experiments.

      We have increased figure label sizes across all main figures.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Frangos et al. used a transcriptomic and proteomic approach to characterise changes in HER2-driven mammary tumours compared to healthy mammary tissue in mice. They observed that mitochondrial genes, including OXPHOS regulators, were among the most down-regulated genes and proteins in their datasets. Surprisingly, these were associated with higher mitochondrial respiration, in response to a variety of carbon sources. In addition, there seems to be a reduction in mitochondrial fusion and an increase in fission in tumours compared to healthy tissues.

      Strengths:

      The data are clearly presented and described.

      The authors reported very similar trends in the proteomic and transcriptomic data. Such approaches are essential for a better understanding of the changes in cancer cell metabolism associated with tumourigenesis.

      Weaknesses:

      (1) This study, despite being a useful resource (assuming all the data will be publicly available and not only upon request) is mainly descriptive and correlative and lacks mechanistic links.

      We appreciate this point. While the primary goal of our study was to assess mitochondrial adaptations with HER2-driven tumorigenesis, we agree that strengthening the mechanistic interpretation would improve the impact of the data. To address this, we have provided experiments demonstrating that HER2 inhibition with lapatinib in NF639 cells suppresses respiratory capacity, directly supporting the interpretation that HER2 activity regulates respiratory function (Figure 10). We have expanded the discussion accordingly (lines 378-394). Both raw RNA-seq and proteomic data were deposited in the GEO and PRIDE repositories (accession numbers included in the Data Availability Statement).

      (2) It would be important to determine the cellular composition of the tumour and healthy tissue used. Do the changes described here apply to cancer cells only or do other cell types contribute to this?

      We thank the reviewer for this suggestion; we have added experiments that have directly addressed this concern.

      Cell type composition analysis by immunofluorescence was added (Figure 6) where we quantified epithelial, mesenchymal, endothelial, immune and stromal populations in our benign mammary tissue and tumor samples. We found no major shift in the dominant cell types that would confound transcriptomic data in whole tissues.

      We integrated the immunofluorescence data with a publicly available scRNA-seq dataset from human breast tumors, which allowed us to estimate cell-type-specific expression of OXPHOS genes in our own samples. Despite possible species differences, this is the only dataset of its kind, and we used it to generate an estimate of cell-type-weighted OXPHOS mRNA expression (Figure 6). This revealed that epithelial cells are likely the dominant contributors to OXPHOS gene expression for CI–IV. All calculations are delineated in the Methods section.

      (3) Are the changes in metabolic gene expression a consequence of HER2 signalling activation? Ex-vivo experiments could be performed to perturb this pathway and determine cause-effects.

      Thank you for this suggestion – we have included an experiment directly testing this concept. We assessed mitochondrial respiration in NF639 HER2-driven mammary tumor epithelial cells in the presence or absence of the well-described dual tyrosine kinase inhibitor lapatinib. Lapatinib reduced basal, CI-linked and CI+II-linked respiration without compromising mitochondrial integrity or coupling, demonstrating that HER2 activation regulates respiration in our model. These data are presented in Figure 10, and a new section has been added to the discussion describing the implications of this finding in the context of the current literature (lines 378-394).

      (4) The fission/fusion data seem quite preliminary, and the gene/protein expression changes are not clear-cut enough to convincingly explain the increased mitochondrial respiration in tumours as mainly due to this mechanism.

      We agree mitochondrial morphology and dynamics alone cannot fully account for the observed respiratory phenotype – this was emphasized in the discussion but has since been further clarified (lines 365-377). We retained the TEM and dynamics gene/protein data because they do support morphological differences consistent with enhanced fission. However, we have revised the tone of our interpretation to more explicitly acknowledge that these findings are correlative, and the updated discussion now emphasizes that the increased respiratory capacity in tumors is likely driven by multiple converging mechanisms.

      Reviewer #2 (Public review):

      Frangos et al present a set of studies aiming to determine mechanisms underlying initiation and tumour progression. Overall, this work provides some useful insights into the involvement of mitochondrial dysfunction during the cellular transformation process. This body of work could be improved in several possible directions to establish more mechanistic connections.

      (5) The interesting point of the paper: the contrast between suppressed ETC components and activated OXPHOS function is perplexing and should be resolved. It is still unclear if activated mitochondrial function triggers gene down-regulation vs compensatory functional changes (as the title suggests). Have the authors considered reversing the HER2-derived signals e.g. with PI3K-AKT-MTOR or ERK inhibitors to potentially separate the expression vs. functional phenotypes? The root of the OXPHOS component down-regulation should also be traced further, e.g. by probing into levels of core mitochondrial biogenesis factors. Are transcript levels of factors encoded by mtDNA also decreased?

      We appreciate this insight and agree that the discordance between mitochondrial content and function is fascinating and have addressed the concerns above in the following manner:

      - We have altered the title – we agree we cannot definitively say that the enhanced respiratory capacity observed is compensatory.

      - We have added experiments in NF639 cells in the presence of lapatinib, a tyrosine kinase inhibitor to interrogate whether HER2 is necessary for our functional outcome of interest – the enhanced respiratory capacity in the tumors. Lapatinib significantly suppressed respiration (Figure 10) demonstrating HER2 signaling directly regulates mitochondrial respiration.

      - We have expanded the discussion to provide further comment on potential explanations for increased respiratory function and low mitochondrial content.

      (6) The second interesting aspect of this study is the implication of mitochondrial activation in tumours, despite the downregulation of expression signatures, suggestive of a positive role for mitochondria in this tumour model. To address if this is correlative or causal, have the authors considered testing an OXPHOS inhibitor for suppression of tumorigenesis?

      Previous studies have eloquently highlighted that directly or indirectly inhibiting mitochondria can suppress growth in HER2-driven breast cancer (PMID:31690671) or, alternatively, that amplification of mt-HER2 enhances tumorigenesis (PMID: 38291340). In many solid tumors, this concept underlies preclinical and clinical studies using IACS-010759 or similar OXPHOS inhibitors, which do suppress growth but have significant off-target effects in healthy tissues (PMID: 36658425, 3580228). We have expanded the discussion to ensure the reader is aware of these previous contributions and highlighted the importance of future work delineating the role of enhanced respiratory function in HER2-driven mammary cancer (lines 378-394).

      (7) A number of issues concerning animal/tumour variability and further pathway dissection could be explored with in vitro approaches. Have the authors considered establishing tumour-derived cell cultures, which could enable further confirmation, mechanistic drug studies and additional imaging approaches? Culture systems would allow alternative assessment of mitochondrial function, such as Seahorse or flow cytometry (mitochondrial potential and ROS levels).

      We thank the reviewer for this suggestion – we have addressed this in part by using the NF639 HER2-driven tumor epithelial line, which demonstrated that HER2 regulates our observed respiratory response. Unfortunately, establishing tumor-derived cell cultures was not feasible or within the scope of our study. Animal and tumor variability has been clarified in the Methods section (lines 424-429). Mitochondrial respiration experiments were performed in paired tissue (benign and tumor from same mouse). Transcriptomic, proteomic and histological analyses were performed on tumors and benign samples from different mice due to tissue limitations.

      (8) The study could be greatly improved with further confirmatory studies, e.g., immunoblotting for mitochondrial components with parallel blots for phospho-signalling in the same samples. It would be interesting to see whether these trends are maintained in tumour-derived cell cultures. It is notable that OXPHOS protein/transcript changes are more consistent (Figure 5, Supplementary Figure 4) than those of mitochondrial dynamics/mitophagy factors (Figure 8). Core regulatory factors in these pathways should be confirmed by conventional immunoblotting.

      We thank the reviewer for this thoughtful comment. While we agree that additional confirmatory studies can be valuable, due to tissue quantity constraints and the number of assays required for our multi-omics analysis, extensive additional blots were not feasible. However, we had sufficient protein to immunoblot selected OXPHOS proteins to verify the proteomic data (now provided in S-Fig.4H). Furthermore, we have plotted the fold change of genes and proteins detected in both datasets and added this to Figure 4 (4A, B), further highlighting the consistency between our transcriptomic and proteomic findings. We believe that the highly consistent and concordant nature of our datasets collectively provides strong support for our central objective - determining whether mitochondrial content and respiratory function correlate in HER2-driven mammary tumors. The reproducibility of OXPHOS-related changes reinforces the robustness of our observations. We also appreciate the reviewer’s insight that OXPHOS alterations appear particularly consistent. In response, we have edited the discussion to further emphasize this point, especially in relation to the distinctive pattern observed for Complex V, which showed greater preservation relative to Complexes I–IV across several methods (lines 348-364). We comment on how this stoichiometric shift may contribute to intrinsic respiratory activation despite reduced mitochondrial content.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Further Minor points.

      (9) It would be helpful to know further details regarding the source of the tumour samples, particularly for the proteomics (N=5) and transcriptomics (N=6) datasets, since the exact timepoint of tissue harvest and number of tumours/mouse varied, according to the methods section. Were all samples from the omics studies from different mice (i.e., 11 mice)? B4 and B6 seem like outliers in mitochondrial transcriptomes. Are these directly paired, e.g., with T4 and T6? Are the side-by-side pairs of Ben and Tum samples for blots in Figure 1 and Supplementary Figure 1 from the same mouse?

      This has been clarified in the Methods section (lines 424-429). Mitochondrial respiration experiments were performed in paired tissue (benign and tumor from same mouse). Transcriptomic, proteomic and histological analyses were performed on tumors and benign samples from different mice due to tissue limitations.

      (10) Further references and details are needed to support the methodology of the mitochondrial function tests (e.g., the pairing of substrates with complexes). What was the time point of nutrient supplementation? It would seem that the lipid substrates should take longer to activate OXPHOS than pyruvate/malate or succinate. Is this the case? Is there speculation as to why succinate supplementation is much more active than pyruvate+malate? What is +MD in Figure 6? The rationale for pooling data for Figure 7A is unclear, since the categories appear to overlap: (pyruvate, malate, ADP) vs. (palmitoyl-carnitine, malate, ADP).

      Thank you for this comment. We have expanded the methods (lines 515-531) to provide additional detail on the mitochondrial respiration protocol. Briefly, permeabilized tissues were exposed to substrates delivered at supraphysiological concentrations in a sequential protocol lasting ~30–60 minutes. Under these conditions, mitochondrial respiration reflects the maximal capacity to utilize each substrate rather than the physiological time course of substrate mobilization or uptake that would occur in vivo with the influence of blood flow and transport/substrate availability limitations.

      (11) Many of the figures were blurry (Figure 1F, 2B) or had labels that were too small to be effective (Figures 1G, H, 2D-G, 3E-G, 5E-I, 7C, 8B).

      The font size of figure labels has been increased where possible and all figures have been exported to maximize resolution.

    1. eLife Assessment

      This important study addresses the contribution of pericytes to the organization and permeability control of the zebrafish blood-brain barrier (BBB). By analyzing pdgfrb mutant zebrafish that lack brain pericytes, the authors reveal that the resulting cerebrovascular network is abnormally patterned. Remarkably, however, the barrier retains its restrictive permeability during larval and juvenile stages. More pronounced vascular defects become evident in adults, where localized BBB leakage coincides with hemorrhages and aneurysm formation. Based on convincing and beautifully documented imaging data, the authors argue that, unlike what has been reported in rodent systems, pdgfrb-dependent pericytes are not essential for maintaining BBB integrity in the zebrafish brain.

    2. Reviewer #1 (Public review):

      Summary:

      The study investigates the role of vascular mural cells, specifically pericytes and vascular smooth muscle cells (vSMCs), in maintaining blood-brain barrier (BBB) integrity and regulating vascular patterning. Analyzing zebrafish pdgfrb mutants that lack brain pericytes and vSMCs, the authors show that mural cell deficiency does not impair BBB establishment or maintenance during larval and early juvenile stages. However, mural cells seem to be crucial for preventing vascular aneurysms and hemorrhage in adulthood, as focal leakage, basement membrane disruption and increased caveolae formation are observed in adult zebrafish at aneurysm hotspots. The authors challenge the paradigm that mural cells are essential for BBB regulation in early development while highlighting their importance for long-term vascular stability.

      Strengths:

      Previous studies have established that the zebrafish BBB shares molecular and morphological homology with, e.g., the mammalian BBB and therefore represents a suitable model. By examining mural cell roles across different life stages, from larval to adult zebrafish, the study provides an unprecedented comprehensive developmental analysis of brain vascular development and of how mural cells influence BBB integrity and vascular stability over time. The use of live imaging, whole-brain clearing, and electron microscopy offers high-resolution insights into cerebrovascular patterning, aneurysm development, and structural changes in endothelial cells and basement membranes. By analyzing "leakage hotspots" and their association with structural endothelial defects in adults, the presented findings add novel insights into how mural cell loss may lead to vascular instability.

    3. Reviewer #2 (Public review):

      Summary:

      The authors generated a zebrafish mutant of the pdgfrb gene. The presented analyses and data confirm previous studies demonstrating that Pdgfrb signaling is necessary for mural cell development in zebrafish. In addition, the data support previously published studies in zebrafish showing that mural cell deficiency leads to hemorrhages later in life. The authors presented quantified data on vessel density and branching, assessed tracer extravasation, and investigated the vasculature of adult zebrafish using electron microscopy.

      Strengths:

      The strength of this article is that it provides independent confirmation of the important role of Pdgfrb signaling for the development of mural cells in the zebrafish brain. In addition, it confirms previous literature on zebrafish that provides evidence that, in the absence of pericytes/VSMC, hemorrhages appear (Wang et al, 2014, PMID: 24306108 and Ando et al 2021, PMID: 3431092).

      The Reviewing Editor has carefully reviewed the revised manuscript and is fully satisfied with the authors' revisions.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates the role of vascular mural cells, specifically pericytes and vascular smooth muscle cells (vSMCs), in maintaining blood-brain barrier (BBB) integrity and regulating vascular patterning. Analyzing zebrafish pdgfrb mutants that lack brain pericytes and vSMCs, they show that mural cell deficiency does not impair BBB establishment or maintenance during larval and early juvenile stages. However, mural cells seem to be crucial for preventing vascular aneurysms and hemorrhage in adulthood as focal leakage, basement membrane disruption, and increased caveolae formation are observed in adult zebrafish at aneurysm hotspots. The authors challenge the paradigm that mural cells are essential for BBB regulation in early development while highlighting their importance for long-term vascular stability.

      Strengths:

      Previous studies have established that the zebrafish BBB shares molecular and morphological homology with, e.g., the mammalian BBB and therefore represents a suitable model. By examining mural cell roles across different life stages, from larval to adult zebrafish, the study provides an unprecedented comprehensive developmental analysis of brain vascular development and of how mural cells influence BBB integrity and vascular stability over time. The use of live imaging, whole-brain clearing, and electron microscopy offers high-resolution insights into cerebrovascular patterning, aneurysm development, and structural changes in endothelial cells and basement membranes. By analyzing "leakage hotspots" and their association with structural endothelial defects in adults, the presented findings add novel insights into how mural cell loss may lead to vascular instability.

      Weaknesses:

      The study uses quantitative tracer assays with multiple molecular weight dyes to evaluate blood-brain barrier (BBB) permeability. The study normalizes the intensity of tracer signals (e.g., 10 kDa, 70 kDa dextrans) in the brain parenchyma to the vascular signal of a 2000 kDa dextran tracer (assumed to remain within vessels). Intensity normalization is used to control for variations in tracer injection efficiency or vascular density. This method doesn't directly assess the absolute amount of tracer present in the parenchyma, potentially underestimating leakage severity. As the lack of BBB impairment is a "negative" finding, more rigorous controls or other methods might be needed to corroborate it.

      In response to these and comments from other reviewers, we have now performed further carefully controlled analysis to test leakage of tracers using molecular weights ranging from 1 to 2000 kDa. We have performed additional normalisation approaches (new data in Fig. 2a–d), imaging tracer extravasation together with vascular reporters (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>) and using this transgenic reporter for normalisation (as suggested by Reviewer #2). The results of these experiments all supported our initial conclusions (revised Extended Data Fig. 3a–d), further validating the reliability of our method. Furthermore, as suggested by the reviewer, raw tracer intensities in the parenchyma were also analysed with no normalization at all (see Author response image 1). This also supports our conclusion that the BBB is intact in young animals. Finally, we now use our methods to demonstrate that we can detect an immature leaky BBB at 3 dpf and a mature functional BBB at 7 dpf (Fig. 2e-f), a suitable positive control to show that our methods and analyses are reliable.

      Author response image 1.

      Raw intensity values from the parenchyma confirm findings in Figure 2 and Extended Data Figure 3. a–d, Raw mean fluorescence intensity values of extravasated tracers in the midbrain: (a–b) show unnormalized values corresponding to Extended Data Fig. 3a–d, and (c–d) show unnormalized values corresponding to Fig. 1a–d. Unpaired t-tests were used for 70 and 10 kDa at 14 dpf in (a–b), and for 10 kDa at 7 dpf and 70 kDa at 14 dpf in (c–d). Mann-Whitney tests were used for 70 and 10 kDa at 7 dpf in (a–b), and for 70 kDa at 7 dpf and 10 kDa at 14 dpf in (c–d), due to non-normal distribution. These data were all generated in genotype-blind assays; they display between-embryo variance in signal arising from injection differences and show no difference in BBB integrity between the genotypes analysed. Comparison with data normalised to the 2000 kDa tracer or to kdrl expression in endothelial cells (Fig. 2 and Extended Data Fig. 3) confirms that normalisation improves the analysis, effectively controlling for embryo-to-embryo differences in tracer delivery and imaging.

      Reviewer #2 (Public review):

      Summary:

      The authors generated a zebrafish mutant of the pdgfrb gene. The presented analyses and data confirm previous studies demonstrating that Pdgfrb signaling is necessary for mural cell development in zebrafish. In addition, the data support previously published studies in zebrafish showing that mural cell deficiency leads to hemorrhages later in life. The authors presented quantified data on vessel density and branching, assessed tracer extravasation, and investigated the vasculature of adult zebrafish using electron microscopy.

      Strengths:

      The strength of this article is that it provides independent confirmation of the important role of Pdgfrb signaling for the development of mural cells in the zebrafish brain. In addition, it confirms previous literature on zebrafish that provides evidence that, in the absence of pericytes/VSMC, hemorrhages appear (Wang et al, 2014, PMID: 24306108 and Ando et al 2021, PMID: 3431092). The study by Ando et al, 2021 did not report experiments assessing BBB leakage in pdgfrb mutants but in the review article by Ando et al (PMID: 34685412) it is stated that "indicating that endothelial cells can produce basic barrier integrity without pericytes in zebrafish."

      We thank the reviewer for their comments and pointing out literature that we had not cited (this has been corrected in our revised manuscript).

      As noted by other reviewers, our study goes beyond simply confirming previous literature. The quoted section by the reviewer from Ando et al 2021 regarding intact barrier integrity in pdgfrb mutants is a conclusion based on the apparent lack of haemorrhages in pdgfrb mutants[1]. Our work shows haemorrhages in older animals and as such is in line with these previously published results, but it also extends previous work, for the first time reporting detailed functional analysis to assess BBB integrity. Our study uses definitive tracer assays (now including extensive revisions) to identify an intact BBB in pdgfrb mutants in live animals. This has not been previously described and is important because it offers a new perspective on the evolutionary conservation (or otherwise) of pericyte control of BBB function. Furthermore, our study investigates the nature of hotspot leakage and haemorrhages in more detail than in previous work.

      Weaknesses:

      (1) The authors should avoid using violin plots, which show distribution. Instead, they should replace all violin plots in the figures with graphs showing individual data points and standard deviation. For Figure 2f specifically, the standard deviation in the analyzed cohort should be shown.

      This is a good point and we have replaced the violin plots with individual data points and shown all data as mean±SEM.

      (2) The authors have not shown the reduced PDGFRB protein or the effect of mutation on mRNA level in their zebrafish mutant.

      Our pdgfrb<sup>uq30bh</sup> mutant allele introduces a mutation predicted to generate a truncated protein very similar to previously validated alleles (see detail in revised Extended Data Fig. 1a and methods). Our pdgfrb<sup>uq30bh</sup> mutant also phenocopies previous pdgfrb mutants (sa16389 and um148 alleles)[2,3], displaying mural cell loss with multiple markers (Fig. 1a, new data in Extended Data Fig. 1b–c, Fig. 3b–c; Extended Data Fig. 4c–d) and the same typical morphological defects and survival rates (new data in Extended Data Fig. 1d–f). Thus our mutant phenocopy gives confidence it is most likely a null allele, in line with previous papers studying presumed null alleles[1].

      We believe this provides sufficient confidence in this allele of pdgfrb. Moreover, considering that our manuscript focusses on loss of mural cells and we show definitively that this mutant has robust loss of mural cells in the brain, our mutant is suitable for this study.

      (3) Statistical data analysis: Did the authors perform analyses to investigate whether the data has a normal distribution (e.g., Figures 1d, e)?

      We thank the reviewer for raising this and apologise for this oversight. All data have now been assessed for normality using the Shapiro-Wilk test, and subsequent statistical tests have been chosen accordingly. The specific quantifications referred to by the reviewer, now in Extended Data Fig. 3a–d (previously Fig. 1d–e), are normally distributed except for the quantification of 70 kDa extravasation at 7 dpf, for which a Mann-Whitney test has therefore been used. Further information can be found in the figure legends and methods.

      (4) Analysis of tracer extravasation. The use of 2000 kDa dextran intensity as an internal reference is problematic because the authors have not provided data demonstrating that the 2000 kDa dextran signal remains consistent across the entire vasculature. The authors have not provided data demonstrating that the 2000 kDa dextran signal in vessels exhibits acceptable variance across the vasculature to serve as a reliable internal reference. The variability of this signal within a single animal remains unknown. The presented data do not address this aspect.

      We thank the reviewer for their comment and agree that analysis was needed to show that 2000 kDa dextran is a reliable normalization signal.

      We now show data demonstrating the consistency of the 2000 kDa tracer signal throughout the vasculature in the following figures: Extended Data Fig. 2b, Extended Data Fig. 3a and c, Extended Data Fig. 5a, and Extended Data Fig. 6. In fact, we observe that this 2000 kDa tracer provides a very reliable marker of large- and small-calibre vessels in larval, juvenile and adult animals, even in fixed and cleared whole tissues and animals (e.g. Extended Data Fig. 2d–e, Extended Data Fig. 5 and 6).

      Our further experiments and analysis support the use of this tracer as an ideal way to normalise for variation between animals. Coupled with improved masking of vessels using transgenic labels (e.g. Extended Data Fig. 2b), this allows us to quantify across whole vascular networks, reducing the concern about variation within individual animals. We also find that 2000 kDa dextran shows negligible leakage from brain vessels at 2 hours post-injection (hpi) (new data, Extended Data Fig. 2b–c), and we provide images in Extended Data Fig. 6b–b′′ showing detectable signal even at 6 hpi. Finally, this approach, normalisation to transgenic markers, and even raw parenchymal tracer intensity values all lead to the same conclusions. In addition, we point the reviewer to a recent preprint from our team that further validates this method[4].

      Overall, we find the use of this tracer an ideal way to normalise for differences in injection volumes between animals and we recommend the use of this method to other groups assessing BBB leakage in zebrafish.

      Additionally, it's intriguing that the signal intensity in the parenchyma of the tested tracers presents a substantial range, varying by 20-30% in the analysed cohort (Figure 1g, Extended Figure 1e). Such large variability raises the question of its origin. Could it be a consequence of the normalization to 2000 kDa dextran intensity which differs between different fish? Or is it due to the differences in the parenchymal signal intensity while the baseline 2000 kDa intensity is stable? Or is the situation mixed?

      This is a good point raised by the reviewer.

      To address this, we have used the following approaches:

      (1) We provide additional experiments and normalisation methods that support the utility of our tracer studies (new data in Fig. 2a–f and Extended Data Fig. 2b–c), discussed in detail below.

      (2) We provide graphs of the raw, un-normalised parenchymal tracer distribution (also requested by reviewer 1). These are provided in Author response image 1 and further support all our conclusions, showing that our normalisation methods generate meaningful data.

      Overall, the range of parenchymal intensity that we see after tracer injection and live imaging reflects variation introduced during microinjection. However, these ranges are in line with previous publications using similar methods (see studies by O’Brown et al 2019 and 2023)[5,6], allow reliable statistical comparisons to be drawn between controls and mutants, and allow us to detect both immature and functional BBB states during zebrafish development (new data in Fig. 2e–f).

      Of note, the variability we see is likely introduced during the injection process into tiny larval blood vessels and is the reason why we perform normalization of parenchymal tracers to a vascular dextran signal that doesn’t leak from brain vessels. In our studies, 2000-kDa dextran has been co-injected with the smaller size tracers, therefore any potential differences in injection volumes as well as imaging conditions (however consistent) should be reduced by this method.
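      The rationale for co-injecting 2000 kDa dextran can be illustrated with a minimal sketch, assuming that both the parenchymal small-tracer signal and the intravascular 2000 kDa signal scale linearly with the injected volume: under that assumption, a per-animal injection factor cancels in the ratio. The function name and intensity values below are hypothetical, and this is not the authors' analysis pipeline.

```python
import statistics


def normalised_extravasation(parenchyma_small, vessel_2000):
    """Ratio of mean parenchymal small-tracer intensity to mean intravascular
    2000 kDa dextran intensity in the same animal.

    Assumes both signals scale linearly with the (unknown) injected volume,
    so a per-animal dose factor cancels in the ratio.
    """
    vascular_ref = statistics.mean(vessel_2000)
    if vascular_ref <= 0:
        raise ValueError("vascular reference intensity must be positive")
    return statistics.mean(parenchyma_small) / vascular_ref


# Hypothetical pixel intensities for one larva (illustrative only).
parenchyma = [12.0, 10.0, 14.0]
vessels = [200.0, 180.0, 220.0]
r1 = normalised_extravasation(parenchyma, vessels)

# The same larva "injected" with 1.5x the volume: raw intensities differ,
# but the normalised value does not (up to floating-point rounding).
r2 = normalised_extravasation([1.5 * x for x in parenchyma],
                              [1.5 * x for x in vessels])
print(r1, r2)
```

      If the linearity assumption fails (for example, through detector saturation in large vessels), the dose factor would no longer cancel exactly, which is one reason to cross-check against an independent vascular reporter.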

      An alternative and potentially more effective approach would be to cross the pdgfrb mutant line with a line where endothelial cells are genetically labeled to define vessels (e.g. the line kdrl used in acquiring data presented in Figure 2a). Non-injected controls could then be used as a baseline to assess tracer extravasation into the parenchyma.

      We thank the reviewer for this suggestion.

      In response, we have performed new tracer leakage experiments at 7 and 14 dpf in siblings and pdgfrb mutants and quantified parenchymal tracer extravasation by normalizing to vascular reporters (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>). The results were in line with our previously presented, independent experiments and showed indistinguishable phenotypes between siblings and pdgfrb mutants (new data, Fig. 2a–d). We also imaged uninjected controls to assess the baseline; values were consistently close to zero, so these data are not included in the revised paper.

      Furthermore, we have also used this approach in wild-type larvae at 3 dpf (immature BBB) and 7 dpf (functional BBB)[5]. We detected significantly higher parenchymal extravasation of 10 and 70 kDa tracers at 3 dpf compared to 7 dpf, demonstrating that our method can detect leakage (new data, Fig. 2e–f).

      We believe that both normalization approaches have advantages (as discussed above), therefore showing the same results with these two different approaches has further strengthened our findings.

      How is the data presented in Figure 3e generated? How was the dextran intensity calculated? It looks like the authors have used the kdrl line to define vessels. Was the 2000 kDa still used as in previous figures? If not, please describe this in the Materials and Methods section.

      We have moved this data to Fig. 4e (previously Fig. 3e).

      Previously, we plotted raw data because this experiment was conducted on vibratome-sectioned tissue, and the 2000 kDa tracer was not used. In response to this query, and to be consistent with the new approach suggested by the reviewer, we have revised the quantification by normalizing the 10 kDa tracer extravasation to Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup> for this experiment and for the new experiments on juveniles (Fig. 5h–i). Please see the corresponding figure legends or the revised methods (lines 464–472).

      (5) The authors state that both controls and mutants show extravasation of 1 kDa NHS-ester into the parenchyma. However, the presented images do not illustrate this; it is not obvious from these images (Extended Data Figure 1c). Additionally, the presented quantification data (Extended Data Figure 1e) do not show that, at 7 dpf, the vasculature is permeable to this tracer. Note that the range of signal intensity of the 1 kDa NHS-ester is similar to the 70 kDa dextran (Figure 1g and Extended Figure 1e). Would one expect an increase in the ratio in case of extravasation, considering that the 2000 kDa dextran has the same intensity in all experiments? Please explain.

      We thank the reviewer for raising this important point.

      To clarify, we have never claimed that “2000-kDa dextran has the same intensity in all experiments”. On the contrary, vascular 2000 kDa normalization has been used to account for potential differences caused by injection, as stated in the submitted supplementary materials and now made clearer in the revision.

      In response to this query, we conducted a more detailed analysis of tracer extravasation patterns based on molecular weight (new data, Extended Data Fig. 2b–c). This analysis showed that 1 and 10 kDa tracers have much higher extravasation rates than 70 and 2000 kDa tracers. Interestingly, we did not find a significant difference between 1 and 10 kDa extravasation. Therefore, in the revised manuscript we used only the 10 kDa tracer in further experiments and have removed 1 kDa from the figures.

      To assess the tracers individually (new data in Extended Data Fig. 2c), parenchymal extravasation of each tracer was normalised to its own vascular signal (e.g. mean intensity of 10 kDa in midbrain / mean intensity of 10 kDa in vasculature) to account for potential differences in injection volume. This provides a suitable method to assess leakage in wild-type animals and is in line with how previous studies have analysed such tracer injections[5,6]. Please see the revised figure legends and supplementary materials for details.
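      The per-tracer normalisation described above (mean parenchymal intensity of a tracer divided by its own mean vascular intensity) can be sketched as follows. Here the vessel mask is assumed to come from thresholding a vascular reporter channel such as kdrl; the threshold, function names and toy arrays are hypothetical and are not taken from the study's analysis pipeline.

```python
import statistics


def mask_from_reporter(reporter, threshold):
    """Hypothetical vessel mask: True where a vascular reporter exceeds a fixed threshold."""
    return [[value > threshold for value in row] for row in reporter]


def self_normalised_leakage(tracer, vessel_mask):
    """Mean tracer intensity in parenchymal (non-vessel) pixels of the ROI divided by
    mean intensity of the same tracer in vessel pixels.

    tracer: 2D list of intensities for one tracer channel within the ROI.
    vessel_mask: 2D list of booleans of the same shape (True = vessel pixel).
    """
    parenchymal, vascular = [], []
    for row_t, row_m in zip(tracer, vessel_mask):
        for value, is_vessel in zip(row_t, row_m):
            (vascular if is_vessel else parenchymal).append(value)
    if not parenchymal or not vascular:
        raise ValueError("ROI must contain both vessel and parenchymal pixels")
    return statistics.mean(parenchymal) / statistics.mean(vascular)


# Toy 3x3 ROI with one vertical vessel (illustrative values only).
reporter = [[0, 90, 0],
            [0, 95, 0],
            [0, 80, 0]]
tracer_10kda = [[5, 100, 7],
                [6, 110, 4],
                [5, 90, 9]]
mask = mask_from_reporter(reporter, threshold=50)
print(self_normalised_leakage(tracer_10kda, mask))
```

      Because each tracer is divided by its own intravascular signal, this variant does not depend on a second co-injected tracer; in exchange, it relies on the vessel mask being accurate, since misclassified vessel-edge pixels would inflate the parenchymal mean.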

      (6) The study would be strengthened by a more detailed temporal analysis of the phenotype. When do the aneurysms appear? Is there an additional loss of VSMC?

      We thank the reviewer for this suggestion, and we have now performed staged imaging of pdgfrb mutants and siblings between 7 and 21 dpf using the TgBAC(acta2:EGFP)<sup>uq17bh</sup> transgene (new data, Fig. 3b–c; Extended Data Fig. 4a–d). Consistent with previous results, acta2:EGFP-positive cells surrounding the middle mesencephalic central arteries (MMCtA) were missing in pdgfrb mutants. At 21 dpf, we also observed a mild dilation of these vessels, likely the earliest change leading to aneurysm formation (new data, Fig. 3c).

      To extend the number of stages analysed in this study, we have also performed new tracer leakage experiments in juveniles (30 dpf) and found that aneurysms can be detected at this age when the 10 kDa tracer is used (new data in Fig. 5b–b′). Consistent with the adult-stage phenotype, aneurysms were limited to the larger-calibre vessels (arteries) in the brain. We also observed hotspots; upon quantification, we found fewer in juveniles than in adults, suggesting that the severity of aneurysms and hotspots increases with age.

      Taken together, our results show that aneurysms in pdgfrb mutants begin to appear at late larval/early juvenile stages (~21 dpf) as observable dilations. By 30 dpf, aneurysms accompanied by small numbers of hotspots are observed, and hotspot numbers increase significantly by adulthood. This also correlates with reduced growth and survival of pdgfrb mutants after 30 dpf (new data, Extended Data Fig. 1d–e).

      (7) The authors intended to analyze the BBB at later stages (line 128), but there is not a significant time difference between 2 months (Figure 2) and 3 months (Figure 3) considering that zebrafish live on average 3 years. Therefore, the selection of only two time-points, 2 and 3 months, to analyze BBB changes does not provide a comprehensive overview of temporal changes throughout the zebrafish's lifespan. How long do the pdgfrb mutants live?

      Respectfully, zebrafish transition from juvenile stages to adulthood between 2 and 3 months and there are many significant differences in the physiology of this organism at these two ages. At 2 months, zebrafish are still juveniles undergoing metamorphosis with rapid growth and ongoing skeletal and vascular development. By 3 months, they are sexually mature adults and have much more developed cranioskeletal and vascular systems. Having said that, we take the reviewer's important point that further temporal resolution would improve the study.

      We have performed new experiments in 1-month-old animals and provided a comprehensive analysis of the vascular phenotypes occurring in pdgfrb mutants. These experiments, analysing leakage using 10 kDa tracer injections, were very informative and have significantly improved the study. We had previously provided experiments in 5-month-old adults as well (previously Fig. 4a–b and Extended Data Fig. 4a), so the study now includes larval stages (7, 14 dpf), juveniles at 1 and 2 months, and adults at 3 and 5 months. While the additional timepoints did not yield new conclusions, they significantly strengthened the body of work overall.

      Of further note, we provided survival data up to 90 dpf where survival of the pdgfrb mutants is significantly reduced compared to siblings (Extended Data Fig. 1e). We believe this is associated with the severity of the aneurysms and haemorrhages which probably lead to lethality in these mutants.

      (8) Why is there a difference in tracer permeability between 2 and 3 months (Figures 2 and 3)? Are hemorrhages not detected in 2-month-old zebrafish?

      In response to this and other queries, we have added new experiments that provide a more detailed temporal analysis of tracer accumulation (new data in Fig. 5b–c, Fig. 5f–g).

      In short, we do not see obvious haemorrhages in 1- or 2-month-old fish at a gross level during dissections (not shown). Using the 10 kDa tracer, we can detect small hotspots at aneurysms as early as 1 month, likely representing the earliest loss of integrity. We do not see obvious hotspots in 2-month-old animals when we use the 70 kDa tracer, which suggests to us that this tracer is less sensitive for hotspot detection (in line with new Extended Data Fig. 2c). Finally, the number of hotspots increases dramatically from juvenile to adult stages in our datasets, which we take as indicative of a progressive phenotype.

      Overall, tracer size matters for detecting hotspots, and hotspots become more apparent in older animals; we have added a note in the main text covering these points (lines 200–205).

      (9) Figure 3: The capillary bed should be presented in magnified images as it is not clearly visible. Figure 3e shows that in the pdgfrb mutant the dextran intensity is higher also in regions 6-10. How do the authors explain this?

      We thank the reviewer for raising this important point.

      Firstly, we now include enlarged views of the capillary beds for this experiment (Fig. 4d′) and new experiments mentioned below.

      Secondly, regarding why there is more tracer in lateral locations and not just at medial sites of haemorrhage, we believe that this is most likely due to the progressive spread of tracer from the medial hotspots. To test this, we performed additional experiments assessing tracer accumulation at 2 different timepoints in brains collected at 0.5 or 6 hpi (new data in Fig. 5f–g, Extended Data Fig. 6a–b′′). In pdgfrb mutants, tracer accumulation at 0.5 hpi was minimal and primarily limited to hotspots and nearby regions (new data in Fig. 5h), whereas at 6 hpi higher tracer accumulation was observed across medial to lateral regions (new data in Fig. 5i). Comparing the data in Figure 4 (2 hpi) with the new data in Figure 5i (6 hpi), the 10 kDa tracer appears to have spread to more lateral locations given the longer time allowed after injection.

      We cannot formally exclude the possibility that tracer leakage occurs more slowly through capillaries than at major hotspots, which might fit with the proposed model of slow leakage via increased EC transcytosis[7-9]. However, since we cannot detect increased tracer accumulation in pdgfrb mutants that lack aneurysms and haemorrhages at 7 and 14 dpf, such a scenario would require capillary transcytosis to be active at later juvenile and adult stages but not in larval and late larval animals. Thus, we believe the most plausible explanation is that aneurysm/haemorrhage-associated leakage is the primary cause of the vascular integrity defects in zebrafish pdgfrb mutants.

      We have added discussions addressing this in the revised manuscript (lines 220–230, 300–302).

      (10) In general, the manuscript would benefit from a more detailed description of the performed experiments. How long did the tracer circulate in the experiments presented in Figures 2, 3, and 4?

      We thank the reviewer for this suggestion and have now ensured that this is clearly described in the figure legends and methods (lines 391–395).

      (11) How do the authors explain the poor signal of the 70 kDa dextran from the vasculature of 5-month-old zebrafish presented in Extended Data Figure 3?

      We agree that the dextran signal was reduced compared to the other experiments in that Figure. This is likely due to sample preparation and clearing causing reduced fluorescence. Upon consideration of the presented data and the additional experiments using 10 kDa tracers providing further validations for our claims, we decided to remove this data from the paper.

      (12) The study would benefit from a clear separation of the phenotypes caused by the loss of VSMC. The title alludes to capillaries also presenting hemorrhages, which is not the case. How do vascular mural cells differ from mural cells? Are there any other mural cells?

      We take the reviewer's point and have now updated the title to "Mural cells protect the adult brain from haemorrhage but do not control the blood-brain barrier in developing zebrafish."

      (13) I have a few comments about how the authors have interpreted the literature and why, in my opinion, they should revise their strong statements (e.g., the last sentence in the abstract).

      Scientists have their own insights and interpretations of data. However, when citing published data, it should be clearly indicated whether the statement is a direct quote from the original publication or an interpretation. In the current manuscript, the authors have not correctly cited the data presented in the two published papers (references 5 and 6). These papers do not propose a model where pericytes suppress "adsorptive transcytosis" (lines 73-76). While increased transcytosis is observed in pericyte-deficient mice, the specific type of vesicular transport that is increased or induced remains unknown.

      Similarly, lines 151-152 refer to references 5 and 6 and use the term "adsorptive transcytosis," but the authors of both papers did not use this term. Attributing this term to the original authors is inaccurate. Additionally, lines 152-153 do not accurately represent the findings of references 5 and 6. These papers do not state that there is an induction of "caveolae" in endothelial cells in pericyte-deficient mice. In the absence of pericytes, many vesicles can be observed in endothelial cells, but these vesicles are relatively large. It is more likely that there is some form of uncontrolled transcytosis, perhaps micropinocytosis. Please refer to the original papers accurately.

      We thank the reviewer for these comments. We take the point and have rewritten the manuscript carefully to improve accuracy and avoid misrepresenting any previous claims made in specific papers.

      Also, the authors have missed the fact that in mice, the extent of pericyte loss correlates with the extent of BBB leakage. To a certain extent, the remaining pericytes can compensate for the loss by extending longer processes and so ensure full longitudinal coverage of the endothelium. This was shown in the initial work of Armulik et al (reference 5) and later in other studies.

      We certainly did not miss this important point (as we are also working with these mouse models) and we now include reference to this in our expanded discussion. Of note, we do think it would be worthwhile assessing if the extent of BBB leakage and pericyte coverage also correlates with the presence of microhaemorrhages in these hypomorphic mouse models, although this is more challenging to do in mice than in zebrafish.

      The bold assertion on lines 183–187 that a lack of specific BBB phenotype in the pdgfrb zebrafish mutant invalidates mouse model findings is unfounded. Despite the notion that zebrafish endothelium possesses a BBB, I present a few examples highlighting the differences in brain vascular development and why the authors' expectation of a straightforward extrapolation of mouse BBB phenotypes to zebrafish is untenable.

      In mice Pdgfrb knockout is lethal, but in zebrafish, this is not the case. In marked contrast to mice, however, zebrafish pdgfrb null mutants reach adulthood despite extensive cerebral vascular anomalies and hemorrhage. Following the authors' argumentation about the unlikely divergence of zebrafish and mice evolution, does it mean that the described mouse phenotype warrants a revisit and that the Pdgfrb knockout in mice perhaps is not lethal? Another example where the role of a gene product is not one-to-one, which relates to pericyte development, is Notch3. Notch3-null mice do not show significant changes in pericyte numbers or distribution, suggesting a less prominent role in pericyte development compared to zebrafish.

      Although many aspects of development are conserved between species, there are significant differences during brain vascular development between zebrafish and mice. These differences could reveal why the BBB is not impaired in zebrafish pdgfrb mutants. The timing with which various cellular players emerge also differs. The timing of microglia colonization in the brain differs. In mice, microglia colonization starts before the first vessel sprouts enter the brain, while in zebrafish, microglia enter after. Additionally, microglia in zebrafish and mice have a different ontogeny. In mice, astrocytes specialize postnatally and form astrocyte endfeet postnatally. In zebrafish, radial glia/astrocytes form at 48 hpf, and as early as 3 dpf, gfap+ cells have a close relationship with blood vessels. Thus, these radial glia/astrocyte-like cells could play an important role in BBB induction in zebrafish. It's worth noting that in Drosophila, the blood-brain barrier is located in glial cells. While speculative, these cells might still play a role in zebrafish, while the role of pericytes does not seem to be crucial. Pericytes enter the brain and contact the developing vasculature (endothelium) relatively late in zebrafish (60 hpf). In mice, the situation is different, as there is no such lag between endothelium and pericyte entry into the brain. I suggest that the authors approach the observed data with curiosity and ask: Why are these differences present? Are all aspects of the BBB induced by neural tissue in zebrafish? What is the contribution of microglia and astrocytes?

      Another interesting aspect to consider is the endothelial-pericyte ratio and longitudinal coverage of pericytes in the zebrafish brain, and how this relates to what is observed in mice. How similar is the zebrafish vasculature to the mouse vasculature when it comes to the average length of pericytes in the zebrafish brain? Does the longitudinal coverage of pericytes in the zebrafish brain reach nearly 100%, as it does in mice?

      Based on the preceding arguments, it is recommended that the authors present a balanced discussion that provides insightful discussion and situates their work within a broader framework.

      Overall, we agree with most of the points made by the reviewer above. As we have now extended the format of this paper to be a full article, we have space to provide an extended discussion and introduction. We now try to capture many of the points made by the reviewer and we think that this has significantly improved the paper. We thank the reviewer for this contribution.

      We do want to point out that we did not state that our findings using zebrafish pdgfrb mutants invalidate mouse model findings. We suggest that a deeper analysis to understand the nature of the hotspots in mural cell deficient mammalian models could be very interesting in light of the zebrafish observations. We hope that the revised discussion better reflects this.

      Reviewer #3 (Public review):

      This manuscript examines the role of pdgfrb-positive pericytes in the establishment and maintenance of the blood-brain barrier (BBB) in the zebrafish. Previous studies in PDGFB- or PDGFRB-deficient mice have suggested that loss of pericytes results in disruption of the BBB. The authors show that zebrafish pdgfrb mutant larvae have an intact BBB and that pdgfrb mutant adult fish show large vessel defects and hemorrhage but do not exhibit substantial leakage from brain capillaries, suggesting loss of pericytes is not sufficient to "open" the BBB. The authors use beautiful and compelling images and rigorous quantification to back up most of their conclusions. The imaging of the adult brain is particularly nice. The authors rigorously document the lack of BBB leakage in pdgfrb<sup>uq30bh</sup> mutant larvae and large vessel phenotypes (e.g., enlargement and rupture) in pdgfrb<sup>uq30bh</sup> mutant adults. A few points would help the authors to further strengthen their findings contradicting the current dogma from rodent models.

      We appreciate the reviewer's comments on the manuscript overall and agree that addressing the raised points was needed to strengthen our findings. We have addressed the main points below and believe that this revision greatly improves this study.

      Major point:

      The authors document pericyte loss using a single TgBAC(pdgfrb:egfp)<sup>ncv22</sup> transgenic line driven by the promoter of the same gene mutated in their pdgfrb<sup>uq30bh</sup> mutants. Given their findings on the consequences of pericyte loss directly contradict current dogma from rodent studies, it would be useful to further validate the absence of brain pericytes in these mutants using one of several other transgenic lines marking pericytes currently available in the zebrafish. This could be done using pdgfrb crispants, which the authors show nicely phenocopy the germline mutants, at least in larvae. This would help nail down the absence of any currently identifiable pericyte population or sub-population in the loss of pdgfrb animals and substantially strengthen the authors' conclusions.

      We thank the reviewer and agree that examining pdgfrb<sup>uq30bh</sup> mutants using another transgenic line labelling pericytes would further validate the absence of brain pericytes. We generated a transgenic line, TgBAC(abcc9:abcc9-T2A-mCherry)<sup>uom139</sup>, to visualise pericytes and validated the absence of brain pericytes in pdgfrb mutants (revised Extended Data Fig. 1b). The loss of brain pericytes matched our findings using the TgBAC(pdgfrb:egfp)<sup>uq15bh</sup> line as well as data previously published by Ando et al 2016–2021, in which all brain pericytes except those on the metencephalic artery were missing[2,3].

      Other issues:

      The authors should provide more information about the pdgfrb<sup>uq30bh</sup> mutant and how it was generated (including a diagram in a supplemental figure would be useful).

      We thank the reviewer for this suggestion. In addition to the explanations provided in the supplementary materials, we have added a schematic and provided Sanger sequencing results showing the mutation, as well as the predicted effect of the mutation on the protein domains (Extended Data Fig. 1a).

      It would be helpful to show some data on whether mutants show morphological phenotypes or developmental delay at 7 and 14 dpf, to provide some context to better assess the reduced branching and vessel length vascular phenotypes (see Figures 1c-e).

      We thank the reviewer for this suggestion. We have provided further details on body length and survival of pdgfrb mutants up to 90 dpf. Consistent with Ando et al 2021, we did not observe any distinguishing features until about 30 dpf[1,3]. The adult anatomy of our mutant allele matches that of previously described null mutants and is now shown (Extended Data Fig. 1f).

      If available, it would be helpful to have a positive control for the tracer leakage experiments - a genetic manipulation that does cause disruption of the BBB and leakage at 2 hours post-tracer injection (see Figures 1f and g).

      We thank the reviewer for this suggestion and agree that a positive control would validate reliability of our method. We have performed new experiments at 3 dpf when BBB integrity is not yet established and at 7 dpf when BBB is functional in zebrafish[5], testing both 10 and 70 kDa tracers (new data in Fig. 2e–f). We detected significantly higher tracer accumulation at 3 dpf, showing that our methods can detect tracer leakage in the brain.

      Quantification of the findings in Figure 4c, d would be useful, as would the use of germline fish for these experiments if these are now available. If this is not possible, it would be helpful to document that the crispants used in these experiments lack pdgfrb:egfp pericytes at adult stages (this is only shown for 5 dpf larvae, in Extended Data Figure 4b).

      We thank the reviewer for this comment. Using the TgBAC(pdgfrb:egfp)<sup>uq15bh</sup> line, we have imaged coronal brain sections collected from 10-week-old pdgfrb crispants and uninjected siblings (the age-matched animals used in Fig. 5d–e, previously Fig. 4c–d). We have now included data showing that adult pdgfrb crispants lack brain mural cells, phenocopying pdgfrb<sup>uq30bh</sup> mutants (new data, Extended Data Fig. 6f). These crispants are very reliable in our hands and faithfully reproduce stable mutant phenotypes, giving us confidence to use the faster F0 approach in this experiment.

      Adult mutants clearly show less dye leakage in the more superficial capillary regions than WT siblings, but dextran intensity is a bit higher, although this could well be diffusion from more central brain regions where overt hemorrhage is occurring. Along similar lines though, the authors' TEM data in Extended Data Figure 4d hints that there may be more caveolae in mutant brain capillaries, although the N number was lower here than for the measurements from TEM of larger central vessels (Figure 4g). It would be useful to carry out additional measurements to increase the N number in Figure 4d to see whether the difference between wild-type sibling and mutant capillary caveolae numbers remains as not significant.

      We thank the reviewer for raising these important points and suggestions.

      Firstly, regarding the signal in capillary regions and its likely diffusion from hotspots, please see our response to reviewer 2, point 9 above.

      Secondly, we have imaged and analysed more capillaries in both pdgfrb mutants and siblings (Extended Data Fig. 7a–b, previously Extended Data Fig. 4d). The results showed no significant difference between these groups, suggesting that capillary EC transcytosis is unchanged in our pdgfrb mutants.

      It might be helpful to include some orienting labels and/or additional descriptions in the figure legends to help readers who are not used to looking at zebrafish brain vessels have an easier time figuring out what they are looking at and where it is in the brain.

      We thank the reviewer for this suggestion and agree that adding further information in the figure legends and illustrations about orientation would make it easier for readers. In addition to the information provided in the figure legends in the submitted version, we have added an illustration, more labels on the revised figures, extended the descriptions in figure legends, main text and methods.

      We have added a schematic depicting the tracer leakage assay workflow, orientation of live imaging and analysed region of interest (Extended Data Fig. 1a–b).

      All figure legends have been updated with the anatomical position and microscopy view.

      Additional labels have been added to the figures to help readers identify the referenced vessels (new data in Fig. 3c and Extended Data Fig. 4a–b′).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study uses the intensity of tracer signals within the vessels to analyze BBB permeability, potentially underestimating leakage severity. The dye intensity is measured 2 hours after injection; however, other studies have already observed leakage after 30 minutes by imaging directly in the brain parenchyma. The overall intensity should also decrease through leakage from other vessels of the body, e.g. in the trunk and tail. The loss of intravascular dye intensity from leakage in barrier-free vessels is probably already so high after 2 hours that the smaller amount of leakage across the BBB cannot be observed.

      We thank the reviewer for this comment and suggestion. We agree that small tracers leak from the vasculature, particularly through fenestrated vessels in the trunk and tail. We based our timing on previous studies and our own experience. In zebrafish, O’Brown et al. 2019 also used 2 hpi[5] to detect leakage in mfsd2aa mutants; mfsd2aa has also been proposed to regulate BBB integrity by controlling EC transcytosis. We therefore believe that performing experiments at 2 hpi is appropriate for investigating the roles of pericytes in BBB integrity, and our data suggest that this timing works.

      In response to this and other comments, we performed further experiments and analyses testing the leakage of individual tracers with molecular weights ranging from 1 to 2000 kDa. We showed that these tracers can be reliably detected in the brain parenchyma and vasculature when imaged at 2 hpi. In another study, we showed that medium-sized tracers such as 40 kDa dextran can be reliably detected in the vasculature at similar timepoints[10]. Because we used 10 and 70 kDa tracers to detect both parenchymal tracer accumulation and tracer still within the vessels, we believe this timepoint is appropriate for assessing BBB integrity in zebrafish.

      In addition to these experiments, see our tracer leakage experiments in 1-month-old animals at 0.5 and 6 hpi, which test the leakage pattern described above (Fig. 5 and Extended Data Fig. 6).

      Therefore, the authors will need to validate their method of choice by showing an impairment of the BBB caused by other agents known to affect the BBB, and at 48 hpf, when the BBB is not yet tightened. One example of BBB impairment can be found in O'Brown et al. (2019), eLife 8:e47326. doi: 10.7554/eLife.47326

      We thank the reviewer for this suggestion. Following O’Brown et al. 2019, we performed experiments at 3 dpf, when the BBB is not yet mature, and at 7 dpf, when the BBB is functional[5], testing both 10 and 70 kDa tracers. We detected significantly higher tracer accumulation at 3 dpf, showing that our new additional method (see below) can detect tracer leakage in the brain (new data in Fig. 2e–f).

      Ideally, the authors would also supplement the method with additional approaches in the younger developmental stages to validate their findings.

      The validation of the method and the findings is particularly important for the claims of lack of BBB impairment in the absence of mural cells, as this is a "negative" finding.

      In response to this and comments from other reviewers, we performed additional tracer leakage experiments (new data in Fig. 2a–d) where we imaged 10 and 70 kDa tracers with a vascular reporter (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>) and used this reporter for normalisation. Both this approach as well as the experiments provided in the first submission (updated as Extended Data Fig. 3a–d) showed that pdgfrb mutants at 7 and 14 dpf have indistinguishable BBB integrity compared to siblings. See also Author response image 1 that further addresses this.

      I also strongly suggest rephrasing and toning down the claim that vascular mural cells do not control the blood-brain barrier in developing zebrafish.

      A negative finding cannot be proven completely, and many of the previously reported effects on murine BBB impairment are rather weak when caused by single agents, such as Claudin-5 deficiency or sphingosine-1-phosphate receptor 1 knockout. It might therefore be important to claim only that no strong impairment (as observed in the mural cell-deficient mouse) could be observed in zebrafish. Alternatively, rephrase it to "no impairment as severe as/comparable to ... could be observed" and then provide an impairment control for the developmental stages.

      We thank the reviewer for this comment and agree that negative findings are very challenging to prove. However, we find no evidence of BBB leakage in animals lacking mural cells at 7 and 14 dpf, and we believe that our data are robust on this point. As such, we believe we show that a vertebrate with a largely conserved EC BBB can have intact barrier function in the absence of mural cells.

      As suggested, we have revised our claims throughout the manuscript to provide a more nuanced discussion of this point. However, we do not want to water down our claims too much, because we believe they are important. We hope that the reviewer will appreciate our carefully worded and expanded discussion section.

      Additional items of interest to readers, and therefore suggestions to improve the manuscript, could be:

      (1) To include more molecular analysis: while the study identifies caveolae induction and basement membrane thickening as potential contributors to focal leakage, the exact molecular mechanisms linking mural cell loss to these structural changes are not deeply investigated.

      (2) Also, the study primarily associates BBB disruption in the adult with aneurysms. Therefore, other subtle or diffuse changes in BBB permeability that might occur even without overt vascular lesions are potentially underrepresented.

      However, following up experimentally on these might exceed the scope of the manuscript.

      We thank the reviewer for these suggestions and agree with both points. However, as stated by the reviewer, these experiments are beyond the scope of the manuscript and represent future directions for our lab and others.

      Reviewer #2 (Recommendations for the authors):

      (1) Mouse genes should be written as follows: Pdgfb, Pdgfrb, and be in italics. See line 70: it should read "Pdgfb and Pdgfrb (italics)" and not "PdgfB and Pdgfrβ".

      We have updated the text according to the reviewer’s suggestion.

      (2) Please state the age of the fish analyzed in Figure 1f and 1g.

      We have moved these data to Extended Data Fig. 3a–d (previously Fig. 1f–g) and have added age information to the images and figure legends.

      (3) Is the reduced vascular complexity in pdgfb mutant due to reduced angiogenesis or due to excessive pruning?

      This is a good question, and we do not know at this stage. We have unpublished data that suggest pericytes secrete angiogenic growth factors, but this question warrants a thorough investigation that we believe is beyond the scope of this current study.

      (4) Please check that the figure legends state the correct number of fish analysed. For example, Figure 1 d, e N=8 but there seem to be 9 data points per group - 14dpf.

      We apologise for this mistake and thank the reviewer for raising this. We have updated the graphs and figure legends accordingly.

      (5) Please indicate in the figures the genotypes (wt, het) of a sibling presented alongside a pdgfb mutant.

      In zebrafish research, wild-type and heterozygous animals are commonly pooled into a single control group termed siblings. Because we did not see any difference between the wild-type and pdgfrb<sup>uq30bh/-</sup> groups in any experiment, we report these groups together. This is now stated in the supplementary materials.

      One exception to this was the examination of growth and survival rates, where we show the genotypes separately (new data in Extended Data Fig. 1b–f).

      (6) Please explain clearly what region is shown in Figure 2B. I do not understand the explanation "approximate location of dotted line". Is the image in the panel "a" top view of a brain?

      We have moved this data to Fig. 3a′ (previously Fig. 2b) and replaced the dotted line in Figure 3a (previously Fig. 2a) with a white box indicating the location of the restricted region in the whole brain image.

      We have revised the text as below:

      “Subset of z-slices from the whole brain imaging in (a) and (b) (white boxes) indicating mural cell loss and abnormal capillary network patterning. 100-μm-thick maximum intensity projections (MIP) were generated using the continuation of the left middle mesencephalic central artery (MMCtA, arrow) as an anatomical landmark.”

      In addition, we have updated all our figure legends clearly stating the view and anatomical position of the imaged sample.

      (7) Figure 2e: Note that the dotted areas do not correspond to the magnified areas. Please adjust.

      We have moved this data to Extended Data Fig. 5a (previously Fig. 2e–e′) and updated the location of the white box in 5a shown in enlarged view in 5a′.

      (8) Lines 112 and 114 - Should the indicated figure be Figure 2b-d and Figure 2c-d, respectively, and not Figure 1?

      We thank the reviewer for pointing out this mistake. All figures are now referenced correctly in the revised manuscript.

      (9) Data presented in Figure 2 and Figure 3 can be consolidated and presented as one Figure.

      We thank the reviewer for this suggestion. After addition of new data and revising the manuscript we have decided to keep these data presented separately.

      (10) Note that Figure 2a,b shows 5-month-old fish, not 2-month-old fish. Additionally, Extended Data Figure 3 shows 5-month-old fish, not 3-month-old fish.

      The stages noted by the reviewer were correctly indicated.

      (11) Figure 2d: Please clarify the definition of a "large vessel".

      We have observed normal morphology in capillaries and noted aneurysms and hotspots in large calibre vessels such as arteries, which become more severe over time. We have revised this across the manuscript accordingly.

      (12) Figure 4a, b: Please explain how the hotspots of leakage were defined based on the extravasated tracer.

      Hotspots of leakage are scored when fluorescent tracer aggregates are clearly observed outside the vessels. Vessel borders were defined using the transgenic lines (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>). We have added a clear description in the methods section (lines 473–475).

      Figure 4c: Why were Pdgfrb crispants used and not the mutant line?

      We used pdgfrb crispants because they reliably phenocopy the mutant phenotype, including the lack of brain mural cells (Extended Data Fig. 5e, previously Extended Data Fig. 4b). They also have practical advantages: they allow faster experiments and reduce fish usage.

      Figure 4e: The magnification of the electron microscopy images does not make it possible to clearly identify caveolae. What was the magnification of the collected images for caveolae analysis? How did the authors ensure that they quantified only caveolae and not other types of vesicles?

      Respectfully, we disagree that the magnification is insufficient, as our images were captured and analysed consistently with previous ultrastructural descriptions[11,12]. We quantified caveolae based on vesicle size, defining them as circular profiles less than 100 nm in diameter. Each caveola was scored as luminal or abluminal based on its proximity to each surface membrane: within 500 nm of that surface or, in a thin-walled vessel, the caveolae closest to each surface (lines 398–409). Importantly, comparable analyses at similar magnifications have been independently validated in multiple caveola-deficient zebrafish genetic models[4,13]. Interestingly, and given the reviewer's comments above, we do see increased numbers of vesicular structures that are larger than caveolae, but we only quantify caveolae here.
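      The size and proximity criteria described above can be written as a simple scoring rule. The Python sketch below illustrates that rule; it is not the authors' analysis pipeline. It makes several assumptions:

      - Each vesicle profile has already been measured, giving a diameter and distances to the luminal and abluminal membranes.
      - The circularity criterion is not modelled.
      - The `VesicleProfile` type and the `classify` and `count_by_surface` functions are hypothetical names.
      - How ties are handled, and what happens to profiles far from both membranes, are illustrative choices.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Thresholds taken from the response text; everything else is hypothetical.
MAX_CAVEOLA_DIAMETER_NM = 100.0   # caveolae: profiles < 100 nm in diameter
SURFACE_PROXIMITY_NM = 500.0      # assigned to a surface if within 500 nm of it


@dataclass
class VesicleProfile:
    """Hypothetical pre-measured vesicle profile from a TEM image."""
    diameter_nm: float
    dist_to_luminal_nm: float
    dist_to_abluminal_nm: float


def classify(v: VesicleProfile) -> Optional[str]:
    """Return 'luminal', 'abluminal', or None if the profile is not scored."""
    if v.diameter_nm >= MAX_CAVEOLA_DIAMETER_NM:
        return None  # larger vesicles are excluded from the caveola count
    near_lum = v.dist_to_luminal_nm <= SURFACE_PROXIMITY_NM
    near_ablum = v.dist_to_abluminal_nm <= SURFACE_PROXIMITY_NM
    if near_lum and near_ablum:
        # Thin-walled vessel: assign to the closer surface (ties -> luminal,
        # an arbitrary choice made for this sketch).
        if v.dist_to_luminal_nm <= v.dist_to_abluminal_nm:
            return "luminal"
        return "abluminal"
    if near_lum:
        return "luminal"
    if near_ablum:
        return "abluminal"
    return None  # not within 500 nm of either surface (assumed unscored)


def count_by_surface(profiles: Iterable[VesicleProfile]) -> dict:
    """Tally scored caveolae per membrane surface."""
    counts = {"luminal": 0, "abluminal": 0}
    for p in profiles:
        label = classify(p)
        if label is not None:
            counts[label] += 1
    return counts
```

      With these rules, a 120 nm profile is never counted. This matches the statement that only caveolae, and not the larger vesicular structures, were quantified.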

      Reviewer #3 (Recommendations for the authors):

      Congratulations to the authors on their really beautiful imaging and rigorous quantitative documentation of phenotypes - this is a really nicely done study, and could be very important to the field with just a few additional experiments to buttress the key conclusions.

      We thank the reviewer for their kind comments.

      In addition to the comments noted in the public review, I would only point out that there are two mislabeled call-outs in the text (Lines 112 and 114; says Figure 1, should say Figure 2).

      We thank the reviewer for this point and have now revised the text accordingly.

      (1) Ando, K., Ishii, T. & Fukuhara, S. Zebrafish Vascular Mural Cell Biology: Recent Advances, Development, and Functions. Life (Basel) 11 (2021). https://doi.org/10.3390/life11101041

      (2) Ando, K. et al. Clarification of mural cell coverage of vascular endothelial cells by live imaging of zebrafish. Development 143, 1328-1339 (2016). https://doi.org/10.1242/dev.132654

      (3) Ando, K. et al. Conserved and context-dependent roles for pdgfrb signaling during zebrafish vascular mural cell development. Dev Biol 479, 11-22 (2021). https://doi.org/10.1016/j.ydbio.2021.06.010

      (4) Lim, Y. W. et al. Trans-Endothelial Trafficking in Zebrafish: Nanobio Interactions of Polyethylene Glycol-Based Nanoparticles in Live Vasculature. ACS Nano (2026). https://doi.org/10.1021/acsnano.5c21042

      (5) O'Brown, N. M., Megason, S. G. & Gu, C. Suppression of transcytosis regulates zebrafish blood-brain barrier function. Elife 8 (2019). https://doi.org/10.7554/eLife.47326

      (6) O'Brown, N. M. et al. The secreted neuronal signal Spock1 promotes blood-brain barrier development. Dev Cell 58, 1534-1547 e1536 (2023). https://doi.org/10.1016/j.devcel.2023.06.005

      (7) Armulik, A. et al. Pericytes regulate the blood-brain barrier. Nature 468, 557-561 (2010). https://doi.org/10.1038/nature09522

      (8) Daneman, R., Zhou, L., Kebede, A. A. & Barres, B. A. Pericytes are required for blood-brain barrier integrity during embryogenesis. Nature 468, 562-566 (2010). https://doi.org/10.1038/nature09513

      (9) Mae, M. A. et al. Single-Cell Analysis of Blood-Brain Barrier Response to Pericyte Loss. Circ Res 128, e46-e62 (2021). https://doi.org/10.1161/CIRCRESAHA.120.317473

      (10) Lim, Y.-W. et al. A Standardized Protocol to Investigate Trans-Endothelial Trafficking in Zebrafish: Nano-bio Interactions of PEG-based Nanoparticles in Live Vasculature. bioRxiv, 2025.2007.2023.666282 (2025). https://doi.org/10.1101/2025.07.23.666282

      (11) Parton, R. G. & Simons, K. The multiple faces of caveolae. Nat Rev Mol Cell Biol 8, 185-194 (2007). https://doi.org/10.1038/nrm2122

      (12) Parton, R. G. & del Pozo, M. A. Caveolae as plasma membrane sensors, protectors and organizers. Nat Rev Mol Cell Biol 14, 98-112 (2013). https://doi.org/10.1038/nrm3512

      (13) Lim, Y. W. et al. Caveolae Protect Notochord Cells against Catastrophic Mechanical Failure during Development. Curr Biol 27, 1968-1981 e1967 (2017). https://doi.org/10.1016/j.cub.2017.05.06

    1. eLife Assessment

      The authors aim to understand why Kupffer cells (KCs) die in metabolic-associated steatotic liver disease (MASLD). This is a valuable study using in vitro studies and an in vivo genetic mouse model, suggesting that increased glycolysis contributes to KC death in MASLD. The data presented are now convincing and adequately revised. This work will be of interest to researchers in the immunology and metabolism fields.

    2. Reviewer #3 (Public review):

      This manuscript provides novel insights into altered glucose metabolism and KC status during early MASLD. The authors propose that hyperactivated glycolysis drives a spatially patterned KC depletion that is more pronounced than the loss of hepatocytes or hepatic stellate cells. This concept significantly enhances our understanding of early MASLD progression and KC metabolic phenotype.

      Through a combination of TUNEL staining and MS-based metabolomic analyses of KCs from HFHC-fed mice, the authors show increased KC apoptosis alongside dysregulation of glycolysis and the pentose phosphate pathway. Using in vitro culture systems and KC-specific ablation of Chil1, a regulator of glycolytic flux, they further show that elevated glycolysis can promote KC apoptosis.

      However, it remains unclear whether the observed metabolic dysregulation directly causes KC death or whether secondary factors, such as low-grade inflammation or macrophage activation, also contribute significantly. Nonetheless, the results, particularly those derived from the Chil1-ablated model, point to a new potential target for the early prevention of KC death during MASLD progression.

      The manuscript is clearly written and thoughtfully addresses key limitations in the field, especially the focus on glycolytic intermediates rather than fatty acid oxidation. The authors acknowledge the missing mechanistic link between increased glycolysis and KC death. A few things require clarification.

      Strengths:

      • The study presents the novel observation of profound metabolic dysregulation in KCs during early MASLD and identifies these cells as undergoing apoptosis. The finding that Chil1 ablation aggravates this phenotype opens new avenues for exploring therapeutic strategies to mitigate or reverse MASLD progression.

      • The authors provide a comprehensive metabolic profile of KCs following HFHC diet exposure, including quantification of individual metabolites. They further delineate alterations in glycolysis and the pentose phosphate pathway in Chil1-deficient cells, substantiating enhanced glycolytic flux through 13C-glucose tracing experiments.

      • The data underscore the critical importance of maintaining balanced glucose metabolism in both in vitro and in vivo contexts to prevent KC apoptosis, emphasizing the high metabolic specialization of these cells.

      • The observed increase in KC death in Chil1-deficient KCs demonstrates their dependence on tightly regulated glycolysis, particularly under pathological conditions such as early MASLD.

      Weaknesses:

      • The TUNEL staining in the overview in Figure 2 is not convincing. Typically, the signal overlaps with DAPI, which is mostly not the case in the figures shown.

      • The mechanistic link between elevated glycolytic flux and KC death remains unclear.

      • Figure S5 shows ΔΔCt values, not relative values. What are the housekeeping genes? There should be at least two, and they should not have metabolically related functions, such as Gapdh.

      • Figure 1C: shows WT and KO gating side by side

      • The following point has not been answered: "While BMDMs from Chil1 knockout mice are used to demonstrate enhanced glycolytic flux, it remains unclear whether Chil1 deficiency affects macrophage differentiation itself." Expression of certain genes that indicate function does not show whether BMDMs isolated from these KO mice are fully differentiated. Here, BM input/BMDM output counts, flow cytometry on BMDMs, morphology, etc. should be tested.

    3. Reviewer #4 (Public review):

      Summary:

      In this study, He et al. investigate the mechanisms underlying Kupffer cell (KC) loss during metabolic stress. It has long been observed that embryonically derived KCs decline in obesity and liver disease, a loss that is compensated by monocyte recruitment, although the underlying mechanisms remain unclear. The authors propose that metabolic reprogramming, particularly excessive glycolysis, drives KC death. Using an original murine genetic model to modulate glycolysis, they further demonstrate that enhanced glycolytic activity exacerbates KC damage.

      Strengths:

      Overall, the study is extremely clearly presented, with a convincing and simple message aimed at a broad audience.

      Weaknesses:

      This manuscript has already undergone one round of revisions in which I was not involved. The authors have tried to address several points raised by the previous reviewers, notably regarding the unexpectedly high level of TUNEL staining observed in KCs. However, I share the concerns expressed by the three previous reviewers that the reported levels remain difficult to reconcile with the biology. A TUNEL positivity rate of ~60% at week 16 of the HFHC diet would imply massive KC death, which should have led to a near-complete depletion of the KC population, and this is not observed. While I agree that the KC compartment is clearly affected under this dietary challenge, I would strongly encourage the authors to carefully rule out technical biases that could account for this implausibly high rate of cell death.

      The new in vivo experiment with 2-DG is convincing and clearly adds value to the study.

      Overall, the study deserves publication.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to investigate the mechanisms underlying Kupffer cell death in metabolic-associated steatotic liver disease (MASLD). The authors propose that KCs undergo massive cell death in MASLD and that glycolysis drives this process. However, there appears to be a discrepancy between the reported high rates of KC death and the apparent maintenance of KC homeostasis and replacement capacity.

      Strengths:

      This is an in vivo study.

      Weaknesses:

      There are discrepancies between the authors' observations and previous reports, as well as inconsistencies among their own findings.

      Before presenting the percentage of CLEC4F<sup>+</sup>TUNEL<sup>+</sup> cells, the authors should have first shown the number of CLEC4F<sup>+</sup> cells per unit area in Figure 1. At 16 weeks of age, the proportion of TUNEL<sup>+</sup> KCs is extremely high (~60%), yet the flow cytometry data indicate that nearly all F4/80<sup>+</sup> KCs are TIMD4<sup>+</sup>, suggesting an embryonic origin. If such extensive KC death occurred, the proportion of embryonically derived TIMD4<sup>+</sup> KCs would be expected to decrease substantially. Surprisingly, the proportion of TIMD4<sup>+</sup> KCs is comparable between chow-fed and 16-week HFHC-fed animals. Thus, the immunostaining and flow cytometry data are inconsistent, making it difficult to explain how massive KC death does not lead to their replacement by monocyte-derived cells.

      We thank the reviewer for the insightful comment and the opportunity to clarify this important point. To ensure consistency between our methods, we replaced the Clec4f staining with TIM4 staining, as requested by the reviewer. We now first show the number of TIM4<sup>+</sup> cells per unit area in Figure 1B. The results show a significant and progressive loss of TIM4<sup>+</sup> cells per unit area in the liver parenchyma, from approximately 60 cells/FOV at baseline (0w) to nearly 50 at 4w and about 30 at 16w post-HFHC diet. This finding is fully consistent with our flow cytometry data. The percentage of the embryonically derived KC population (CD11b<sup>low</sup> F4/80<sup>hi</sup> TIM4<sup>hi</sup>) among CD45<sup>+</sup> cells dropped from 30.2% (0w) to 24.3% (4w) and 17.6% (16w) (Revised Figure 1C). The absolute number per gram of liver decreased from roughly 12 × 10<sup>5</sup> (1w) to 9 × 10<sup>5</sup> (4w) and 5 × 10<sup>5</sup> (16w) (Revised Figure 1D).

      These data suggest that despite the reported high rate of cell death among CLEC4F<sup>+</sup>TIMD4<sup>+</sup> KCs, the population appears to self-maintain, with no evidence of monocyte-derived KC generation in this model, which contradicts several recent studies in the field.

      We appreciate the reviewer’s insightful comment. We agree that our data show no substantial generation of monocyte-derived Kupffer cells (MoKCs) within the 16-week HFHC model. However, we do not believe the remaining embryonic KCs (EmKCs) are maintained through self-renewal, as the proportion of Ki67<sup>+</sup>TIM4<sup>+</sup> cells remains low at all time points (Revised Figure S2D). Instead, our observations align with a phased replacement model: recruited monocytes first differentiate into monocyte-derived macrophages (MoMFs), which we see accumulate (Revised Figure S2B, S2C), and only later adopt a KC phenotype. Consistent with this, our 16-week model shows significant EmKC loss and MoMF expansion, but not yet the emergence of TIM4<sup>-</sup> MoKCs. This timing is supported by prior studies in which TIM4<sup>-</sup> KCs were observed at 24 weeks, but not at 16 weeks, on similar diets (Ref. 1,2). We therefore interpret our findings as capturing an earlier phase of MASLD progression, characterized by EmKC death and MoMF accumulation, before the full differentiation of MoMFs into MoKCs.

      Moreover, there is no evidence that TIM4<sup>+</sup>CLEC4F<sup>+</sup> KCs increase their proliferation rate to compensate for such extensive cell death. If approximately 60% of KCs are dying and no monocyte-derived KCs are recruited, one would expect a much greater decrease in total KC numbers than what is reported.

      Thank you for raising this point, which allows for an important clarification. The interpretation that approximately 60% of KCs are dying is correct, but this refers to the proportion of the remaining KC population at 16 weeks that is TUNEL<sup>+</sup>, not to 60% of the original KC pool. Since our data show that over half of the EmKCs are lost by 16 weeks (Revised Figure 1B), the 60% of dying cells at this late time point corresponds roughly to only 25-30% of the total original KC population at baseline. This distinction reconciles the high rate of apoptosis observed late in disease with the overall progressive depletion of the EmKC pool.
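      The step from "fraction of surviving KCs" to "fraction of the baseline pool" can be written out as a short worked calculation. This is an illustrative sketch only. The symbols f and N are introduced here. The inputs are the approximate values quoted in this response:

      - about 60% TUNEL<sup>+</sup> KCs at 16 weeks;
      - about 60 versus 30 TIM4<sup>+</sup> cells/FOV at baseline and at 16 weeks;
      - roughly 12 × 10<sup>5</sup> versus 5 × 10<sup>5</sup> KCs per gram of liver at the earliest reported time point and at 16 weeks.

```latex
% f_{16w}: fraction of the remaining KCs that are TUNEL+ at 16 weeks
% N_{0w}, N_{16w}: KC counts at baseline and at 16 weeks (symbols introduced for illustration)
\begin{aligned}
f_{\text{baseline}} &= f_{16\text{w}} \cdot \frac{N_{16\text{w}}}{N_{0\text{w}}} \\
  &\approx 0.60 \times \frac{30}{60} = 0.30
     && \text{(TIM4$^{+}$ cells/FOV, Revised Figure 1B)} \\
  &\approx 0.60 \times \frac{5 \times 10^{5}}{12 \times 10^{5}} = 0.25
     && \text{(KCs per gram of liver, Revised Figure 1D)}
\end{aligned}
```

      Both estimates fall within the 25–30% range stated in the reply.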

      It is also unexpected that the maximal rate of KC death occurs at early time points (8 weeks), when the mice have not yet gained substantial weight (Figure 1B). Previous studies have shown that longer feeding periods are typically required to observe the loss of embryo-derived KCs.

      We appreciate the reviewer’s insightful observation. We consider KC death to be a continuous event during MASLD. Previous studies typically assessed the loss of EmKCs after the longer feeding periods needed to induce MASH, which may give the impression that long feeding periods are required to observe substantial loss of embryonically derived KCs. In our HFHC model, the proportion of dying KCs was already elevated by 8 weeks, and this high rate was sustained through the 16-week endpoint. In a separate MCD dietary model characterized by rapid MASLD progression, a high rate of KC death was detectable as early as 6 weeks (Revised Figure 1F). Collectively, these data suggest that the onset of significant KC death depends on the pace of MASLD pathogenesis. KC death is more likely an event that begins early and persists throughout MASLD progression.

      Furthermore, it is surprising that the HFD induces as much KC death as the HFHC and MCD diets. Earlier studies suggested that HFD alone is far less effective than MASH-inducing diets at promoting the replacement of embryonic KCs by monocyte-derived macrophages.

      We appreciate the reviewer’s insightful comment. In our study, we observed significant KC death after 20 weeks of HFD feeding and after 16 weeks of HFHC feeding. Moreover, by these time points both diets had induced a similar stage of MASLD, characterized by significant lipid accumulation without fibrosis development (Author response image 1). These data therefore support the view that substantial KC death may be an early MASLD event, occurring before progression to MASH. This finding also aligns with existing literature showing that 16 weeks of HFD feeding alone is sufficient to cause a marked reduction in the TIM4<sup>+</sup> KC population (Ref. 1).

      Author response image 1.

      Detection of liver fibrosis in MASLD mouse models. Male wild-type C57BL/6J mice were fed a high-fat, high-cholesterol (HFHC) diet for 16 weeks or a high-fat diet (HFD) for 20 weeks to induce MASLD. Mice fed a normal chow diet (NCD) served as controls. (A) Sirius Red staining of liver sections was performed to assess collagen deposition and fibrosis during MASLD progression. Scale bar, 20 μm. (B) Western blot analysis of liver tissue lysates showing α-smooth muscle actin (α-SMA) expression as a marker of hepatic stellate cell activation and liver fibrosis.

      In Figure 2D, TIMD4 staining appears extremely faint, making the results difficult to interpret. In contrast, the TUNEL signal is strikingly intense and encompasses a large proportion of liver cells (approximately 60% of KCs, 15% of hepatocytes, 20% of hepatic stellate cells, 30% of non-KC macrophages, and a proportion of endothelial cells is also likely affected). This pattern closely resembles that typically observed in mouse models of acute liver failure. Given this apparent extent of cell death, it is unexpected that ALT and AST levels remain low in MASH mice, which is highly unusual.

      Thank you for this important feedback. To address concerns about the clarity of our imaging, we have provided high-resolution split-channel raw images for Figure 2D (Revised Figure 2D), which clearly show the localization of TIM4, TUNEL, and GS. These images confirm the progressive reduction of TIM4<sup>+</sup> KCs and the increase in TUNEL<sup>+</sup>TIM4<sup>+</sup> cells over time. We agree that the high proportion of TUNEL<sup>+</sup> cells seems at odds with the modest ALT/AST elevation. This discrepancy might be explained by the distinct nature of cell death in MASLD. Acute liver failure involves necrosis with membrane rupture, which causes massive, rapid enzyme release. In contrast, obesity-related liver injury is a chronic process dominated by apoptosis (Ref. 4,5). Apoptosis preserves membrane integrity until late stages (Ref. 6), and dying cells are packaged into apoptotic bodies for efficient phagocytic clearance by neighboring macrophages (Ref. 7,8). This controlled disposal minimizes the leakage of intracellular enzymes. Therefore, the coexistence of widespread apoptosis (high TUNEL signal) with limited enzyme release (low ALT/AST) is a recognized feature of chronic MASLD pathogenesis.

      No statistical analysis is provided for Figure 5D, and it is unclear which metabolites show statistically significant changes in Figure 5C.

      We thank the reviewer for raising this statistical problem. We have now included statistical analysis in Revised Figure 5D.

      In addition, there is no evaluation of liver pathology in Clec4f-Cre × Chil1flox/flox mice. It remains possible that the observed effects on KC death result from aggravated liver injury in these animals. There is also no evidence that Chil1 deficiency affects glucose metabolism in KCs in vivo.

      We thank the reviewer for these important points. We previously characterized the liver pathology of Clec4f<sup>ΔChil1</sup> mice in detail (preprint: eLife 2025, DOI: 10.7554/eLife.107023.1, Fig. 2). On a normal chow diet, these mice showed no differences in body weight, hepatic lipid deposition, metabolic parameters, or glucose tolerance compared to controls. However, on an HFHC diet, Clec4f<sup>ΔChil1</sup> mice developed significantly worse metabolic and histological phenotypes. Crucially, our in vitro data demonstrate that recombinant Chi3l1 directly reduces KC death (preprint, Fig. 6E-F), indicating that the aggravated MASLD in knockout mice is a consequence of increased KC loss, not its cause.

      Regarding glucose metabolism, we have previously shown, using the fluorescent glucose analog 2-NBDG, that Chi3l1 deficiency increases glucose uptake by KCs in vivo. This effect was reversed by supplementing knockout mice with recombinant Chi3l1 (preprint, Fig. 6G-H), which provides direct evidence that Chi3l1 modulates glucose uptake in KCs in vivo.

      Finally, the authors should include a more direct experimental approach to modulate glycolysis in KCs and assess its causal role in KC death in MASH.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) in the HFHC-induced MASLD model (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for four weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity during active disease development. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC apoptosis in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, He et al. set out to investigate the mechanisms behind Kupffer Cell death in MASLD. As has been previously shown, they demonstrate a loss of resident KCs in MASLD in different mouse models. They then go on to show that this correlates with alterations in genes/metabolites associated with glucose metabolism in KCs. To investigate the role of glucose metabolism further, they subject isolated KCs in vitro to different metabolic treatments and assess cleaved caspase 3 staining, demonstrating that KCs show increased Cl. Casp 3 staining upon stimulation of glycolysis. Finally, they use a genetic mouse model (Chil1KO) where they have previously reported that loss of this gene leads to increased glycolysis and validate this finding in BMDMs (KO). They then remove this gene specifically from KCs (Clec4fCre) and show that this leads to increased macrophage death compared with controls.

      Strengths:

      As we do not yet understand why KCs die in MASLD, this manuscript provides some explanation for this finding. The metabolomics is novel and provides insight into KC biology. It could also lead to further investigation; here, it will be important that the full dataset is made available.

      Weaknesses:

      Different diets are known to induce different amounts of KC loss, yet here, all models examined appear to result in 60% KC death. One small field of view of liver tissue is shown as representative to make these claims, but this is not sufficient, as anything can be claimed based on one field of view. Rather, a full tissue slice should be included to allow readers to really assess the level of death.

      Thank you for raising this point regarding data presentation. We analyzed full tissue slices and found that showing the entire slice at standard magnification makes individual KCs difficult to resolve (Author response image 2). To clearly represent the extent and distribution of KC death across the liver tissue slice, we now include lower-magnification images that provide a wider field of view, allowing readers to assess the pattern across a larger tissue area (Revised Figures 1, 2, 6F).

      Author response image 2.

      Assessment of KC death in a full liver tissue slice. (A) Immunofluorescence staining was performed to detect Kupffer cell (KC) death in liver sections from mice fed an MCD diet for 6 weeks. Cell death was assessed by TUNEL staining (green), and KCs were identified by TIM4 staining (red). Nuclei were counterstained with DAPI (blue). A representative whole-tissue view is shown. Scale bar, 1 mm.

      Additionally, there is no consistency between the markers used to define KCs and moMFs, with CLEC4F being used in microscopy, TIM4 in flow, while the authors themselves acknowledge that moKCs are CLEC4F+TIM4-. As moKCs are induced in MASLD, this limits interpretation. Additionally, Iba1 is referred to as a moMF marker but is also expressed by KCs, which again prevents an accurate interpretation of the data. Indeed, the authors show 60% of KCs are dying but only 30% of IBA1+ moMFs, as KCs are also IBA1+, this would mean that KCs die much more than moMFs, which would then limit the relevance of the BMDM studies performed if the phenotype is KC specific. Therefore, this needs to be clarified.

      We thank the reviewer for the constructive comments. For consistency, we have standardized our KC marker to TIM4 for all immunostaining data, aligning it with our flow cytometry analysis (Revised Figures 1, 2D, 6F). We have also clarified that IBA1 is expressed by hepatic macrophages, including both KCs and MoMFs (Revised Figure 2C; Revised manuscript, page 5, lines 182-183). Moreover, we now state that 60% of TIM4<sup>+</sup> KCs are TUNEL<sup>+</sup> versus 30% of total IBA1<sup>+</sup> cells, which further supports that KCs undergo death more readily than MoMFs (Revised manuscript, page 5, lines 186-189). We also acknowledge the limitations of the BMDM studies (Revised manuscript, page 8, lines 332-340).

      The claim that periportal KCs die preferentially is not supported, given that the majority of KCs are peri-portal. Rather, these results would need to be normalised to KC numbers in PP vs PC regions to make meaningful conclusions.

      We thank the reviewer for this important point. We have now included the normalized data. At 8 weeks, the normalized death rate was significantly higher in periportal versus pericentral regions (p = 0.041), supporting increased periportal KC susceptibility during early MASLD. By 16 weeks, proportional death rates became comparable between zones (Revised Figure 2D, Revised manuscript, page 6, lines 194-201).

      Additionally, KCs are known to be notoriously difficult to keep alive in vitro, and for these studies, the authors only examine cl. Casp 3 staining. To fully understand that data, a full analysis of the viability of the cells and whether they retain the KC phenotype in all conditions is required.

      We appreciate the reviewer’s suggestions. To confirm the identity and health of isolated KCs in our in vitro studies, we showed that ~95% of primary isolated KCs are TIM4<sup>+</sup> (Revised Figure S3A). Furthermore, Calcein-AM staining confirmed that the remaining KCs under our experimental conditions are viable and healthy (Revised Figure S4A).

      Finally, in the Cre-driven KO model, there does not seem to be any death of KCs in the controls (rather numbers trend towards an increase with time on diet, Figure 6E), contrary to what had been claimed in the rest of the paper, again making it difficult to interpret the overall results.

      We thank the reviewer for this comment. During our analysis, we indeed observed no reduction in KCs in the Clec4f-Cre control mice. This prompted us to consider that the Cre insertion itself might influence KC maintenance. To investigate this, we performed TIM4/Ki67 co-staining, which revealed significantly higher numbers of proliferating KCs in Clec4f-Cre mice compared with C57BL/6J mice under NCD. Following HFHC feeding, KC proliferation in Clec4f-Cre mice increased even further. These results indicate that Cre insertion enhanced KC self-renewal in Clec4f-Cre mice, which contributes to maintenance of the KC pool during MASLD (Revised Figures S8A and S8B; Revised manuscript, page 9, lines 363-370).

      Additionally, there is no validation that the increased death observed in vivo in KCs is due to further promotion of glycolysis.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for five weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity in KCs. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC death in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      Reviewer #3 (Public review):

      This manuscript provides novel insights into altered glucose metabolism and KC status during early MASLD. The authors propose that hyperactivated glycolysis drives a spatially patterned KC depletion that is more pronounced than the loss of hepatocytes or hepatic stellate cells. This concept significantly enhances our understanding of early MASLD progression and KC metabolic phenotype.

      Through a combination of TUNEL staining and MS-based metabolomic analyses of KCs from HFHC-fed mice, the authors show increased KC apoptosis alongside dysregulation of glycolysis and the pentose phosphate pathway. Using in vitro culture systems and KC-specific ablation of Chil1, a regulator of glycolytic flux, they further show that elevated glycolysis can promote KC apoptosis.

      However, it remains unclear whether the observed metabolic dysregulation directly causes KC death or whether secondary factors, such as low-grade inflammation or macrophage activation, also contribute significantly. Nonetheless, the results, particularly those derived from the Chil1-ablated model, point to a new potential target for the early prevention of KC death during MASLD progression.

      The manuscript is clearly written and thoughtfully addresses key limitations in the field, especially the focus on glycolytic intermediates rather than fatty acid oxidation. The authors acknowledge the missing mechanistic link between increased glycolysis and KC death. Still, several interpretations require moderation to avoid overstatement, and certain experimental details, particularly those concerning flow cytometry and population gating, need further clarification.

      Strengths:

      (1) The study presents the novel observation of profound metabolic dysregulation in KCs during early MASLD and identifies these cells as undergoing apoptosis. The finding that Chil1 ablation aggravates this phenotype opens new avenues for exploring therapeutic strategies to mitigate or reverse MASLD progression.

      (2) The authors provide a comprehensive metabolic profile of KCs following HFHC diet exposure, including quantification of individual metabolites. They further delineate alterations in glycolysis and the pentose phosphate pathway in Chil1-deficient cells, substantiating enhanced glycolytic flux through 13C-glucose tracing experiments.

      (3) The data underscore the critical importance of maintaining balanced glucose metabolism in both in vitro and in vivo contexts to prevent KC apoptosis, emphasizing the high metabolic specialization of these cells.

      (4) The observed increase in KC death in Chil1-deficient KCs demonstrates their dependence on tightly regulated glycolysis, particularly under pathological conditions such as early MASLD.

      Weaknesses:

      (1) The novelty is questionable. The presented work has considerable overlap with a study by the same lab, which is currently under review (citation 17), and it should be considered whether the data should not be presented in one paper.

      We thank the reviewer for the opportunity to clarify the relationship between the two studies. In our previous work (citation 17), we focused on the transcriptional metabolic differences between Kupffer cells (KCs) and monocyte-derived macrophages (MoMFs) and identified Chi3l1 as a selective protective factor that limits glucose uptake and shields KCs from metabolic stress–induced cell death, with minimal effects on MoMFs. That study directly motivated the current work: the observation that KCs are uniquely protected from metabolic stress led us to hypothesize that excessive glycolytic activation itself may be a primary driver of KC death, which is the central question of the present study. Accordingly, the current manuscript shifts the focus from Chi3l1-mediated protection to the mechanistic role of hyperglycolysis in driving KC mortality, using distinct experimental approaches and addressing a different biological question. Because the two studies address conceptually distinct aims (one defining a protective regulator of KC survival, the other dissecting the mechanisms of glycolysis-driven KC death), we believe they are best presented as separate manuscripts. Combining them into a single study would dilute the mechanistic depth and clarity of each.

      (2) The authors report that 60% of KCs are TUNEL-positive after 16 weeks of HFHC diet and confirm this by cleaved caspase-3 staining. Given that such marker positivity typically indicates imminent cell death within hours, it is unexpected that more extensive KC depletion or monocyte infiltration is not observed. Since Timd4 expression on monocyte-derived macrophages takes roughly one month to establish, the authors should consider whether these TUNEL-positive KCs persist in a pre-apoptotic state longer than anticipated. Alternatively, fate-mapping experiments could clarify the dynamics of KC death and replacement.

      We thank the reviewer for this astute observation. As shown in revised Figure 2D, the proportion of TIM4<sup>+</sup>TUNEL<sup>+</sup> KCs peaks at 8 weeks after HFHC feeding and remains elevated at 16 weeks. However, the corresponding single-channel TIM4 staining during this period shows that the overall density of TIM4<sup>+</sup> KCs does not undergo abrupt or synchronous depletion. This temporal dissociation between sustained TUNEL positivity and relatively gradual KC loss suggests that TUNEL-positive KCs are not cleared immediately. Based on these observations, we agree with the reviewer that a substantial fraction of TUNEL-positive KCs likely persists in a prolonged pre-apoptotic or stressed state rather than undergoing rapid cell death. This interpretation is consistent with the absence of extensive KC depletion or compensatory monocyte infiltration at these time points. Importantly, previous studies (Ref. 1,2) indicate that KCs are eventually lost as MASLD progresses, supporting the notion that KC death unfolds gradually over an extended time frame rather than acutely.

      (3) The mechanistic link between elevated glycolytic flux and KC death remains unclear.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for five weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity of KCs. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC apoptosis in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      (4) The study does not address the polarization or ontogeny of KCs during early MASLD. Given that pro-inflammatory macrophages preferentially utilize glycolysis, such data could provide valuable insight into the reason for increased KC death beyond the presented hyperreliance on glycolysis.

      We thank the reviewer for this insightful comment. Regarding KC ontogeny, flow cytometry analysis (Revised Figure 1C) shows that KCs remain uniformly TIM4<sup>hi</sup> during early MASLD, indicating that monocyte-derived KCs (TIM4<sup>low</sup>) have not yet emerged at these stages. To address KC polarization, we assessed the expression of M1-type (pro-inflammatory) markers (Nos2, Cxcl9, CIITA, Cd86, Ccl3, and Ccl5) and M2-type (anti-inflammatory) markers (Chil3, Retnla, Arg1, and Mrc1) in KCs isolated from WT mice fed an HFHC diet for 0, 8, and 16 weeks. As shown in revised Figure S5A, M1 markers progressively increase over time, whereas M2 markers remain unchanged or slightly decrease. This polarization shift is consistent with the increased glycolytic activity observed in KCs during early MASLD. Together, these data indicate that embryonically derived KCs undergo a pro-inflammatory polarization accompanied by enhanced glycolytic metabolism during early MASLD, providing mechanistic context for their increased susceptibility to metabolic stress–induced cell death beyond hyperreliance on glycolysis alone (Revised manuscript, pages 7-8, lines 307-321).

      (5) The gating strategy for monocyte-derived macrophages (moMFs) appears suboptimal and may include monocytes. A more rigorous characterization of myeloid populations by including additional markers would strengthen the study's conclusions.

      We thank the reviewer for raising this important point. To improve the rigor of our analysis, we adopted gating strategies established in previous studies (PMID: 41131393; PMID: 32562600). Specifically, Kupffer cells were defined as CD45<sup>+</sup>CD11b<sup>+</sup>F4/80<sup>hi</sup>TIM4<sup>hi</sup> cells, while monocyte-derived macrophages (MoMFs) were defined as CD45<sup>+</sup>Ly6G<sup>-</sup>CD11b<sup>+</sup>F4/80<sup>low</sup>TIM4<sup>low/−</sup> cells, thereby excluding contaminating neutrophils and minimizing inclusion of circulating monocytes. Using this refined gating strategy, we observed a progressive reduction of KCs accompanied by a corresponding increase in MoMFs in WT mice during HFHC feeding (Revised Figures 1C and S2B–C; Revised manuscript, page 4, lines 154-163).

      (6) While BMDMs from Chil1 knockout mice are used to demonstrate enhanced glycolytic flux, it remains unclear whether Chil1 deficiency affects macrophage differentiation itself.

      We thank the reviewer for this important question. To determine whether Chi3l1 deficiency affects macrophage differentiation, we analyzed the expression of M1-type (pro-inflammatory) markers (Nos2, Cxcl9, CIITA, Cd86, Ccl3, and Ccl5) and M2-type (anti-inflammatory) markers (Chil3, Retnla, Arg1, and Mrc1) in Kupffer cells isolated from WT and Chil1<sup>-/-</sup> mice fed an HFHC diet for 0, 8, and 16 weeks. At baseline (0 weeks), Chi3l1 deficiency was associated with elevated expression of multiple M1 markers, whereas M2 marker expression was comparable between WT and Chil1<sup>-/-</sup> KCs. During MASLD progression, the pro-inflammatory signature in Chil1<sup>-/-</sup> KCs was further enhanced, while anti-inflammatory marker expression became dysregulated (Revised Figure S5C). Together, these data indicate that Chi3l1 deficiency does not impair macrophage differentiation per se but biases KCs toward a partially pro-inflammatory, M1-like phenotype, providing additional context for the enhanced glycolytic flux observed in Chi3l1-deficient macrophages (Revised manuscript, pages 7-8, lines 307-321).

      (7) The authors use the PDK activator PS48 and the ATP synthase inhibitor oligomycin to argue that increased glycolytic flux at the expense of OXPHOS promotes KC death. However, given the high energy demands of KCs and the fact that OXPHOS yields 15-16 times more ATP per glucose molecule than glycolysis, the increased apoptosis observed in Figure 4C-F could primarily reflect energy deprivation rather than a glycolysis-specific mechanism.

      We thank the reviewer for highlighting this important point. We agree that KCs are highly metabolically active and that perturbations of OXPHOS can influence overall cellular energy balance. As noted in our response to comment #3, we additionally inhibited glycolysis in vivo with 2-DG. The protection of KCs observed following 2-DG treatment (Revised Figure 4E-G) provides further evidence that increased glycolytic flux is not merely correlated with, but functionally contributes to, KC loss in MASLD.

      (8) In Figure 1C, KC numbers are significantly reduced after 4 and 16 weeks of HFHC diet in WT male mice, yet no comparable reduction is seen in Clec4f-Cre control mice, which should theoretically exhibit similar behavior under identical conditions.

      We thank the reviewer for this comment. During our analysis, we indeed observed no reduction in KCs in the Clec4f-Cre control mice. This prompted us to consider that the Cre insertion itself might influence KC maintenance. To investigate this, we performed TIM4/Ki67 co-staining, which revealed significantly higher numbers of proliferating KCs in Clec4f-Cre mice compared with C57BL/6J mice under NCD. Following HFHC feeding, KC proliferation in Clec4f-Cre mice increased even further. These results indicate that Cre insertion enhanced KC self-renewal in Clec4f-Cre mice, which contributes to maintenance of the KC pool during MASLD (Revised Figures S8A and S8B; Revised manuscript, page 9, lines 363-370).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      To address the concerns raised in the public review, the authors should:

      (1) Reassess their conclusions using the same panels in flow and microscopy, e.g., the combination of CLEC4F, TIM4, and IBA1. This will allow resKCs (CLEC4F+TIM4+IBA1+), moKCs (CLEC4F+TIM4-IBA1+), and moMFs (CLEC4F-TIM4-IBA1+) to be accurately defined and hence their viability and numbers correctly assessed.

      We thank the reviewer for this insightful suggestion. In our flow cytometry analysis, we did not detect a CD45<sup>+</sup>CD11b<sup>low</sup>F4/80<sup>hi</sup>TIM4<sup>low</sup> population, indicating that monocyte-derived KCs (moKCs) have not emerged in our model at this stage. To more accurately quantify resident KCs (resKCs) in the current study, we replaced CLEC4F with TIM4 staining and enumerated TIM4<sup>+</sup> as well as TIM4<sup>+</sup>TUNEL<sup>+</sup> cells. These data were highly consistent with CLEC4F<sup>+</sup>TUNEL<sup>+</sup> cell counts, confirming that moKCs are not involved in KC death during early MASLD (Revised Figure 1A,B,E,F).

      (2) Investigate why the number of KCs in controls and MASLD are so distinct between Figures 1 and 6.

      We appreciate the reviewer’s suggestion. As explained above, Cre insertion promotes KC self-renewal (Revised manuscript, Figure S8). This enhanced proliferative capacity likely accounts for the relative preservation of KC numbers in Clec4f-Cre mice during HFHC feeding, explaining the apparent discrepancy with WT mice (Revised manuscript, Figure 6D-E).

      (3) Normalise the tunel+ cells based on the number of KCs in PP vs PC regions.

      After normalizing KC death to KC numbers in periportal (PP) versus pericentral (PC) regions, we found that the proportion of dying KCs was significantly higher in PP than in PC regions at 8 weeks of HFHC feeding. We have revised the text accordingly (Revised manuscript, page 5, lines 194-201).

      (4) Demonstrate the viability of KCs in vitro across conditions.

      To confirm the identity and health of isolated KCs in our in vitro studies, we show that ~95% of primary isolated KCs are TIM4<sup>+</sup> (Revised Figure S3A). Furthermore, Calcein-AM staining confirmed that the remaining KCs under our experimental conditions are viable and healthy (Revised Figure S4A).

      (5) Confirm previous studies demonstrating different degrees of KC loss depending on the model of MASLD.

      We thank the reviewer for highlighting this point. Consistent with previous studies, KC loss has been reported to varying degrees depending on the MASLD model used, reflecting the heterogeneity of hepatic macrophages, marker choice, mouse husbandry, and diet regimen. For example, in a 6-week MCD feeding model, ~10% of CLEC4F<sup>+</sup> KCs were TUNEL<sup>+</sup> (Figure 4A, Ref. 9). Another 6-week MCD study reported a drop from 66% to 26% TIM4<sup>+</sup> KCs (Figure 2A, Ref. 12). In an HFD model, TIM4<sup>+</sup> KCs decreased by ~20% after 16 weeks (Figure 1G, Ref. 1). In a Western diet model, TIM4<sup>+</sup> KCs decreased by >50% at 36 weeks (Figures 1J and 2C, Ref. 2). Together, these studies underscore the model-dependent nature of KC loss and highlight the importance of experimental context and marker selection when assessing KC dynamics in MASLD. We have included these studies in the Discussion (Revised manuscript, pages 9-10, lines 393-402).

      (6) Demonstrate in vivo that loss of CHIL1 drives further glycolysis in KCs.

      In Figure 6G-H of our previous study, we showed, using the fluorescent glucose analog 2-NBDG, that Chi3l1 deficiency increases glucose uptake by KCs in vivo, whereas supplementing KO mice with recombinant Chi3l1 significantly reduces it. We include the related figure here as Author response image 3.

      Author response image 3.

      Chi3l1 limits glucose uptake by Kupffer cells in vivo. (A) Measurement of 2-NBDG (a fluorescent glucose analog) uptake by KCs in vivo. WT and Chil1<sup>-/-</sup> mice, either untreated or supplemented with rChi3l1, were injected intraperitoneally with 12 mg/kg 2-NBDG. After 45 min, KCs were isolated and glucose uptake was assessed by spectrophotometry. (B) Representative immunofluorescence images of liver sections stained for TIM4 (red) and 2-NBDG uptake (green) to visualize glucose uptake by KCs in situ. Scale bar = 10 µm (zoom). Quantification is shown as the percentage of TIM4<sup>+</sup> cells that are also 2-NBDG<sup>+</sup>. One-way ANOVA was performed in A and B. P values are as indicated.

      (7) There is no mention of the publication of the metabolomics dataset; this should be released with the manuscript.

      We have now included the raw metabolomics dataset as Tables S1 and S2.

      Reviewer #3 (Recommendations for the authors):

      (1) Methods: Reconsider which methods are described in the main text versus the Supplementary Information to improve readability and consistency.

      Thank you for your valuable suggestion. We have reevaluated and adjusted the placement of the methods section between the main text and the supplementary materials.

      (2) Line 34: Check for grammar issues.

      Line 34 has been revised as follows: Additionally, using Chi3l1-deficient mice, we further demonstrated that increased glucose utilization accelerates KCs death in vivo.

      (3) Lines 101, 110: Explicitly reference the corresponding Supplementary Methods sections.

      We have included the references for these two methods sections (Revised supplementary materials and methods, lines 30 and 65, respectively).

      (4) Figure 2: Iba1 marks all macrophages, not only monocyte-derived macrophages; both figure and text (line 205) require correction.

      We have corrected the figure and text to state that Iba1 marks hepatic macrophages, including both KCs and MoMFs (Revised Figure 2C; Revised manuscript, page 5, line 182).

      (5) Line 218-219: Avoid overinterpretation, as only KCs, hepatocytes, and hepatic stellate cells were assessed - not all hepatic populations.

      We appreciate the reviewer’s valuable suggestion and rephrased our description accordingly (Revised manuscript, page 5, line 186-189).

      (6) Line 262: Use abbreviations consistently throughout the manuscript.

      We have gone through the whole manuscript and double-checked the abbreviations.

      (7) Line 264: Include the palmitic acid (PA) concentration used.

      We now state the PA concentration used (800 µM) in the revised manuscript (Revised manuscript, page 6, line 250).

      (8) Lines 316-317: Check for grammar errors.

      We have corrected the grammatical errors (Revised manuscript, page 8, lines 340-341).

      (9) Line 337-338: See comment above on gating strategy.

      We have updated the gating strategy accordingly (Revised manuscript, page 9, lines 361-362).

      (10) Line 343-344: Note that Chi3l1 is not exclusively expressed by KCs.

      We have rephrased this statement accordingly (Revised manuscript, page 9, lines 374-378).

      (11) Lines 355-358: The statement that "sustained glycolytic hyperactivation culminates not in sustained activation, but in apoptotic cell death" is unsupported by data or literature, as macrophage polarization was not analyzed in this study.

      We removed the statement from the revised manuscript.

      (12) Lines 375-379: Rephrase to clarify that while KCs are metabolically active and glucose-demanding, excessive glycolytic flux accelerates apoptosis.

      We have rephrased to clarify (Revised Manuscript, page 10, lines 405-407).

      (13) Lines 375-385 & 387-397: Consolidate overlapping statements for conciseness and coherence.

      We have consolidated the overlapping statements (Revised manuscript, page 10, lines 405-425).

      Reference

      Daemen, S. et al. Dynamic Shifts in the Composition of Resident and Recruited Macrophages Influence Tissue Remodeling in NASH. Cell Rep 34, 108626, doi:10.1016/j.celrep.2020.108626 (2021).

      Remmerie, A. et al. Osteopontin Expression Identifies a Subset of Recruited Macrophages Distinct from Kupffer Cells in the Fatty Liver. Immunity 53, 641-657.e614, doi:10.1016/j.immuni.2020.08.004 (2020).

      Ozer, J., Ratner, M., Shaw, M., Bailey, W. & Schomaker, S. The current state of serum biomarkers of hepatotoxicity. Toxicology 245, 194-205, doi:10.1016/j.tox.2007.11.021 (2008).

      Malhi, H. & Gores, G. J. Molecular mechanisms of lipotoxicity in nonalcoholic fatty liver disease. Semin Liver Dis 28, 360-369, doi:10.1055/s-0028-1091980 (2008).

      Ibrahim, S. H., Hirsova, P. & Gores, G. J. Non-alcoholic steatohepatitis pathogenesis: sublethal hepatocyte injury as a driver of liver inflammation. Gut 67, 963-972, doi:10.1136/gutjnl-2017-315691 (2018).

      Kerr, J. F., Wyllie, A. H. & Currie, A. R. Apoptosis: a basic biological phenomenon with wide-ranging implications in tissue kinetics. British journal of cancer 26, 239-257, doi:10.1038/bjc.1972.33 (1972).

      Poon, I. K., Lucas, C. D., Rossi, A. G. & Ravichandran, K. S. Apoptotic cell clearance: basic biology and therapeutic potential. Nat Rev Immunol 14, 166-180, doi:10.1038/nri3607 (2014).

      Krenkel, O. & Tacke, F. Liver macrophages in tissue homeostasis and disease. Nat Rev Immunol 17, 306-321, doi:10.1038/nri.2017.11 (2017).

      Tran, S. et al. Impaired Kupffer Cell Self-Renewal Alters the Liver Response to Lipid Overload during Non-alcoholic Steatohepatitis. Immunity 53, 627-640.e625, doi:10.1016/j.immuni.2020.06.003 (2020).

      O'Neill, L. A. & Pearce, E. J. Immunometabolism governs dendritic cell and macrophage function. J Exp Med 213, 15-23, doi:10.1084/jem.20151570 (2016).

      Vander Heiden, M. G. & DeBerardinis, R. J. Understanding the Intersections between Metabolism and Cancer Biology. Cell 168, 657-669, doi:10.1016/j.cell.2016.12.039 (2017).

      Zhang, J. et al. Reactive oxygen species regulation by NCF1 governs ferroptosis susceptibility of Kupffer cells to MASH. Cell Metab 36, 1745-1763.e6, doi:10.1016/j.cmet.2024.05.008 (2024).

    1. eLife Assessment

      The present manuscript by Cordeiro et al. shows convincing evidence that α-mangostin, a xanthone obtained from the fruit of the Garcinia mangostana tree, behaves as a strong activator of large-conductance (BK) potassium channels. The authors suggest that α-mangostin activation of the BK channel is state-independent, and molecular docking and mutagenesis suggest that α-mangostin binds to a site in the internal cavity. Additionally, the authors show that α-mangostin can relax arteries, further supporting the plausibility of the proposed effects of this compound. These are valuable findings that should be of interest to channel biophysicists and physiologists alike.

    2. Reviewer #1 (Public review):

      In this manuscript, the authors aimed to identify the molecular target and mechanism by which α-Mangostin, a xanthone from Garcinia mangostana, produces vasorelaxation that could explain its antihypertensive effects. Building on prior reports of vascular relaxation and ion channel modulation, the authors convincingly show that large-conductance potassium (BK) channels are the primary site of action. Using electrophysiological, pharmacological, and computational evidence, the authors achieved their aims and showed that BK channels are the critical molecular determinant of mangostin's vasodilatory effects, even though the vascular studies are quite preliminary in nature.

      Strengths:

      (1) The broad pharmacological profiling of mangostin across potassium channel families, revealing BK channels - and the vascular BK-alpha/beta1 complex - as the potently activated target in a concentration-dependent manner.

      (2) Detailed gating analyses showing large negative shifts in voltage-dependence of activation and altered activation and deactivation kinetics.

      (3) High-quality single-channel recordings for open probability and dwell times.

      (4) Convincing activation in reconstituted BKα/β1-Caᵥ nanodomains mimicking physiological conditions and functional proof-of-concept validation in mouse aortic rings.
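      The negative shifts in the voltage dependence of activation listed under strength (2) are usually read off Boltzmann fits to normalized conductance-voltage (G-V) curves. The Python sketch that follows is only an illustration of that step, not the authors' analysis pipeline: it assumes a single Boltzmann, G/Gmax = 1/(1 + exp(-z(V - V½)/(kT/e))), with kT/e taken as 25.4 mV (about 22 °C), and estimates V½ and z by linear regression on logit-transformed data. The function names and the control and drug curves are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

KT_OVER_E_MV = 25.4  # thermal voltage kT/e near 22 degC, in mV (assumed)


def boltzmann(v_mv: float, v_half_mv: float, z: float) -> float:
    """Normalized conductance G/Gmax for a single Boltzmann function."""
    return 1.0 / (1.0 + math.exp(-z * (v_mv - v_half_mv) / KT_OVER_E_MV))


def fit_boltzmann_logit(v_mv: Sequence[float],
                        g_norm: Sequence[float]) -> Tuple[float, float]:
    """Return (V1/2 in mV, z in elementary charges) from G/Gmax data.

    Uses ln(G / (1 - G)) = z * (V - V1/2) / (kT/e), which is linear in V;
    only points with 0 < G/Gmax < 1 can be transformed.
    """
    pts = [(v, math.log(g / (1.0 - g)))
           for v, g in zip(v_mv, g_norm) if 0.0 < g < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two points with 0 < G/Gmax < 1")
    n = len(pts)
    mean_v = sum(v for v, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    slope = (sum((v - mean_v) * (y - mean_y) for v, y in pts)
             / sum((v - mean_v) ** 2 for v, _ in pts))
    intercept = mean_y - slope * mean_v
    return -intercept / slope, slope * KT_OVER_E_MV


# Hypothetical control and drug G-V curves differing only in V1/2.
volts: List[float] = [float(v) for v in range(-100, 201, 20)]
v_ctrl, z_ctrl = fit_boltzmann_logit(volts, [boltzmann(v, 150.0, 1.1) for v in volts])
v_drug, z_drug = fit_boltzmann_logit(volts, [boltzmann(v, 90.0, 1.1) for v in volts])
print(f"V1/2 shift: {v_drug - v_ctrl:.1f} mV")
```

      The logit transform weights points near 0 and 1 heavily, so with noisy recordings a nonlinear least-squares fit of the Boltzmann function is usually preferable; the linearized form is used here only to keep the sketch free of dependencies.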

      Weaknesses are minor:

      (1) Some mutagenesis data (e.g., partial loss at L312A) could benefit from complementary structural validation.

      The authors' rebuttal provides AlphaFold3 models for the mutants. While these are interesting preliminary observations, the authors decided not to include them in the main manuscript pending further structural validation. I concur.

      (2) While Cav-BK nanodomains were reconstituted, direct measurement of calcium signals after mangostin application onto native smooth muscle could be valuable.

      In their response, the authors acknowledge the importance of measuring Ca2+ sparks in smooth muscle cells to further validate their findings. However, these data are not provided in the manuscript. Part of my earlier comment alluded to the possibility that α-Mangostin directly affects Cav1.2 or ryanodine receptor activity, which would in turn increase BK activity. With the evidence currently provided, these possibilities cannot be excluded and need to be acknowledged.

      (3) The work has an impact on ion channel physiology and pharmacology, providing a mechanistic link between a natural product and vasodilation. Datasets include electrophysiology traces, mutagenesis scans, docking analyses, and aortic tension recordings. The latter, however, are preliminary in nature.

      The authors acknowledge that additional vascular physiology experiments would strengthen their argument. However, they are unable to provide such evidence in the present manuscript. Therefore, I strongly suggest that the authors tone down the physiological implications of α-Mangostin described in the manuscript. I would also suggest removing "vasorelaxation" from the manuscript title, given the preliminary nature of these findings.

    3. Reviewer #2 (Public review):

      Summary:

      In the present manuscript, Cordeiro et al. show that α-mangostin, a xanthone obtained from the fruit of the Garcinia mangostana tree, behaves as an agonist of BK channels. The authors arrive at this conclusion by examining the effects of mangostin on macroscopic and single-channel currents elicited by BK channels formed by the α subunit and α + β1 subunits, as well as αβ1 channels coexpressed with voltage-dependent Ca2+ (CaV1.2) channels. The single-channel experiments show that α-mangostin produces a robust increase in the probability of opening without affecting the single-channel conductance. The authors contend that α-mangostin activation of the BK channel is state-independent, and molecular docking and mutagenesis suggest that α-mangostin binds to a site in the internal cavity. Importantly, α-mangostin (10 μM) alleviates noradrenaline-induced contracture. Mangostin is ineffective if the contracted muscles are pretreated with the BK toxin iberiotoxin.

      In this revised version of the manuscript by Cordeiro et al., the authors have adequately answered my previous concerns. However, as I stated in my comments, without determining the probability of opening across a wide range of voltages, any conclusion about the drug's mechanism of action can be questioned. For example, the statement in Discussion line 481: "The higher shift observed in 1 μM Caᵢ²⁺ may reflect the steep Caᵢ²⁺-dependence of the closed-open equilibrium (Cui, Cox and Aldrich, 1997) and the allosteric coupling of voltage and Caᵢ²⁺ signals (Horrigan and Aldrich, 2002; Magleby, 2003; Clay, 2017), which are effective in this concentration range, which may lead to a higher apparent activation when voltage activation is facilitated by Caᵢ²⁺ (Sun and Horrigan, 2022)." has no support in the data and is not predicted by the allosteric model. In order to have a larger shift induced by the drug in the presence of Ca2+, you need to alter either the Ca2+ binding or the allosteric coupling factor C.

      Please note that there are several problems with the English in this sentence of the manuscript.

      Minor

      In Figure 1E, BKa should read BKalpha.

    4. Reviewer #3 (Public review):

      Summary:

      This research shows that α-mangostin, a proposed nutraceutical with cardiovascular protective properties, could act through the activation of large-conductance potassium-permeable (BK) channels. The authors provide convincing electrophysiological evidence that the compound binds to BK channels and induces potent activation, increasing the magnitude of potassium currents. Since these channels are important modulators of the membrane potential of vascular smooth muscle, this activation leads to muscle relaxation, possibly explaining the cardiovascular protective effects.

      Strengths:

      The authors have satisfactorily answered my previous comments and present evidence from several lines of experiments that α-mangostin is a potent activator of BK channels. The quality of the experiments and of the analysis is high and appropriate. This research is timely and provides a basis for understanding the physiological effects of natural compounds with proposed cardioprotective effects.

      Weaknesses:

      The identification of the binding site continues to be the least developed point of the manuscript. The authors show that the binding site is probably located in the hydrophobic cavity of the pore and that point mutations reduce the magnitude of the negative voltage shift of activation produced by α-mangostin. This binding site should be demonstrated in the future using structural techniques such as cryo-EM.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors aimed to identify the molecular target and mechanism by which α-Mangostin, a xanthone from Garcinia mangostana, produces vasorelaxation that could explain the antihypertensive effects. Building on prior reports of vascular relaxation and ion channel modulation, the authors convincingly show that large-conductance potassium BK channels are the primary site of action. Using electrophysiological, pharmacological, and computational evidence, the authors achieved their aims and showed that BK channels are the critical molecular determinant of mangostin's vasodilatory effects, even though the vascular studies are quite preliminary in nature.

      Strengths:

      (1) The broad pharmacological profiling of mangostin across potassium channel families, revealing BK channels - and the vascular BK-alpha/beta1 complex - as the potently activated target in a concentration-dependent manner.

      (2) Detailed gating analyses showing large negative shifts in voltage-dependence of activation and altered activation and deactivation kinetics.

      (3) High-quality single-channel recordings for open probability and dwell times.

      (4) Convincing activation in reconstituted BKα/β1-Caᵥ nanodomains mimicking physiological conditions and functional proof-of-concept validation in mouse aortic rings.

      We thank the reviewer for acknowledging the strength of the different aspects investigated in our study.

      Weaknesses are minor:

      (1) Some mutagenesis data (e.g., partial loss at L312A) could benefit from complementary structural validation.

      In an attempt to provide structural insight into the mutagenesis data, we used AlphaFold3 (AF3; Abramson et al., 2024) to generate models of the I308A, L312M and A316P substitutions and repeated the docking for each (Fig. R1). According to these predictive models:

      The I308A substitution considerably straightens the S6 helix starting at this residue. Hence, all subsequent residues are displaced relative to the WT: the Cα atoms of L312, F315, and A316 are displaced by 2.8 Å, 4.2 Å, and 4.6 Å, respectively, widening the bottom of the binding pocket. However, the prediction confidence for all helices is lower than in the other AF3 models (70 > pLDDT > 50). In the docking, poses in the binding pocket comparable to those observed in the WT (i.e., involving I308A, L312 and A316) and with the same molecule orientation have higher binding energies (-7.13 to -6.66 kcal mol⁻¹). Additionally, poses without contact to I308A arise that have a more vertical orientation, indicating that the structural change affects the binding region.

      The changes induced by L312M are localized to residues 313-323, where S6 bends towards S5. Binding energies are lower, especially in the two best poses, which are also the most comparable to the WT docking (-9.88 kcal mol⁻¹), but clustering is poor overall and the poses are more heterogeneous. Interactions with L312M are completely abolished, while interactions with I308 (in 11/20 poses), F315 (in all poses), and A316 (in 5/20 poses) persist. Given the rather small structural alteration induced by the substitution and the variable poses, one could speculate that the reduced V½ shift is due to the observed loss of binding to L312M; however, the retained interactions with the other residues would still allow α-Mangostin to activate the channel.

      A316P displaces the S6 helix relative to the WT, while the other pore helices are not affected. S6 shows enhanced outward bending around A316, which displaces residues where α-Mangostin would bind, i.e., the Cα atoms of F315 and L312 are displaced by 2.4 Å and 2.8 Å, respectively (I308 is not affected). Residues further down are moved in a more rotational way, resulting in Cα displacements of 3.1 Å for Y318 and even 5.7 Å for V319, before displacements decrease again towards the intracellular end of the helix. While interactions with A316P are present in 10/20 analyzed poses, the helix displacement seems to hinder I308 and L312 interactions: the best docked α-Mangostin pose (-8.41 kcal mol⁻¹) is predicted to contact only F315 and Y318, and overall, I308 or L312 contacts occurred in only 3/20 and 7/20 poses, respectively (wildtype: 17/20 and 20/20 poses). This may hint that A316P reduces the α-Mangostin-induced V½ shift to a substantial extent through an allosteric mechanism, and it underlines the exceptional effect of this mutation (i.e., complete loss of the V½ shift).

      Author response image 1.

      Alphafold3 models of BK I308A, L312M, and A316P with α-Mangostin docked to the mutant structures. The upper row shows an overview of the mutant pore helices (AF3 models) used for molecular docking. The lower row shows the binding region with the wildtype structure overlaid in gray. Only 3 helices are shown for clarity.

      Although these results provide interesting tentative explanations for the effects of the mutations, and conclusions from AF3 models are becoming increasingly robust, we think that definitive statements about their mechanistic contributions would require experimental structures of the mutant channels, i.e., cryo-EM or crystallography, which are beyond our means. We have therefore decided not to include these data in the manuscript; however, they are accessible to the interested reader in the public review. Now that cryo-EM structures have been obtained for the wildtype channel, we hope that future studies will address mutations in this gating-relevant S6 segment.

      (2) While Cav-BK nanodomains were reconstituted, direct measurement of calcium signals after mangostin application onto native smooth muscle could be valuable.

      We are not sure whether a global elevation of the cellular calcium concentration would be informative. We rather expect that the relevant local Ca²⁺ elevation would occur as sparks in the BK-Caᵥ nanodomains, close to the membrane. We would anticipate a change in spark duration, as the Ca²⁺ inward current would be terminated faster by the enhanced repolarization via α-Mangostin-activated BKα/β1 channels. Capturing spark activity would require fast Ca²⁺ imaging. We agree that this would be an informative experiment to investigate a more native situation. However, we would have to accomplish such methodologically challenging measurements in a separate project, which could fruitfully be combined with a more extensive characterization of aortic contraction, as also suggested in the following remark (3).

      (3) The work has an impact on ion channel physiology and pharmacology, providing a mechanistic link between a natural product and vasodilation. Datasets include electrophysiology traces, mutagenesis scans, docking analyses, and aortic tension recordings. The latter, however, are preliminary in nature.

      We completely agree with the reviewer that there is ample room for further studies that could characterize different tissues important in blood pressure regulation (such as resistance arteries), elucidate further physiological detail (such as modulatory effects of the endothelium), or look deeper into the pharmacology using chemically altered Mangostin derivatives. While we would very much like to see this happen in future projects, in this study we focused on the functional aspects of α-Mangostin in BK channel gating. We present our tension recordings as a proof of concept to underline the activity of α-Mangostin in native tissues, and we clearly show the importance of the BK channel by using iberiotoxin, a specific inhibitor, which strikingly abolished the relaxation.

      References:

      Abramson, J. et al. (2024) “Accurate structure prediction of biomolecular interactions with AlphaFold 3,” Nature, 630(8016), pp. 493–500. Available at: https://doi.org/10.1038/s41586-024-07487-w.

      Reviewer #2 (Public review):

      Summary:

      In the present manuscript, Cordeiro et al. show that α-mangostin, a xanthone obtained from the fruit of the Garcinia mangostana tree, behaves as an agonist of BK channels. The authors arrive at this conclusion through the effect of mangostin on macroscopic and single-channel currents elicited by BK channels formed by the α subunit and α + β1 subunits, as well as αβ1 channels coexpressed with voltage-dependent Ca2+ (CaV1.2) channels. The single-channel experiments show that α-mangostin produces a robust increase in the probability of opening without affecting the single-channel conductance. The authors contend that α-mangostin activation of the BK channel is state-independent, and molecular docking and mutagenesis suggest that α-mangostin binds to a site in the internal cavity. Importantly, α-mangostin (10 μM) alleviates the contracture promoted by noradrenaline. Mangostin is ineffective if the contracted muscles are pretreated with the BK toxin iberiotoxin.

      Strengths:

      The set of results combining electrophysiological measurements, mutagenesis, and molecular docking reveals α-mangostin as a potent activator of BK channels and the putative location of the α-mangostin binding site. Moreover, experiments conducted on aortic preparations from mice suggest that α-mangostin can aid in developing drugs to treat a myriad of diverse diseases involving the BK channel.

      We thank the reviewer for pointing out the significance of our study.

      Weaknesses:

      Major:

      (1) Although the results indicate that α-mangostin modifies the closed-open equilibrium, the conclusion that this may be due to stabilization of the voltage sensor in its active configuration may prove to be wrong. It is more probable that, as has been demonstrated for other activators, α-mangostin increases the equilibrium constant that defines the closed-open reaction (L in the Horrigan-Aldrich allosteric gating model for BK). The paper would gain much if the authors determined the probability of opening over a wide range of voltages, to establish how the drug affects (or does not affect) the channel's voltage dependence, the coupling between the voltage sensor and the pore, and the closed-open equilibrium (L).

      We would like to take the opportunity to clarify this potential misunderstanding. In our manuscript, we have discussed three mechanistic explanations for the Mangostin activation: (1) an electrostatic effect at the selectivity filter, (2) structural and electrostatic changes of S6 that facilitate the opening of a putative lower gate, and (3) hydrophobic gating, i.e., counteracting dewetting of the pore. All three possibilities would impact S6 and lower the free energy for pore opening, and we therefore concur that Mangostin most likely affects the closed-open equilibrium (L) of the BKα channel.

      The sentence at the original lines 470-471, “(…) caused by an enhanced shift of the closed-open equilibrium toward the open state, such as the stabilization of the voltage sensor in an active conformation” refers to the observation that the presence of the β1 subunit enhances this closed-open shift. The stabilization of the voltage sensor domain was mentioned as one example of how it achieves this. We recognize that this example was an unfortunate choice, as β1 rather facilitates Ca²⁺-dependent allosteric pore opening unrelated to the discussed mechanisms of Mangostin. We have therefore removed this statement.

      As to the suggestion to dissect the effect of Mangostin on C, D, and L, we agree with the reviewer that this would surely add to a full biophysical characterization. However, in our project, we strove towards including more experiments showing the physiological implications of Mangostin activation to emphasize the implication for vasodilation. We hope the reviewer understands that, with limited resources, this came at the expense of a full investigation of the different gating components, which could pose a separate project by itself.
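      The reviewer's suggestion to measure Po over a wide voltage range connects to a common strategy within the Horrigan-Aldrich framework: at strongly negative voltages in nominally Ca²⁺-free conditions, the voltage sensors are at rest and Po approaches L0·exp(zL·V/(kT/e)), so the slope and intercept of ln Po versus V estimate zL and L0. The Python sketch that follows illustrates only that limiting-slope fit; it is not the authors' or the reviewer's analysis. It assumes that every point below an arbitrary Po threshold already lies in the limiting-slope regime and that kT/e = 25.4 mV; the function names and the synthetic two-state data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

KT_OVER_E_MV = 25.4  # thermal voltage kT/e near 22 degC, in mV (assumed)


def limiting_slope_fit(v_mv: Sequence[float], po: Sequence[float],
                       po_max: float = 1e-4) -> Tuple[float, float]:
    """Estimate (L0, zL) from the low-Po tail of a Po-V relation.

    Assumes every point with 0 < Po <= po_max lies in the limiting-slope
    regime, where ln(Po) ~ ln(L0) + zL * V / (kT/e).
    """
    pts = [(v, math.log(p)) for v, p in zip(v_mv, po) if 0.0 < p <= po_max]
    if len(pts) < 2:
        raise ValueError("need at least two points with 0 < Po <= po_max")
    n = len(pts)
    mean_v = sum(v for v, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    slope = (sum((v - mean_v) * (y - mean_y) for v, y in pts)
             / sum((v - mean_v) ** 2 for v, _ in pts))
    ln_l0 = mean_y - slope * mean_v
    return math.exp(ln_l0), slope * KT_OVER_E_MV


def two_state_po(v_mv: float, l0: float, z_l: float) -> float:
    """Po of a closed-open equilibrium L = L0 * exp(zL * V / (kT/e))."""
    big_l = l0 * math.exp(z_l * v_mv / KT_OVER_E_MV)
    return big_l / (1.0 + big_l)


# Hypothetical data: a drug that only scales L0 tenfold.
volts: List[float] = [float(v) for v in range(-200, -99, 20)]
l0_ctrl, zl_ctrl = limiting_slope_fit(volts, [two_state_po(v, 2e-6, 0.4) for v in volts])
l0_drug, zl_drug = limiting_slope_fit(volts, [two_state_po(v, 2e-5, 0.4) for v in volts])
print(f"L0 ratio: {l0_drug / l0_ctrl:.2f}, zL: {zl_ctrl:.2f} vs {zl_drug:.2f}")
```

      In practice such low Po values are typically estimated as NPo from patches containing many channels. Under this model, an unchanged zL together with an increased L0 in the presence of a drug would point to an effect on L rather than on the voltage sensors.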

      (2) Apparently, the molecular docking was performed using the truncated structure of the human BK channel. However, it is unclear which one, since the PDB ID given in the Methods (6vg3), according to what I could find, corresponds to the unliganded, inactive PTK7 kinase domain. Be that as it may, the apo and Ca2+-bound structures show that there is a rotation and a displacement of the S6 transmembrane domain. Therefore, the positions of the residues I308, L312, and A316 in the closed and open configurations of the BK channel are not the same. Hence, it is expected that the strength of binding will differ depending on whether the channel is closed or open. This point needs to be discussed.

      We apologize for the typing error and thank the reviewer for pointing out the erroneous PDB ID (“6vg3”). It should have read PDB ID 6v3g, as in the legend to Fig. 4B. The reviewer rightly points out that the S6 segment addressed in our study differs between the two available cryo-EM structures obtained in the presence (PDB ID 6v38) and absence of Ca²⁺ (PDB ID 6v3g) (Tao and MacKinnon, 2019).

      We had in fact performed the docking with both structures but had chosen to show the Ca²⁺-free structure to better visualize the I308 position. α-Mangostin is found in the same S6 region in both, not obstructing the K⁺ conduction pathway. The binding energies of the favored poses are very similar; the binding energy of the best-ranking conformational cluster in the Ca²⁺-bound structure was even slightly lower (-8.64 kcal mol⁻¹) than in the docking to the Ca²⁺-free channel (-8.58 kcal mol⁻¹; Fig. 4B), although this difference is probably not meaningful.

      We compared the residue interactions in both dockings (Author response table 1). S317 and Y318, whose substitution did not reduce the shift in V½, were not predicted to contact α-Mangostin in either structure. In both structures, L312 and F315 were predicted to interact in virtually all poses analyzed. In the docking to the Ca²⁺-free state, I308 was also predicted to interact in 17/20 poses, while contacts with A316 occurred in 5/20 poses. In the Ca²⁺-bound state, predicted interactions shifted from I308 (as expected, since it is buried in the protein) to A316, and the isoprenyl moiety close to I308 rotated downwards. This could indicate that α-Mangostin adopts a more horizontal position, following the upward reorientation of S6 in the Ca²⁺-bound state, when the channel moves from one conformation to the other (Fig. S4).

      Author response table 1.

      Number of interactions of S6 residues in 20 analyzed α-Mangostin poses in the molecular dockings to the Ca²⁺-free and Ca²⁺-bound structures.

      These docking results are consistent with our functional measurements. Recent structures of the BK/γ1 complex showed that the VSD and Ca²⁺ bowl are stabilized in an active-like conformation that corresponds to the conformation seen in the Ca²⁺-bound state (Kallure et al., 2023; Yamanouchi et al., 2023; Redhardt, Raunser and Raisch, 2024), indicating that the Ca²⁺-bound and Ca²⁺-free structures very likely represent open and closed conformations of the channel. We observed that α-Mangostin can bind to both of these states to activate the channel (Fig. 3C, D), showing the presence of a binding site in both conformations. Further, α-Mangostin induced a left-shift in V½ also at higher Ca²⁺ concentration (Fig. 2D), indicating that it still binds to and activates the channel after the conformational change in S6. As we could not determine affinities for the mutants due to limited solubility, we have no information on the nature of the contribution of the substitutions, i.e., reduced binding or an allosteric effect. As I308 is buried in the Ca²⁺-bound state, its contribution is likely mostly allosteric. We have also proposed dewetting as a possible activation mechanism, which we expect to be less sensitive to the exact pose of a molecule (as shown for NS11021; Nordquist et al., 2024). Therefore, α-Mangostin could, e.g., change the solvent accessibility of the I308 side chain, energetically favoring the buried (open) state.

      We have now included both dockings and Author response table 1 in Fig. S4, and we have added passages to the results section (starting at line 373) and discussion section (starting at lines 544, 588).

      Minor:

      (1) From Figure 3A, it is apparent that the increase in Po is at the expense of the long periods (seconds) that the channel remains closed. One might suggest that α-mangostin increases the burst periods. It would be beneficial if the authors measured both closed and open dwell times to test whether α-mangostin primarily affects the burst periods.

      We thank the reviewer for this valuable suggestion, which we have implemented. In the single-channel measurements shown in our original Fig. 3, we did not observe burst behavior of BKα channels. This can be explained by the fact that we measured under resting conditions (100 nM free Caᵢ²⁺) and with rather mild depolarization (+40 mV), where Po was very low. We have therefore analyzed measurements in 5 µM free Caᵢ²⁺, where we recorded sufficient burst activity also in the basal state.

      The burst analysis showed that ɑ-Mangostin indeed prolongs bursts and shortens the interburst closures. Within bursts, both closed times and open times were increased, and we recorded a higher number of opening events per burst. We conclude that ɑ-Mangostin acts in both the closed and the open state, where it slows open-closed transitions resulting in less flicker, and stabilizes the open state via longer open times and a higher probability for closed-open transitions.

      We now show this data in Fig. 3D-F and Table S8, and have accordingly added passages to the results section (starting at line 285), the discussion (line 510), and the methods section (starting at line 746).
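      The burst analysis described in this response depends on a criterion that separates within-burst closures (flickers) from interburst closures. The response does not state which criterion was used, so the Python sketch that follows simply assumes a fixed critical closed time, t_crit, a common convention; it is not the authors' analysis. It groups an idealized open/closed dwell-time sequence into bursts and reports burst duration, openings per burst and interburst closed time. The event format, function name and example dwell times are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple

Event = Tuple[str, float]  # ("O" for open or "C" for closed, dwell time in ms)


@dataclass
class BurstStats:
    n_bursts: int
    mean_burst_ms: float
    mean_openings_per_burst: float
    mean_interburst_ms: float


def burst_stats(events: List[Event], t_crit_ms: float) -> BurstStats:
    """Group an idealized dwell sequence into bursts.

    Closures longer than t_crit_ms separate bursts; shorter closures count
    as within-burst flickers. Closures before the first opening are ignored
    and trailing within-burst closures at the end of the record are dropped.
    """
    bursts: List[List[Event]] = []
    gaps: List[float] = []
    current: List[Event] = []
    pending_gap: Optional[float] = None
    for state, dwell in events:
        if state == "C" and dwell > t_crit_ms:
            if current:                      # a long closure ends the burst
                bursts.append(current)
                current = []
                pending_gap = dwell
        elif state == "O":
            if not current and pending_gap is not None:
                gaps.append(pending_gap)     # closure lies between two bursts
                pending_gap = None
            current.append((state, dwell))
        elif current:                        # short closure inside a burst
            current.append((state, dwell))
    while current and current[-1][0] == "C":
        current.pop()                        # record ended mid-closure
    if current:
        bursts.append(current)
    durations = [sum(d for _, d in b) for b in bursts]
    openings = [sum(1 for s, _ in b if s == "O") for b in bursts]
    return BurstStats(
        n_bursts=len(bursts),
        mean_burst_ms=mean(durations) if durations else 0.0,
        mean_openings_per_burst=mean(openings) if openings else 0.0,
        mean_interburst_ms=mean(gaps) if gaps else 0.0,
    )


events = [("C", 500.0), ("O", 2.0), ("C", 0.5), ("O", 3.0),
          ("C", 200.0), ("O", 4.0), ("C", 1.0)]
stats = burst_stats(events, t_crit_ms=10.0)
print(stats)
```

      In published analyses t_crit is often derived from the closed-time distribution, for example by minimizing the number of misclassified closures between adjacent exponential components; here it is simply a parameter.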

      (2) In several places, the authors draw parallels between the mode of action of other BK activators and that of α-mangostin; however, the work of Gessner et al. (PNAS 2012) indicates that NS1619 and Cym04 interact with the S6/RCK linker, and Webb et al. demonstrated that GoSlo-SR-5-6 agonist activity is abolished when residues in the S4/S5 linker and in the S6C region are mutated. These findings indicate that these agonists do not bind near the selectivity filter, where the authors' results suggest that α-mangostin binds.

      We will gladly clarify our ideas concerning the binding sites of other activators and ɑ-Mangostin. We first hypothesized that ɑ-Mangostin may share characteristics and mode of action with the class of negatively charged activators (NCA) that we have described before (Schewe et al., 2019). NCA were found to occupy a common fenestration site that is located close to the selectivity filter in TREK K2P channels, and in this manuscript we have shown by THexA competition and mutagenesis experiments that ɑ-Mangostin also binds in this fenestration region in TREK-1 channels (Fig. S3).

      The existence of this common NCA binding site was also proposed for BK channels, as a docking placed the NCA NS11021 in an equivalent binding region, and, among others, NS11021 and GoSlo-SR-5-6 competed with THexA for binding in the pore (Schewe et al., 2019). These results were indeed not fully in agreement with the proposed binding site of GoSlo-SR-5-6 in Webb et al. (2015), although the most effective (double) mutants were located at S317 and I323, at the intracellular end of the cleft between neighboring S6 segments. In this manuscript, we have shown that α-Mangostin is present in the pore of BK channels by molecular docking, a THexA competition assay, and two mutations that reduced the shift in V<sub>½</sub> induced not only by ɑ-Mangostin but also by GoSlo-SR-5-6 (Fig. 4). While the docking was rather a starting point, both functional tests argue against a binding site in the S4/5 linker/S6C region; however, allosteric mechanisms could still reduce activation also in mutants in the S4/5 linker/S6C region far from the pore binding region proposed by us in the 2019 study and the present manuscript.

      To summarize, we did not mean to imply that all BK activators should bind to this site, especially if they are not part of the NCA class (such as NS1619, Cym04, and BC5, whose different binding site enabled us to use it as a control in our THexA competition assay). However, the cleft close to gating-relevant S6 residues may well be a region especially susceptible to modulator binding (as for BL-1249, GoSlo-SR-5-6, and ɑ-Mangostin). We have moved, or separated, the initial GoSlo references from the reference to the pore binding site in this paragraph (lines 329, 358) to improve clarity.

      (3) The sentence starting in line 452 states that there is a pronounced allosteric coupling between the voltage sensors and Ca2+ binding. If the authors are referring to the coupling factor E in the Horrigan-Aldrich gating model, the references cited, in particular, Sun and Horrigan, concluded that the coupling between those sensors is weak.

      We are grateful for the opportunity to improve this passage. We intended to express that the observed effects (in this case the shift in V½) are pronounced around 1 µM Ca²⁺. As the reviewer states, the coupling factor between the voltage and calcium sensors (E; 2.4) is weak compared to the coupling of Ca²⁺ (C; 8) and voltage (D; 25) to the pore in the Horrigan-Aldrich model. However, the shape of the Ca²⁺-dependence of V½ cannot be completely described when E is neglected, with the largest difference around 1-2 µM Ca²⁺ (Horrigan and Aldrich, 2002). Deletion of the gating ring underlines the allosteric sensor coupling (Clay, 2017). Together with the steep Ca²⁺-dependence in this concentration range (meaning large Po changes upon increased occupancy; Cui, Cox and Aldrich, 1997), this explains the higher apparent activation, visible as the larger shift in V½ observed at 1 µM Ca²⁺. In terms of the model of Sun and Horrigan (2022), the suppressing “molecular logic gate” is already relieved by the presence of intermediate Ca²⁺, and the direct “gating lever” pathway via voltage acts synergistically and achieves the observed larger V½ shift upon depolarization. We have adapted the sentence and separated the citations for better understanding (lines 503-507).
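      To make the parameters discussed in this exchange concrete, the Python sketch that follows implements the open-probability expression of the Horrigan-Aldrich (2002) allosteric model, Po = L(1 + KC + JD + JKCDE)⁴ / [L(1 + KC + JD + JKCDE)⁴ + (1 + J + K + JKE)⁴], with L = L0·exp(zL·V/(kT/e)), J = exp(zJ·(V - Vh)/(kT/e)) and K = [Ca²⁺]/Kd, and finds V½ numerically by bisection. Only C = 8, D = 25 and E = 2.4 are taken from the response; L0, zL, zJ, the voltage-sensor half-activation voltage Vh, the closed-state Kd and kT/e = 25.4 mV are illustrative assumptions, and the function names are hypothetical. It is neither the authors' nor the reviewer's calculation.

```python
import math

KT_OVER_E_MV = 25.4  # thermal voltage kT/e near 22 degC, in mV (assumed)


def po_ha(v_mv: float, ca_um: float, L0: float = 2e-6, zL: float = 0.4,
          zJ: float = 0.58, vh_j_mv: float = 150.0, kd_um: float = 11.0,
          C: float = 8.0, D: float = 25.0, E: float = 2.4) -> float:
    """Open probability in the Horrigan-Aldrich allosteric model.

    C, D and E follow the values quoted in the text; the remaining
    defaults are illustrative assumptions, not fitted parameters.
    """
    L = L0 * math.exp(zL * v_mv / KT_OVER_E_MV)         # closed-open equilibrium
    J = math.exp(zJ * (v_mv - vh_j_mv) / KT_OVER_E_MV)  # voltage-sensor equilibrium
    K = ca_um / kd_um                                   # Ca2+ binding, closed state
    open_weight = L * (1 + K * C + J * D + J * K * C * D * E) ** 4
    closed_weight = (1 + J + K + J * K * E) ** 4
    return open_weight / (open_weight + closed_weight)


def v_half(ca_um: float, lo: float = -300.0, hi: float = 400.0,
           tol: float = 1e-6, **params: float) -> float:
    """Voltage (mV) where Po = 0.5, by bisection; Po rises monotonically with V."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if po_ha(mid, ca_um, **params) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# V1/2 shift produced by a hypothetical tenfold increase in L0 alone.
for ca in (0.0, 1.0, 10.0):
    shift = v_half(ca, L0=2e-5) - v_half(ca)
    print(f"{ca:5.1f} uM Ca2+: V1/2 shift {shift:6.1f} mV")
```

      Because the size and Ca²⁺ dependence of the computed shifts depend on the assumed parameter set, the sketch is best used to check how a change in a single parameter (L0, C, D or E) propagates into V½ at different Ca²⁺ concentrations, rather than as evidence for either position in this exchange.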

      References:

      Clay, J.R. (2017) “Novel description of the large conductance Ca2+-modulated K+ channel current, BK, during an action potential from suprachiasmatic nucleus neurons,” Physiological Reports, 5(20), p. e13473. Available at: https://doi.org/10.14814/phy2.13473.

      Cui, J., Cox, D.H. and Aldrich, R.W. (1997) “Intrinsic Voltage Dependence and Ca2+ Regulation of mslo Large Conductance Ca-activated K+ Channels,” Journal of General Physiology, 109(5), pp. 647–673. Available at: https://doi.org/10.1085/jgp.109.5.647.

      Horrigan, F.T. and Aldrich, R.W. (2002) “Coupling between voltage sensor activation, Ca2+ binding and channel opening in large conductance (BK) potassium channels,” The Journal of General Physiology, 120(3), pp. 267–305. Available at: https://doi.org/10.1085/jgp.20028605.

      Kallure, G.S. et al. (2023) “High-resolution structures illuminate key principles underlying voltage and LRRC26 regulation of Slo1 channels,” bioRxiv, p. 2023.12.20.572542. Available at: https://doi.org/10.1101/2023.12.20.572542.

      Nordquist, E.B., Jia, Z. and Chen, J. (2024) “Small Molecule NS11021 Promotes BK Channel Activation by Increasing Inner Pore Hydration,” Journal of Chemical Information and Modeling, 64, pp. 7616–7625. Available at: https://doi.org/10.1021/acs.jcim.4c01012.

      Redhardt, M., Raunser, S. and Raisch, T. (2024) “Cryo-EM structure of the Slo1 potassium channel with the auxiliary γ1 subunit suggests a mechanism for depolarization-independent activation,” FEBS Letters, 598(8), pp. 875–888. Available at: https://doi.org/10.1002/1873-3468.14863.

      Schewe, M. et al. (2019) “A pharmacological master key mechanism that unlocks the selectivity filter gate in K+ channels,” Science, 363(6429), pp. 875–880. Available at: https://doi.org/10.1126/science.aav0569.

      Sun, L. and Horrigan, F.T. (2022) “A gating lever and molecular logic gate that couple voltage and calcium sensor activation to opening in BK potassium channels,” Science Advances, 8(50), p. eabq5772. Available at: https://doi.org/10.1126/sciadv.abq5772.

      Tao, X. and MacKinnon, R. (2019) “Molecular structures of the human Slo1 K+ channel in complex with β4,” eLife, 8, p. e51409. Available at: https://doi.org/10.7554/eLife.51409.

      Webb, T.I. et al. (2015) “Molecular mechanisms underlying the effect of the novel BK channel opener GoSlo: Involvement of the S4/S5 linker and the S6 segment,” Proceedings of the National Academy of Sciences, 112(7), pp. 2064–2069. Available at: https://doi.org/10.1073/pnas.1400555112.

      Yamanouchi, D. et al. (2023) “Dual allosteric modulation of voltage and calcium sensitivities of the Slo1-LRRC channel complex,” Molecular Cell, 83(24), pp. 4555-4569.e4. Available at: https://doi.org/10.1016/j.molcel.2023.11.005.

      Reviewer #3 (Public review):

      Summary:

      This research shows that α-mangostin, a proposed nutraceutical with cardiovascular protective properties, could act through the activation of large-conductance potassium channels (BK). The authors provide convincing electrophysiological evidence that the compound binds to BK channels and induces a potent activation, increasing the magnitude of potassium currents. Since these channels are important modulators of the membrane potential of vascular smooth muscle, this activation leads to muscle relaxation, possibly explaining the cardiovascular protective effects.

      Strengths:

      The authors present evidence from several lines of experiments that α-mangostin is a potent activator of BK channels. The quality of the experiments and of the analysis is high, and the depth of analysis is appropriate. This research is timely and provides a basis to understand the physiological effects of natural compounds with proposed cardio-protective effects.

      We sincerely thank the reviewer for appraising the achievements of our study.

      Weaknesses:

      The identification of the binding site is not the strongest point of the manuscript. The authors show that the binding site is probably located in the hydrophobic cavity of the pore and that point mutations reduce the magnitude of the negative voltage shift of activation produced by α-mangostin. However, these experiments do not demonstrate binding to these sites, and the results could also be explained by allosteric effects on gating induced by the mutations themselves.

      We are aware that our functional data are, unfortunately, not sufficient to clearly distinguish between effects due to a loss of affinity and those due to allosteric mechanisms. Our attempts to generate complete dose–response curves for the mutants to determine accurate apparent IC<sub>50</sub> values were limited by the solubility of the compound. Consequently, we have avoided making claims about affinity loss in the mutant analysis and have instead reported only the reduction in potency, expressed as the shift in V<sub>½</sub>. To reduce confounding effects from the mutations themselves, we selected substitutions that preserved the most wildtype-like GV-relationships, based on the extensive mutagenesis work of Chen, Yan and Aldrich (2014). We also address this matter in our answer to Recommendation (6) below, and we have replaced the word “binding” in the title of the manuscript. Nevertheless, we consider the proposed binding region to be well supported by the THexA competition experiments in combination with molecular docking, even though the specific mechanistic contributions of individual residues cannot yet be resolved.

      Reviewer #3 (Recommendations for the authors):

      (1) Natural xanthones such as α-Mangostin induce vasorelaxation via binding to key gating residues in the S6 domain of BK channels.

      (2) If α-Mangostin occupies a similar binding site to quaternary ammoniums, what is the explanation for not observing a reduction in the single-channel current (fast blocking effect)? The α-Mangostin site proposed here is in a region of the channel that should occlude ion permeation. The authors should discuss possible explanations for this apparently contradictory observation.

      As the reviewer states, we indeed did not observe a reduced single-channel amplitude in any measurement. The THexA competition assay showed that α-Mangostin is present in the pore cavity and interferes with THexA access to its binding site. However, we do not think that their binding sites are similar: QA ions bind directly below the filter entrance to block permeation, whereas our studies suggest that α-Mangostin binds in the upper portion of the cleft between S6 helices. In this position, it would clearly overlap with the QA binding site and hinder access, but not block permeation. We would therefore not expect an amplitude reduction by intermittent α-Mangostin block. Consistently, all binding poses in our dockings were close to the cavity wall, without interfering with the central ion conduction pathway. To better illustrate this, we have added updated intracellular views of the dockings in the Ca<sup>2+</sup>-free and Ca<sup>2+</sup>-bound states (which we have now also included, as suggested by another reviewer) to the supplementary information (Fig. S4A).

      (3) In Figure 2D, it is difficult to appreciate the differences between the symbols representing the G-V relationships of BKα channels at different intracellular Ca<sup>2+</sup> concentrations, before and after activation with 10 μM α-Mangostin. A clearer distinction between the curves would make the data easier to interpret.

      We thank the reviewer for the suggestion to improve figure accessibility. We have changed the line appearance for better discrimination of the overlying portions.

      (4) Both THexA and TPA block BK channels through voltage- and state-dependent mechanisms. Therefore, their apparent affinity could change if α-Mangostin simply increases open probability or alters dwell times, rather than physically blocking access to the binding site.

      The reviewer addresses valid limitations that can affect the meaningfulness of competition experiments under certain conditions. However, we think that this does not apply to our results:

      Previous studies have shown that the voltage dependence of quaternary ammonium blockers up to C<sub>10</sub> is rather weak in BK channels, and only a slight increase in block is present in the voltage range +30 mV to +100 mV (Li and Aldrich, 2004; Thompson and Begenisich, 2012). Hence, THexA voltage dependence has already reached a plateau in the competition assay (at +40 mV), and its voltage dependence would have little effect on our results.

      Controversy exists about the nature of the state dependence of different quaternary ammonium blockers, but TBA is often recognized as an open-channel blocker of BK channels, which probably also applies to THexA (Wilkens and Aldrich, 2006; Tang, Zeng and Lingle, 2009; Thompson and Begenisich, 2012; Posson, McCoy and Nimigean, 2013). Assuming such an open-channel block, apparent IC<sub>50</sub> values would be inversely proportional to Po. The THexA IC<sub>50</sub> was about 80 nM in the basal state, when Po is very low (0.024 at +40 mV, as derived from the GV-relationship); an increase of open dwell times, and hence of Po, to, e.g., 0.3 in the presence of α-Mangostin would therefore lead to a ≈12-fold (0.3/0.024) decrease in apparent IC<sub>50</sub>. However, the apparent THexA IC<sub>50</sub> strongly increased rather than decreased (more than 20-fold, to around 1.6 µM). This cannot arise from a change in Po and must reflect the altered access of THexA to its binding site caused by α-Mangostin. Assuming a pure closed-channel block, where the apparent IC<sub>50</sub> would scale with 1/(1 − Po), an increase of only about 1.4-fold (0.976/0.7) is expected, whereas we recorded a much stronger, 20-fold increase. Therefore, we are convinced that we have conclusively shown that α-Mangostin is present in the BK pore irrespective of the state dependence of THexA block.
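      The Po-scaling argument above reduces to a few ratios. The sketch below assumes, as in the text, that the apparent IC<sub>50</sub> scales with 1/Po for a pure open-channel block and with 1/(1 − Po) for a pure closed-channel block; voltage dependence and block kinetics are ignored, and Po = 0.3 is the same illustrative value used above.

```python
# Values quoted in the response; PO_DRUG is the illustrative Po with α-Mangostin.
PO_BASAL = 0.024
PO_DRUG = 0.3
IC50_BASAL_NM = 80.0
IC50_DRUG_NM = 1600.0


def fold_change_open_block(po_before, po_after):
    """Open-channel block: apparent IC50 scales with 1/Po."""
    return po_before / po_after


def fold_change_closed_block(po_before, po_after):
    """Closed-channel block: apparent IC50 scales with 1/(1 - Po)."""
    return (1.0 - po_before) / (1.0 - po_after)


open_fold = fold_change_open_block(PO_BASAL, PO_DRUG)
closed_fold = fold_change_closed_block(PO_BASAL, PO_DRUG)
observed_fold = IC50_DRUG_NM / IC50_BASAL_NM

print(f"open-channel block predicts x{open_fold:.2f} ({1 / open_fold:.1f}-fold decrease)")
print(f"closed-channel block predicts x{closed_fold:.2f}")
print(f"observed x{observed_fold:.1f}")
```

      Neither state-dependent scenario comes close to the observed increase, which is the point made in the paragraph above.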

      (5) The pH dependence of the V<sub>½</sub> shift supports the idea that α-Mangostin becomes more negatively charged at higher pH (enhancing its effect). However, although the data are consistent with this interpretation, additional controls, such as using a non-ionizable analog or assessing solubility changes with pH, would be needed to confirm that the shift is caused specifically by ionization of α-Mangostin and not by indirect pH effects on channel gating.

      We agree with the reviewer that the pH experiment by itself is not sufficient to clearly link the presence of a charge to a possible activation mechanism. We still think that this is an interesting observation that should be reported, as we have previously investigated the mechanism of negatively charged activators in different K<sup>+</sup> channel families (Schewe et al., 2019). Unfortunately, we do not have access to uncharged derivatives mimicking the 3D conformation, and among the commercially available substances, the bare xanthone backbone is completely insoluble in water. We therefore tested 3-hydroxyxanthone as an example of a derivative with a minimal number of hydroxyl substituents (Author response image 2, Author response table 2). 3-Hydroxyxanthone indeed showed reduced activation compared with α-Mangostin: the shift in V<sub>½</sub> induced by 10 µM 3-hydroxyxanthone was only 14.99 ± 5.67 mV (≈50 mV for α-Mangostin). This supports the idea that the presence of several (potentially) charged substituents is important for the activation mechanism. However, we have no knowledge about the efficacy of the compound or the local pK<sub>a</sub> values of the different hydroxyl groups. As the reviewer stated, systematic chemical modifications would be necessary to elucidate the importance of the number and positions of charged substituents, which is beyond our capabilities.

      Author response image 2.

      Activation of BKα by 3-hydroxyxanthone. (A) GV-relationship before and after application of 10 µM 3-hydroxyxanthone. (B) V<sub>½</sub> before and after application of 10 µM 3-hydroxyxanthone compared to α-Mangostin and the resulting difference in V<sub>½</sub> (ΔV<sub>½</sub>). Measurements were conducted as described in the main manuscript with 100 nM free Ca<sub>i</sub><sup>2+</sup>.

      Author response table 2.

      Comparison of the V<sub>½</sub> ± SEM and ΔV<sub>½</sub> ± SEM before and after activation by 10 µM α-Mangostin or 10 µM 3-hydroxyxanthone in BKα channels. Unpaired t-test, two-tailed P values (α=0.05)
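      The reasoning that more ionizable substituents and a higher pH both increase the negative charge can be illustrated with the Henderson-Hasselbalch relation. This is a minimal sketch under stated assumptions: each hydroxyl is treated as an independent monoprotic acid, and the pK<sub>a</sub> values below are hypothetical placeholders, since the local pK<sub>a</sub> values of α-Mangostin's hydroxyl groups are unknown.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch: fraction of an acidic group present as the anion."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def mean_negative_charge(ph, pkas):
    """Expected net negative charge, treating each acidic group as independent
    (electrostatic interactions between neighbouring groups are ignored)."""
    return sum(fraction_deprotonated(ph, pka) for pka in pkas)


# Hypothetical pKa values for illustration only; not measured for α-Mangostin.
HYPOTHETICAL_PKAS = [7.5, 9.5]

for ph in (6.4, 7.4, 8.4):
    charge = mean_negative_charge(ph, HYPOTHETICAL_PKAS)
    print(f"pH {ph}: mean charge -{charge:.2f}")
```

      A compound with fewer ionizable groups, such as 3-hydroxyxanthone, would carry less charge at any given pH under the same assumptions.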

      (6) The reduced V<sub>½</sub> shifts observed in the I308A, L312M, and A316P mutants may result from intrinsic gating alterations rather than a true loss of α-Mangostin binding. The GoSlo-SR-5-6 control is informative, but the persistence of activation in A316P does not fully resolve this. A more convincing test would be to employ double or triple mutants.

      As stated above, we acknowledge that our functional data do not allow us to definitively separate effects arising from a true loss of binding affinity from those due to potential allosteric effects. We tried to minimize intrinsic gating alterations caused by the substitutions by not conducting pure alanine or cysteine scanning mutagenesis. Instead, where possible, we chose substitutions whose GV-relationships were closest to the wildtype in Chen, Yan and Aldrich (2014). While L312M was virtually identical to the wildtype, A316P showed a change in slope at high Ca<sup>2+</sup> concentrations, which could indicate altered voltage sensitivity. Additionally, A316P completely abolished α-Mangostin activation. We therefore also used A316G to ensure that the channel is functional and retains voltage sensitivity, even though its V<sub>½</sub> was shifted more strongly. As we conducted paired measurements and assessed the V<sub>½</sub> before and after activation, we are confident that we can attribute a reduced shift to the reduced action of α-Mangostin.

      Following the reviewer’s suggestion, we generated and measured the double mutants I308A/L312M, I308A/A316G, and L312M/A316G (the triple mutant I308A/L312M/A316G did not produce measurable currents). The mutants I308A/L312M and I308A/A316G showed a moderate energy-additive effect and reduced the shift in V<sub>½</sub> by a further ≈7 mV compared with the respective single mutation causing the stronger reduction. The combination L312M/A316G, however, did not reduce the shift further than the single mutations did; its shift was not even reduced to the level seen with A316G alone.

      Author response image 3.

      Double mutants I308A/L312M, I308A/A316G and L312M/A316G compared to the single mutations in the main manuscript. The V<sub>½</sub> before and after activation with 10 µM α-Mangostin, the resulting shift in V<sub>½</sub>, and the GV-relationships are shown (n = 6-7); measurements were made as in Fig. 4.

      Author response table 3.

      Summary of the V<sub>½</sub> before and after Mangostin activation and the resulting shifts in V<sub>½</sub> for the double mutants compared to the single mutants shown in the main manuscript.
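      Whether two substitutions act additively can be checked in the spirit of a double-mutant cycle. The sketch below is hypothetical in its inputs: the shift magnitudes are placeholders, not the measured values, and converting mV to energy assumes an effective gating charge z that the mutations leave unchanged, which may not hold (e.g., for A316P, whose GV slope changed).

```python
# Faraday constant expressed in kcal mol^-1 mV^-1 per elementary charge.
KCAL_PER_MV_PER_E = 96485.332 / 4184.0 / 1000.0


def shift_to_kcal(dv_mv, z):
    """Free-energy equivalent (kcal/mol) of a V1/2 change for an effective gating charge z."""
    return z * KCAL_PER_MV_PER_E * dv_mv


def nonadditivity_mv(shift_wt, shift_m1, shift_m2, shift_double):
    """Compare the loss of the drug-induced shift in a double mutant with the sum of the
    single-mutant losses. Shifts are magnitudes in mV. Returns ~0 for additive effects,
    a negative value for less-than-additive and a positive value for more-than-additive."""
    loss_1 = shift_wt - shift_m1
    loss_2 = shift_wt - shift_m2
    loss_12 = shift_wt - shift_double
    return loss_12 - (loss_1 + loss_2)


# Hypothetical shift magnitudes (mV), not the measured data:
example = nonadditivity_mv(shift_wt=50.0, shift_m1=35.0, shift_m2=30.0, shift_double=23.0)
print(f"non-additivity: {example:.1f} mV, about {shift_to_kcal(example, z=1.0):.2f} kcal/mol for z = 1")
```

      Because z cancels when all terms share the same value, the additivity test itself can be done directly in mV; the energy conversion only matters for comparing coupling strengths across studies.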

      Following a suggestion by another reviewer, we generated AlphaFold3 (AF3) models of I308A, L312M and A316P and repeated the α-Mangostin docking. According to these models, all three mutations substantially alter the structure of the S6 helix, thereby changing the binding region, and A316P in particular changes the nature of residue interactions. This could explain why the double mutants do not show a clear and consistent additive effect.

      Unfortunately, this outcome is not conclusive and the double mutants do not reveal further information compared to the single mutants. We have therefore decided not to include these measurements in the manuscript.

      As we do not know if our answers will be sent to all reviewers, we repeat the relevant part about the AF3 models here:

      (…) According to these predictive models,

      The I308A substitution considerably straightens the S6 helix starting at this residue. Hence, all residues are displaced relative to the WT: the C<sub>α</sub> atoms of L312, F315, and A316 are displaced by 2.8 Å, 4.2 Å, and 4.6 Å, respectively, widening the bottom of the binding pocket. However, the prediction confidence for all helices is lower than in the other AF3 models (70 > pLDDT > 50). In the docking, poses in the binding pocket comparable to those observed in the WT (i.e., involving I308A, L312 and A316) and with the same molecule orientation have higher (less favorable) binding energies (-7.13 to -6.66 kcal mol<sup>-1</sup>). Additionally, poses without contact to I308A arise that adopt a more vertical position, indicating that the structural change affects the binding region.

      The changes induced by L312M are localized to residues 313-323, where S6 bends towards S5. Binding energies are lower, especially in the best two poses, which are also most comparable to the WT docking (-9.88 kcal mol<sup>-1</sup>), but clustering is poor overall and poses are more heterogeneous. Interactions with L312M are completely abolished, while interactions with I308 (in 11/20 poses), F315 (in all poses), and A316 (in 5/20 poses) persist. Given the rather small structural alteration induced by the substitution and the variable poses, one could speculate that the reduced V<sub>½</sub> shift is due to the observed loss of binding to L312M, whereas the retained interactions with the other residues would still allow α-Mangostin to activate the channel.

      A316P induces a displacement of the S6 helix compared to the WT, while the other pore helices are not affected. S6 shows enhanced outward bending around A316, which displaces residues where α-Mangostin would bind, i.e., the C<sub>α</sub> atoms of F315 and L312 are displaced by 2.4 Å and 2.8 Å, respectively (I308 is not affected). Residues below are moved in a more rotational way, resulting in a C<sub>α</sub> displacement of 3.1 Å for Y318 and even 5.7 Å for V319, before displacements decrease again towards the intracellular end of the helix. While interactions with A316P are present in 10/20 analyzed poses, the helix displacement seems to hinder I308 and L312 interactions: the best docked α-Mangostin pose (-8.41 kcal mol<sup>-1</sup>) is predicted to contact only F315 and Y318, and overall, I308 or L312 contacts occurred in only 3/20 and 7/20 poses, respectively (wildtype: 17/20 and 20/20 poses). This may hint that allosteric effects account for a substantial share of the reduced α-Mangostin-induced V<sub>½</sub> shift in A316P and underlines the exceptional effect of this mutation (i.e., complete loss of the V<sub>½</sub> shift). (…)
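      Per-residue C<sub>α</sub> displacements like those quoted above are typically obtained by superimposing the mutant model onto the WT model and measuring the distance for each residue. The response does not state how the superposition was done, so the following is only a sketch: it assumes both models list the same residues in the same order, uses the Kabsch algorithm, and lets a hypothetical `fit_idx` select the atoms assumed to be unchanged (e.g., the other pore helices) for the fit.

```python
import numpy as np


def kabsch_displacements(mobile, reference, fit_idx=None):
    """Superimpose `mobile` onto `reference` (N x 3 C-alpha coordinates, same residue
    order) with the Kabsch algorithm and return per-residue displacements (Å).

    fit_idx selects the atoms used for the superposition; default is all atoms.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    idx = np.arange(len(P)) if fit_idx is None else np.asarray(fit_idx)
    p_c, q_c = P[idx].mean(axis=0), Q[idx].mean(axis=0)
    H = (P[idx] - p_c).T @ (Q[idx] - q_c)      # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal rotation mobile -> reference
    aligned = (P - p_c) @ R.T + q_c
    return np.linalg.norm(aligned - Q, axis=1)
```

      Fitting only on regions assumed to be rigid keeps a locally bent helix from distorting the superposition, so its displacement is reported against an unbiased frame.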

      (7) The subtraction approach used to isolate BK currents (difference before and after α-Mangostin) assumes that the compound affects only BK channels. However, α-Mangostin could also modulate Ca<sub>v</sub> currents directly, as reported for other polyphenolic compounds. No vehicle (DMSO) control is shown.

      We agree with the reviewer that α-Mangostin could also modulate Ca<sub>v</sub> currents; however, this would not affect the conclusions drawn from this nanodomain experiment. We intended to show the overall current modulation by α-Mangostin in the voltage range relevant for Ca<sub>v</sub>-BK coupling, as this would determine the membrane potential that mediates the vasoactive effect. In native tissue, BK and Ca<sub>v</sub> channels (among others) would likewise contribute to the net membrane conductance, with BK channels being a major contributor when activated. In fact, a concomitant inhibition of Ca<sub>v</sub> channels could act synergistically in favor of vasodilation and could therefore be a subject for further investigation of potential α-Mangostin targets. However, the fact that iberiotoxin prevented relaxation in aortic preparations conclusively showed that BK channels are the major player in native tissue.

      We have reformulated some sentences to avoid the impression that we refer to isolated BK currents rather than to α-Mangostin-activated currents.

      DMSO controls were conducted and did not impact BK or Ca<sub>v</sub>1.2 currents or the aortic tissue contraction. We have added representative measurements as Fig. S6 and stated the DMSO concentration in the Methods section (line 655).

      (8) Most kinetic fits were obtained at strong depolarizations (around +100 mV), which limits how well these results can be extrapolated to physiological voltages. Although the BK-Ca<sub>v</sub> experiments show facilitation between -50 and +50 mV, providing plots for activation and deactivation in that range would strengthen the physiological relevance.

      We thank the reviewer for this valuable suggestion. We now additionally show that the impact of α-Mangostin on activation is high at weaker depolarisations, underlining its physiological relevance. To address the activation time course in a more physiological voltage range, we used our measurements of BKα channels in 10 µM Ca<sub>i</sub><sup>2+</sup> (where the V<sub>½</sub> shift induced by α-Mangostin equals that in 100 nM Ca<sub>i</sub><sup>2+</sup>; Fig. 2D). The outward currents already present in the lower voltage range under these conditions allowed us to fit a monoexponential function to the traces of prepulses from 0 mV to +100 mV. The activation τ decreased from 29.6 ± 3.1 ms at 0 mV to 2.4 ± 2 ms at +100 mV. After α-Mangostin activation, the time course was accelerated, with an activation τ of 9.5 ± 4.7 ms at 0 mV and 2 ± 0.6 ms at +100 mV. This faster activation was particularly pronounced in the lower voltage range, where Po is still low; e.g., α-Mangostin reduced the activation τ at +20 mV by more than half (from 12.2 ± 0.6 ms to 4.98 ± 1.6 ms).
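      A monoexponential fit of an activation trace can be sketched as follows. This is not the authors' fitting routine, which is not described here; it assumes the trace starts at pulse onset (any initial delay already excluded) and follows I(t) = A(1 − e<sup>−t/τ</sup>) + C. Because the model is linear in A and C for a fixed τ, those are solved by least squares, and τ is found by golden-section search on log τ (a variable-projection approach chosen here to avoid extra dependencies beyond NumPy).

```python
import numpy as np


def fit_activation_tau(t_ms, current, tau_bounds=(0.1, 200.0), iters=100):
    """Fit I(t) = A * (1 - exp(-t / tau)) + C; returns (tau, A, C)."""
    t = np.asarray(t_ms, dtype=float)
    y = np.asarray(current, dtype=float)

    def sse(log_tau):
        # For fixed tau, A and C follow from linear least squares.
        basis = np.column_stack([1.0 - np.exp(-t / np.exp(log_tau)), np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
        resid = y - basis @ coef
        return float(resid @ resid), coef

    # Golden-section search for the tau that minimizes the residual sum of squares.
    lo, hi = np.log(tau_bounds[0]), np.log(tau_bounds[1])
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = sse(a)[0], sse(b)[0]
    for _ in range(iters):
        if fa < fb:
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = sse(a)[0]
        else:
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = sse(b)[0]
    log_tau = 0.5 * (lo + hi)
    amp, offset = sse(log_tau)[1]
    return float(np.exp(log_tau)), float(amp), float(offset)
```

      With noisy recordings, the same function returns the least-squares τ, provided the residual surface has a single minimum within `tau_bounds` (a hypothetical default range in ms).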

      Our data consist of families of different prepulse voltages followed by a fixed repolarisation step (to -50 mV for 100 nM free Ca<sub>i</sub><sup>2+</sup>, and to -100 mV for 10 µM free Ca<sub>i</sub><sup>2+</sup>). Thus, we are not able to plot the voltage dependence of deactivation in the same way as for activation. However, we can present the deactivation time constants after lower prepulse voltage steps that produce outward currents in symmetrical ion conditions with 10 µM free Ca<sub>i</sub><sup>2+</sup>. For -20 mV and +20 mV prepulses, which better reflect physiological depolarisation, the deactivation time constant shows a 3- to 5-fold increase after α-Mangostin activation.

      We now show the voltage dependence of activation in Fig. S2A and a bar graph of activation/deactivation time constants at +20 mV in Fig. S2B; the data are summarized in Table S5. We hope this helps to illustrate the effect of α-Mangostin under physiological conditions.

      (9) Minor: In several parts of the paper, induced shifts to negative voltages are referred to as "leftward shifts". It would be useful to be consistent and to refer specifically to negative or positive directions.

      We thank the reviewer for the careful reading and have harmonized the terminology.

      References

      Chen, X., Yan, J. and Aldrich, R.W. (2014) “BK channel opening involves side-chain reorientation of multiple deep-pore residues,” Proceedings of the National Academy of Sciences, 111(1), pp. E79–E88. Available at: https://doi.org/10.1073/pnas.1321697111.

      Li, W. and Aldrich, R.W. (2004) “Unique Inner Pore Properties of BK Channels Revealed by Quaternary Ammonium Block,” Journal of General Physiology, 124(1), pp. 43–57. Available at: https://doi.org/10.1085/jgp.200409067.

      Posson, D.J., McCoy, J.G. and Nimigean, C.M. (2013) “The voltage-dependent gate in MthK potassium channels is located at the selectivity filter,” Nature Structural & Molecular Biology, 20(2), pp. 159–166. Available at: https://doi.org/10.1038/nsmb.2473.

      Schewe, M. et al. (2019) “A pharmacological master key mechanism that unlocks the selectivity filter gate in K+ channels,” Science, 363(6429), pp. 875–880. Available at: https://doi.org/10.1126/science.aav0569.

      Tang, Q.-Y., Zeng, X.-H. and Lingle, C.J. (2009) “Closed-channel block of BK potassium channels by bbTBA requires partial activation,” The Journal of General Physiology, 134(5), pp. 409–436. Available at: https://doi.org/10.1085/jgp.200910251.

      Thompson, J. and Begenisich, T. (2012) “Selectivity filter gating in large-conductance Ca2+-activated K+ channels,” Journal of General Physiology, 139(3), pp. 235–244. Available at: https://doi.org/10.1085/jgp.201110748.

      Wilkens, C.M. and Aldrich, R.W. (2006) “State-independent block of BK channels by an intracellular quaternary ammonium,” The Journal of General Physiology, 128(3), pp. 347–364. Available at: https://doi.org/10.1085/jgp.200609579.

    1. eLife Assessment

      This study resolves a cryo-EM structure of the GPCR, human GPR30, which responds to bicarbonate and regulates cellular responses to pH and ion homeostasis. Understanding the ligand and the mechanism of activation is important to the field of receptor signaling and potentially facilitates drug development targeting this receptor. Structures and functional assays provide solid evidence for a potential bicarbonate binding site.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers.]

      Summary:

      This study resolves a cryo-EM structure of the GPCR, GPR30, in the presence of bicarbonate, which the authors' lab recently identified as the physiological ligand. Understanding the ligand and the mechanism of activation is of fundamental importance to the field of receptor signaling. This solid study provides important insight into the overall structure and suggests a possible bicarbonate binding site.

      Strengths:

      The overall structure and the proposed mechanism of G-protein coupling are solid. Based on the structure, the authors identify a binding pocket that might accommodate bicarbonate. Although assignment of the binding pocket is speculative, extensive mutagenesis of residues in this pocket identifies several that are important to G-protein signaling. The structure shows some conformational differences with a previous structure of this protein determined in the absence of bicarbonate (PMC11217264). To my knowledge, bicarbonate is the only physiological ligand that has been identified for GPR30, making this study an important contribution to the field. However, the current study provides novel and important circumstantial evidence for the bicarbonate binding site based on mutagenesis and functional assays.

      Weaknesses:

      Bicarbonate is a challenging ligand for structural and biochemical studies, and because of experimental limitations, this study does not elucidate the exact binding site. Higher resolution structures would be required for structural identification of bicarbonate. The functional assay monitors activation of GPR30, and thus reports on not only bicarbonate binding, but also the integrity of the allosteric network that transduces the binding signal across the membrane. However, biochemical binding assays are challenging because the binding constant is weak, in the mM range.

      The authors appropriately acknowledge the limitations of these experimental approaches, and they build a solid circumstantial case for the bicarbonate binding pocket based on extensive mutagenesis and functional analysis. However, the study does fall short of establishing the bicarbonate binding site.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, "Cryo-EM structure of the bicarbonate receptor GPR30," the authors aimed to enrich our understanding of the role of GPR30 in pH homeostasis by combining structural analysis with a receptor function assay. This work is a natural development and extension of their previous work in Nature Communications (PMID: 38413581). In the current body of work, they solved the cryo-EM structure of the human GPR30-G-protein (mini-Gsqi) complex in the presence of bicarbonate ions at 3.15 Å resolution. From the atomic model built based on this map, they observed the overall canonical architecture of class A GPCRs and also identified 3 extracellular pockets created by ECLs (Pockets A-C). Based on the polarity, location, size, and charge of each pocket, the authors hypothesized that pocket A is a good candidate for the bicarbonate binding site. To identify the bicarbonate binding site, the authors performed an exhaustive mutant analysis of the hydrophilic residues in Pocket A and analyzed receptor reactivity via a calcium assay. In addition, the human GPR30-G-protein complex model also enabled the authors to elucidate the G-protein coupling mechanism of this special class A GPCR, which plays a crucial role in pH homeostasis.

      Strengths:

      As a continuation of their recent Nature Communications publication, the authors used cryo-EM coupled with mutagenesis and functional studies to elucidate the bicarbonate-GPR30 interaction. This work provided atomic-resolution structural observations for the receptor in complex with G-protein, allowing us to explore its mechanism of action, and will further facilitate drug development targeting GPR30. There were 3 extracellular pockets created by ECLs (Pockets A-C). The authors were able to filter out 2 of them and hypothesized that pocket A was a good candidate for the bicarbonate binding site based on the polarity, location, and charge of each pocket. From there, the authors identified the key residues on GPR30 for its interaction with its ligand, bicarbonate. Together with their previous work, they mapped out amino acids that are critical for receptor reactivity.

      Weaknesses:

      When we see a reduction of a GPCR-mediated downstream signaling, several factors could potentially contribute to this observation: 1) a reduced total expression of this receptor due to the mutation (transcription and translation issue); 2) a reduced surface expression of this receptor due to the mutation (trafficking issue); and 3) a dysfunctional receptor that doesn't signal due to the mutation.

      Altogether, the wide range of surface expression across the different cell lines, combined with the different receptor function readouts, makes the cell functional data only partially support their structural observations.

    4. Reviewer #3 (Public review):

      Summary

      GPR30 responds to bicarbonate and plays a role in regulating cellular pH and ion homeostasis. However, the molecular basis of bicarbonate recognition by GPR30 remains unresolved. This study reports the cryo-EM structure of GPR30 bound to a chimeric mini-Gq in the presence of bicarbonate, revealing mechanistic insights into its G-protein coupling. Nonetheless, the study does not identify the bicarbonate-binding site within GPR30.

      Strengths

      The work provides strong structural evidence clarifying how GPR30 engages and couples with Gq.

      Weaknesses

      Several GPR30 mutants exhibited diminished responses to bicarbonate, but their expression levels were also reduced. As a result, the mechanism by which GPR30 recognizes bicarbonate remains uncertain.

    5. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This study resolves a cryo-EM structure of the GPCR, GPR30, in the presence of bicarbonate, which the authors' lab recently identified as the physiological ligand. Understanding the ligand and the mechanism of activation is of fundamental importance to the field of receptor signaling. This solid study provides important insight into the overall structure and suggests a possible bicarbonate binding site.

      Strengths:

      The overall structure and the proposed mechanism of G-protein coupling are solid. Based on the structure, the authors identify a binding pocket that might accommodate bicarbonate. Although assignment of the binding pocket is speculative, extensive mutagenesis of residues in this pocket identifies several that are important to G-protein signaling. The structure shows some conformational differences with a previous structure of this protein determined in the absence of bicarbonate (PMC11217264). To my knowledge, bicarbonate is the only physiological ligand that has been identified for GPR30, making this study an important contribution to the field. However, the current study provides novel and important circumstantial evidence for the bicarbonate binding site based on mutagenesis and functional assays.

      Weaknesses:

      Bicarbonate is a challenging ligand for structural and biochemical studies, and because of experimental limitations, this study does not elucidate the exact binding site. Higher resolution structures would be required for structural identification of bicarbonate. The functional assay monitors activation of GPR30, and thus reports on not only bicarbonate binding, but also the integrity of the allosteric network that transduces the binding signal across the membrane. However, biochemical binding assays are challenging because the binding constant is weak, in the mM range.

      The authors appropriately acknowledge the limitations of these experimental approaches, and they build a solid circumstantial case for the bicarbonate binding pocket based on extensive mutagenesis and functional analysis. However, the study does fall short of establishing the bicarbonate binding site.

      We thank the reviewer for this thoughtful and constructive assessment of our revised manuscript. We are grateful for the recognition of the overall quality of the cryo-EM structure and the proposed mechanism of G-protein coupling, as well as for highlighting the importance of identifying bicarbonate as a physiological ligand for GPR30 and the contribution this work makes to the receptor signaling field. We also appreciate the reviewer’s careful and balanced discussion of the inherent challenges posed by bicarbonate as a low-affinity, small, negatively charged ligand, and we fully agree that, given current experimental limitations, our data provide circumstantial, rather than definitive, evidence for the binding site and that higher-resolution structures would be required for direct visualization. Importantly, we value the reviewer’s acknowledgement that we transparently describe these limitations and that our extensive mutagenesis and functional analyses nonetheless build a solid case for the proposed bicarbonate-binding pocket, which we believe will serve as a useful framework for future biochemical and structural investigation.

      Reviewer #1 (Recommendations for the authors):

      Overall, the authors do a good job responding to the previous review, with updated structures and experimental data. I have two comments on the current version:

      (1) When the authors compare their structure to a previously published structure of the same receptor, they say that the previous structure came out while the current manuscript was in revision (line 255). This is not correct. The previous manuscript was published May 14, 2024, and the current manuscript was received by eLife on May 20, 2024. This sentence should be corrected to "During the preparation of this manuscript..."

      We corrected the sentence accordingly (line 259).

      (2) Line 173: what other structures are the authors referring to? Citations should be included here.

      We added citations (line 190).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, "Cryo-EM structure of the bicarbonate receptor GPR30," the authors aimed to enrich our understanding of the role of GPR30 in pH homeostasis by combining structural analysis with a receptor function assay. This work is a natural development and extension of their previous work in Nature Communications (PMID: 38413581). In the current body of work, they solved the cryo-EM structure of the human GPR30-G-protein (mini-Gsqi) complex in the presence of bicarbonate ions at 3.15 Å resolution. From the atomic model built based on this map, they observed the overall canonical architecture of a class A GPCR and also identified 3 extracellular pockets created by ECLs (Pockets A-C). Based on the polarity, location, size, and charge of each pocket, the authors hypothesized that pocket A is a good candidate for the bicarbonate binding site. To identify the bicarbonate binding site, the authors performed an exhaustive mutant analysis of the hydrophilic residues in Pocket A and analyzed receptor reactivity via a calcium assay. In addition, the human GPR30-G-protein complex model also enabled the authors to elucidate the G-protein coupling mechanism of this special class A GPCR, which plays a crucial role in pH homeostasis.

      Strengths:

      As a continuation of their recent Nature Communications publication, the authors used cryo-EM coupled with mutagenesis and functional studies to elucidate the bicarbonate-GPR30 interaction. This work provided atomic-resolution structural observations for the receptor in complex with G-protein, allowing us to explore its mechanism of action, and will further facilitate drug development targeting GPR30. There were 3 extracellular pockets created by ECLs (Pockets A-C). The authors were able to rule out 2 of them and hypothesized that pocket A was a good candidate for the bicarbonate binding site based on the polarity, location, and charge of each pocket. From there, the authors identified the key residues on GPR30 for its interaction with the ligand, bicarbonate. Together with their previous work, they mapped out amino acids that are critical for receptor reactivity.

      Weaknesses:

      When we see a reduction in GPCR-mediated downstream signaling, several factors could contribute to this observation: 1) reduced total expression of the receptor due to the mutation (a transcription or translation issue); 2) reduced surface expression of the receptor due to the mutation (a trafficking issue); and 3) a dysfunctional receptor that does not signal due to the mutation. In the current revision, based on the gating strategy, HA-positive (surface-expressing) cells make up only 10.6% of the WT GPR30-expressing population, while the surface expression of the mutants ranges from 1.89% (P71A) to 64.4% (D111A). Combining this information with the functional readout in Figure 3F and G, as well as their previous work, the authors concluded that mutations at P71, E115, D125, Q138, C207, D210, and H307 would decrease bicarbonate responses. Among those sites, E115, Q138, and H307 were from their previous Nature Communications paper.

      The authors claim that P71 and C207 contribute to structural stability, as their mutations result in a significant reduction in surface expression: P71A (1.89%) and C207A (2.71%). However, relative to WT (10.6% of the total population), P71A retains 17.8% and C207A retains 25.6% of WT surface expression. This does not rule out the possibility that the mutant receptors are also dysfunctional: at 10 mM NaHCO3, the RFU of WT is ~500, whereas the RFU of P71A and C207A is ~0.

      The authors also state that "The D125ECL1A mutant has lost its activity but is located on the surface" and only mention that "D125 is unlikely to be a bicarbonate binding site, and the mutational effect could be explained due to the decreased surface expression". Again, relative to WT (10.6% of the total population), D125A (3.94%) retains 37.2% of WT surface expression. At 10 mM NaHCO3, the RFU of WT is ~500, whereas the RFU of D125A is ~0. This does not rule out the possibility that the mutant receptor is also dysfunctional. It is also not clear why less D125A reaches the surface.

      Other mutants receive little discussion in the text: D111A (64.4%, 607.5% of WT surface expression), E121A (50.4%, 475.5% of WT surface expression), R122 (41.0%, 386.8% of WT surface expression), N276A (38.9%, 367.0% of WT surface expression), and E218A (24.6%, 232.1% of WT surface expression) all have RFU similar to WT despite roughly 2- to 6-fold higher surface expression. Conversely, Q215A (3.18%, 30% of WT surface expression) has RFU similar to WT with only about a third of the WT level of receptor on the surface.

      Altogether, the wide range of surface expression across the different cell lines, combined with the different receptor function readouts, makes the cell functional data only partially support their structural observations.
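      The relative values quoted above are each mutant's HA-positive fraction divided by the WT fraction (10.6%). A minimal Python sketch of that arithmetic follows; its only inputs are the percentages quoted in this review, and the variable and function names are illustrative.

```python
# Fraction (%) of HA-positive, surface-expressing cells, as quoted in the review above.
WT_SURFACE = 10.6
MUTANT_SURFACE = {
    "P71A": 1.89,
    "C207A": 2.71,
    "Q215A": 3.18,
    "D125A": 3.94,
    "E218A": 24.6,
    "N276A": 38.9,
    "R122": 41.0,
    "E121A": 50.4,
    "D111A": 64.4,
}


def relative_to_wt(fraction: float, wt: float = WT_SURFACE) -> float:
    """Express a mutant's surface-positive fraction as a percentage of WT (1 decimal)."""
    return round(100 * fraction / wt, 1)


if __name__ == "__main__":
    for name, frac in MUTANT_SURFACE.items():
        print(f"{name}: {frac}% of cells, {relative_to_wt(frac)}% of WT")
```

      A percentage of positive cells does not capture per-cell expression level (for example, median fluorescence intensity), so these ratios are only a coarse comparison of surface expression.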

      We sincerely thank the reviewer for their careful reading and thoughtful evaluation of our manuscript on the cryo-EM structure of the bicarbonate receptor GPR30. We greatly appreciate the reviewer’s positive assessment of the overall significance of combining structural determination with extensive mutagenesis and functional assays to advance understanding of bicarbonate–GPR30 interactions and G-protein coupling, as well as their recognition that these atomic-level insights will be valuable for future mechanistic studies and drug-development efforts. We are also grateful for the reviewer’s constructive critique regarding the interpretation of reduced signaling in the context of variable surface expression across mutants, which highlights an important point about disentangling effects of expression/trafficking from intrinsic receptor dysfunction; these comments are highly insightful and will help us strengthen the clarity and rigor of our presentation and conclusions in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      In this revision, the authors have made a significant effort to improve and validate the structural observations, as well as to address the comments on the previous submission. They updated the functional assays and evaluated receptor function by measuring intracellular calcium mobilization, which is a more direct measurement of downstream hGPR30-Gq signaling. They also used flow cytometry with an HA antibody for a more direct measurement of the surface expression of the receptor, replacing their previous assay, which normalized to the housekeeping protein Na-K-ATPase.

      I appreciate the effort the authors made to address the previous comments made by the reviewers. However, there are still some concerns about the current data.

      (1) The authors have addressed my previous comment on untangling the mixture of their previous and new data in the "insights into bicarbonate binding" section. They have made it clear that the importance of E115, Q138, and H307 in the receptor-bicarbonate interaction was shown in their Nature Communications paper.

      (2) The authors have addressed my previous comment on adding some content about the physiological concentration of HCO3, or referring more to their previous work for the rationale for selecting the bicarbonate dose in their functional assay.

      (3) The authors have updated Figure 3.

      (4) The authors have updated Supplemental Figure 1 to show the full gel with molecular weight markers in the supplemental data to demonstrate the sample purity.

      (5) The authors have updated the predicted model using AF3.

      (6) The authors added E218A as suggested before.

      Some new suggestions for this R1:

      (1) The wide range of surface expression across the different cell lines, combined with the different receptor function readouts, makes the cell functional data only partially support their structural observations.

      We acknowledge this limitation. The wide range of surface expression among cell lines, together with differences in assay modalities, may introduce variability that complicates direct quantitative comparisons and therefore only partially supports the structural observations. Future work using more standardized expression systems and matched functional readouts will be important to strengthen the structure–function linkage.

      (2) Line 101: "ICL1 and ECL1 contain short α helices", but no α helix of ICL1 is shown in Figure 2C.

      We removed the word “ICL1” (line 98).

      (3) For the unresolved region of ECL2, could the authors add a dashed line connecting ECL2 with TM4? In the current Figure 2B, it looks like ECL2 connects TM3 and TM5.

      As suggested, we corrected Figure 2B.

      (4) I appreciate that the authors updated the predicted model with AF3, but they did not make clear why they compared their cryo-EM structure (bicarbonate-activated, G-protein-bound GPR30) with the predicted AF3 model (inactive GPR30).

      We intended to highlight the value of the experimental structure beyond a prediction alone, including structural features that are independent of receptor activation, such as disulfide (SS) bonds.

      (5) I appreciate that the authors have addressed my previous comment on adding some content about the physiological concentration of HCO3, but it is still not clear to me why they picked 11 mM for the bar graph in Figure 3G. Also, since a dose-response curve was generated in Figure 3F, why not simply calculate and report the EC50 of NaHCO3 for each mutant?

      Thank you for the comment. We have calculated the EC50 of the calcium response and assessed its correlation with the receptors’ cell surface expression. We chose 11 mM in Fig. 3G because our previous paper in Nature Communications showed that the EC50 in the IP assay was around 11 mM. However, the calcium response was more sensitive and gave a lower EC50 than expected. Therefore, as you suggested, we removed the bar graph of 11 mM responses, calculated EC50 values, and plotted the correlations among cell surface expression, EC50, and maximum responses (Figure 3F-I, Supplementary File 1). We also revised the description of this mutagenesis study (lines 139-154 and 217-230).
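      The response does not state how the EC50 values were derived; a four-parameter logistic fit (for example with scipy.optimize.curve_fit) is a common choice. As a simpler, dependency-free illustration, the hypothetical sketch below estimates EC50 by interpolating the half-maximal response on a log-concentration axis, using synthetic, noise-free Hill-curve data rather than any data from the manuscript. It assumes the doses span both plateaus, so that the midpoint of the observed responses approximates the half-maximal response.

```python
import math


def hill(conc_mM: float, ec50_mM: float, top: float = 1.0, n: float = 1.0) -> float:
    """Synthetic Hill-curve response (bottom fixed at 0), for illustration only."""
    return top * conc_mM**n / (ec50_mM**n + conc_mM**n)


def interpolated_ec50(concs, responses) -> float:
    """Estimate EC50 by linear interpolation of the half-max response on log10(dose).

    Assumes doses are sorted ascending and the response rises monotonically.
    """
    half = (min(responses) + max(responses)) / 2
    points = list(zip(concs, responses))
    for (c1, r1), (c2, r2) in zip(points, points[1:]):
        if r1 <= half <= r2 and r2 > r1:
            x1, x2 = math.log10(c1), math.log10(c2)
            frac = (half - r1) / (r2 - r1)
            return 10 ** (x1 + frac * (x2 - x1))
    raise ValueError("half-maximal response is not bracketed by the data")


if __name__ == "__main__":
    doses = [1, 2, 4, 8, 16, 32, 64]               # hypothetical NaHCO3 doses (mM)
    resp = [hill(c, ec50_mM=8.0) for c in doses]   # synthetic responses, not real data
    print(f"interpolated EC50 ~ {interpolated_ec50(doses, resp):.2f} mM")
```

      Interpolation is crude with sparse or noisy doses; a nonlinear least-squares fit also yields Hill slope and maximal response, which is what a comparison of EC50 and maximum responses across mutants requires.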

      (6) In the previous submission and comments, E218 was in close contact with bicarbonate in the previous Figure 4D (the bicarbonate is deleted in the new structure). I thank the authors for making an E218A mutant and performing the functional assay. As mentioned above, E218A (24.6%, 232.1% of WT surface expression) has a similar functional readout as WT. Doesn't this also indicate that E218A is partially broken, so you will need twice as much as WT to have the same downstream signal?

      Thank you for your comment. In our revised manuscript, we describe the correlation between cell surface expression and EC50, and we found that, as you noted, cell surface expression and the response to bicarbonate are not correlated (Figure 3F-I, Supplementary File 1). Several possibilities could explain this: GPR30 localization to specific regions of the plasma membrane (PM) might limit the response stoichiometry; intracellular GPR30 might blunt the increase in response expected from higher PM expression; excess GPR30 at the PM might be nonfunctional; or E218A might be less functional and require twice as much receptor as WT. We will examine the relationship between GPR30 cell surface expression and its response to bicarbonate in a future study.

      I would suggest that, in future studies, the authors consider using Tet-on inducible cell lines, such as HEK293 Flp-In T-REx. These cell lines would allow the authors to fine-tune the surface expression of their mutants to the same level using different doses of tetracycline in their stable cell lines.

      We appreciate your advice. We’ll introduce Tet-on inducible cell lines for future research.

      Reviewer #3 (Public review):

      Summary

      GPR30 responds to bicarbonate and plays a role in regulating cellular pH and ion homeostasis. However, the molecular basis of bicarbonate recognition by GPR30 remains unresolved. This study reports the cryo-EM structure of GPR30 bound to a chimeric mini-Gq in the presence of bicarbonate, revealing mechanistic insights into its G-protein coupling. Nonetheless, the study does not identify the bicarbonate-binding site within GPR30.

      Strengths

      The work provides strong structural evidence clarifying how GPR30 engages and couples with Gq.

      Weaknesses

      Several GPR30 mutants exhibited diminished responses to bicarbonate, but their expression levels were also reduced. As a result, the mechanism by which GPR30 recognizes bicarbonate remains uncertain, leaving this aspect of the study incomplete.

      We sincerely thank the reviewer for this thoughtful and balanced assessment of our manuscript, including the clear summary of the central advance and the constructive identification of remaining limitations. We particularly appreciate the recognition that our cryo-EM analysis provides strong structural evidence for how GPR30 engages and couples with Gq, and we agree that pinpointing the bicarbonate-binding site remains a critical open question. In the revised manuscript, we will make this point more explicit, clarify the interpretation of the mutagenesis results in light of reduced receptor expression for some variants, and further strengthen the presentation and discussion of what our current data do, and do not, allow us to conclude regarding bicarbonate recognition by GPR30.

      Reviewer #3 (Recommendations for the authors):

      The authors have removed the bicarbonate assignment from their model and have addressed all of my concerns. In this study, or in future work, it would be advisable for the authors to explore the use of bicarbonate mimetics with higher binding affinity to facilitate more definitive structural characterization.

      Thank you for this constructive suggestion. We agree that exploring bicarbonate mimetics with higher binding affinity would be an important next step to enable more definitive structural characterization of GPR30 and to strengthen mechanistic conclusions. In future work, we plan to pursue the identification and/or design of such mimetics, guided by the architecture and mutational landscape of the extracellular pocket described here, and to combine these ligands with optimized cryo-EM sample preparation and complementary functional assays to better stabilize and visualize the bound state.

    1. eLife Assessment

      This study introduces a valuable toolkit for zebrafish transgenesis, significantly enhancing the flexibility and efficiency of transgene generation for immunological applications. The authors provide convincing evidence through well-designed experiments, demonstrating the toolkit's utility in generating diverse and functional transgenic lines.

    2. Reviewer #1 (Public review):

      Summary:

      The authors introduce ImPaqT, a modular toolkit for zebrafish transgenesis, utilizing the Golden Gate cloning approach with the rare-cutting enzyme PaqCI. The toolkit is designed to streamline the construction of transgenes with broad applications, particularly for immunological studies. By providing a versatile platform, the study aims to address limitations in generating plasmids for zebrafish transgenesis.

      Strengths:

      The ImPaqT toolkit offers a modular method for constructing transgenes tailored to specific research needs. By employing Golden Gate cloning, the system simplifies the assembly process, allowing seamless integration of multiple genetic elements while maintaining scalability for complex designs. The toolkit's utility is evident from its inclusion of a diverse range of promoters, genetic tools, and fluorescent markers, which cater to both immunological and general zebrafish research needs. Even small DNA fragments, such as the viral 2a sequence, can be cloned into a multi-component plasmid in one step. The components can be assembled from PCR fragments or synthesized DNA fragments, forgoing the need for "entry" vectors. Further, the authors show that existing PaqCI sites can be domesticated to improve the versatility of the system. The validation provided in the manuscript is convincing, demonstrating the successful generation of several functional transgenic lines. These examples highlight the toolkit's efficacy, particularly for immune-focused applications.

      Comments on revisions:

      The authors have addressed all the concerns raised in the first review. Congratulations to the authors for their effort.

    3. Reviewer #2 (Public review):

      Summary:

      Hurst et al. developed a new Tol2-based transgenesis system, ImPaqT, an Immunological toolkit for PaqCI-based Golden Gate Assembly of Tol2 Transgenes, to facilitate the production of transgenic zebrafish lines. This Golden Gate assembly-based approach leaves only a short 4-base-pair overhang sequence in the final construct, and the insert and backbone vector can be assembled in a single-tube reaction using PaqCI and a ligase. The approach is also expandable: new overhang sequences can be introduced while maintaining compatibility with existing ImPaqT constructs, allowing users to add fragments as needed.
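      In a Golden Gate assembly, the order of parts is fixed by which 4-bp overhangs match. The hypothetical Python sketch below chains parts into a circular construct by matching each part's right overhang to the next part's left overhang; the part names and overhang sequences are placeholders, not the actual ImPaqT overhangs, and the model ignores mis-ligation and enzyme chemistry.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[str, str]  # (left overhang, right overhang), 4 nt each, top strand

# Placeholder parts; these overhangs are illustrative, not ImPaqT's.
EXAMPLE_PARTS: Dict[str, Fragment] = {
    "backbone": ("TACT", "GGAG"),
    "promoter": ("GGAG", "AATG"),
    "cds": ("AATG", "GCTT"),
    "polyA": ("GCTT", "TACT"),
}


def assemble(backbone: str, parts: Dict[str, Fragment]) -> List[str]:
    """Order parts into a circle by matching each right overhang to the next left overhang."""
    by_left: Dict[str, str] = {}
    for name, (left, _right) in parts.items():
        if left in by_left:
            raise ValueError(f"overhang {left} is used by more than one part")
        by_left[left] = name
    order = [backbone]
    current = parts[backbone][1]
    while True:
        nxt = by_left.get(current)
        if nxt is None:
            raise ValueError(f"no part starts with overhang {current}")
        if nxt == backbone:  # circle closed back onto the backbone
            break
        if nxt in order:
            raise ValueError("overhangs form a loop that excludes the backbone")
        order.append(nxt)
        current = parts[nxt][1]
    if len(order) != len(parts):
        raise ValueError("some parts were not incorporated")
    return order


if __name__ == "__main__":
    print(" -> ".join(assemble("backbone", EXAMPLE_PARTS)))
```

      Adding a part in this model only requires a new overhang that is unique within the set, which mirrors the expandability described above.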

      The generation of several transgenic zebrafish lines for immunological studies demonstrates the feasibility of the ImPaqT in vivo. Lineage tracing of macrophages via LPS injection demonstrates the approach's functionality and validates its use in vivo.

      Comments on revisions:

      The authors have addressed all my concerns.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their careful reading of our manuscript and thoughtful comments on it. We appreciate the overall positive opinion on our manuscript and the helpful comments and suggestions from the reviewers. Overall, the main points identified by the reviewers were 1) further broadening of the system to a range of inputs and construct types, and 2) further consideration of any off-target joining or off-target effects on genes/proteins and of the limits to the expandability of the kit. To address these concerns, we have added new data in Figure 6 illustrating the generation of a new construct using PCR and dsDNA fragments, added new constructs for mpeg1.1 and for CRISPR gRNA expression, and revised the text to further address concerns and limitations of the toolkit. We thank the reviewers and editors for these suggestions and feel that they have substantially improved the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors introduce ImPaqT, a modular toolkit for zebrafish transgenesis, utilizing the Golden Gate cloning approach with the rare-cutting enzyme PaqCI. The toolkit is designed to streamline the construction of transgenes with broad applications, particularly for immunological studies. By providing a versatile platform, the study aims to address limitations in generating plasmids for zebrafish transgenesis.

      Strengths:

      The ImPaqT toolkit offers a modular method for constructing transgenes tailored to specific research needs. By employing Golden Gate cloning, the system simplifies the assembly process, allowing seamless integration of multiple genetic elements while maintaining scalability for complex designs. The toolkit's utility is evident from its inclusion of a diverse range of promoters, genetic tools, and fluorescent markers, which cater to both immunological and general zebrafish research needs. Furthermore, the modular design ensures expandability, enabling researchers to customize constructs for diverse experimental designs. The validation provided in the manuscript is solid, demonstrating the successful generation of several functional transgenic lines. These examples highlight the toolkit's efficacy, particularly for immune-focused applications.

      We appreciate the overall positive evaluation of our toolkit and the time and effort in evaluating it.

      Weaknesses:

      While the toolkit's technical capabilities are well-demonstrated, there are several areas where additional validation and examples could enhance its impact. One limitation is the lack of data showing whether the toolkit can be directly used for rapid cloning and testing of enhancers or promoters, particularly cloning them directly from PCR using PaqCI overhangs without needing an entry vector. Similarly, the feasibility of cloning genes directly from PCR products into the system is not demonstrated, which would significantly increase the utility for researchers working with genomic elements.

      This is an excellent point. Given the increased use of gene synthesis and dsDNA fragments, we also thought it was good to demonstrate incorporation of these as well. We have added a new figure, Figure 6, which demonstrates the generation of two new transgene constructs by direct cloning of three PCR products along with a synthetic dsDNA fragment into a Tol2-flanked backbone plasmid as an alternative, rapid approach to generating transgenes. The resulting plasmids, encoding the mpeg1.1 promoter, a separate p2a, and a tdTomato fluorescent protein along with either wild-type or dominant-negative rac2, were properly assembled, and in transient transgenic zebrafish injected with these constructs, dominant-negative rac2 prevented macrophage recruitment to tail wounds, indicating that this approach worked for the generation of functional transgenes. These results are discussed in new text (lines 304-391) describing this new experiment and the finding that both PCR products and synthesized dsDNA could be efficiently incorporated in constructs generated with our approach, as well as in the discussion (lines 494-499).

      The authors discuss potential applications such as using the toolkit for tissue-specific knockout applications by assembling CRISPR/Cas9 gRNA constructs. However, they do not demonstrate the cloning of short fragments, such as gRNA sequences downstream of a U6 promoter, which would be an important proof-of-concept to validate these applications. Furthermore, while the manuscript focuses on macrophage-specific promoters, the widely used mpeg1.1 promoter is not included or tested, which limits the toolkit's appeal for researchers studying macrophages and microglia.

      Yes, in the new figure described above, we have now shown that this method works with shorter PCR fragments, such as the p2a fragment cloned within the tdTomato-p2a-rac2 constructs described above. This fragment is ~70 bp, and while this is somewhat longer than a simple gRNA targeting sequence (though smaller than a complete sgRNA), we believe that this indicates that smaller fragments can still be incorporated within these constructs. We also agree with the general idea of increasing functionality to incorporate CRISPR/Cas9 and now include a 3E encoding the zebrafish U6 promoter. As CRISPR expression constructs frequently have complex designs, for instance, expressing tagged Cas9 along with the U6-driven gRNA as in Zhou et al., 2018, or along with rescue constructs as in Wang et al., 2021, we have given these constructs the non-standard 5’ end O3c to enable multiplexing in these complex constructs.

      We agree that it is important to include mpeg1.1 given the broad use of this promoter in the field, and we have now included a 5E mpeg1.1 construct in the toolkit.

      Another potential limitation is the handling of sequences containing PaqCI recognition sites. Although the authors discuss domestication to remove these sites, a demonstration of cloning strategies for such cases or alternative methods to address these challenges would provide practical guidance for users.

      Absolutely, we have now included a new figure (Supplementary Figure 6) that illustrates one domestication approach using PCR and homology-based cloning as an easy approach to domestication. In addition, we have also mentioned alternative approaches for domestication in the discussion (lines 439-444).

      Reviewer #2 (Public review):

      Summary:

      Hurst et al. developed a new Tol2-based transgenesis system, ImPaqT, an Immunological toolkit for PaqCI-based Golden Gate Assembly of Tol2 Transgenes, to facilitate the production of transgenic zebrafish lines. This Golden Gate assembly-based approach leaves only a short 4-base-pair overhang sequence in the final construct, and the insert and backbone vector can be assembled in a single-tube reaction using PaqCI and ligase. The approach is also expandable: new overhang sequences can be introduced while maintaining compatibility with existing ImPaqT constructs, allowing users to add fragments as needed.

      Strengths:

      The generation of several lines of transgenic zebrafish for the immunologic study demonstrates the feasibility of the ImPaqT in vivo. The lineage tracing of macrophages by LPS injection shows this approach's functionality, validating its usage in vivo.

      We appreciate the positive sentiments for our toolkit and the effort put into reviewing our manuscript.

      Weaknesses:

      (1) There is no quantitative data analysis showing the percentage of off-target ligation for these 4-bp overhang sequences.

      While we agree that this is an important variable for the method, we feel that previous studies that have broadly tested off-target effects of all potential 4-bp overhang sequences have already given an effective overview of interactions between each of these overhangs (Potapov et al., 2018; Pryor et al., 2020). The results from these studies were incorporated into the NEB ligase fidelity viewer, which we used to predict the overhangs that would have minimal off-target ligation with each other; the tool also reports the expected off-target ligation of individual 4-bp overhangs. In all cases, we selected overhangs with minimal expected off-target ligation, with each overhang showing 1% or less off-target ligation with any of the other chosen overhangs. We have added new text (lines 119-124) that further clarifies our selection of these ends.
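      The NEB ligase fidelity tools rely on empirically measured ligation frequencies, which a simple script cannot reproduce. For intuition only, the hypothetical sketch below flags two common rules of thumb for a candidate overhang set: palindromic overhangs, which can ligate to themselves, and pairs of overhangs that are too similar to each other or to each other's reverse complement. The two-mismatch threshold is an assumption, not a value from the manuscript or from NEB.

```python
from itertools import combinations
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def screen_overhangs(overhangs: List[str], min_distance: int = 2) -> List[str]:
    """Return warnings for a candidate set of 4-nt Golden Gate overhangs."""
    warnings = []
    for oh in overhangs:
        if oh == revcomp(oh):
            warnings.append(f"{oh} is palindromic and can self-ligate")
    for a, b in combinations(overhangs, 2):
        # An end may mis-pair with another junction in either orientation.
        d = min(hamming(a, b), hamming(a, revcomp(b)))
        if d < min_distance:
            warnings.append(f"{a} and {b} (or its reverse complement) differ at only {d} position(s)")
    return warnings


if __name__ == "__main__":
    print(screen_overhangs(["GGAG", "AATG", "GCTT", "GATC"]))
```

      A screen like this only removes obviously poor choices; ranking the remaining candidates still requires measured fidelity data such as those underlying the NEB tool.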

      (2) There is no statement for the upper limitation of the expandability.

      Yes, we have been curious about this as well. Our assembly of 6 distinct fragments in Figure 5 and of 5 fragments in the new Figure 6 suggests that 5-6 fragments can be readily assembled; however, during revision we also attempted to generate a larger product from 11 fragments, which ultimately failed. It is unclear whether this failure was due to the constructs chosen, the size of the resulting plasmid, or a limitation of the technique or enzymes themselves. Published PaqCI Golden Gate approaches have assembled at least 32 fragments and produced large sequences (e.g., Sikkema et al., 2023, who assembled the ~40 kbp T7 genome from 12, 24, and 32 distinct fragments in a PaqCI Golden Gate reaction), so we suspect that our problems with the 11-fragment assembly reflect the specific set of constructs combined. However, we have not been able to exhaustively test a range of constructs and assemblies of varying complexity. To acknowledge this, we have added text (lines 490-493) to the discussion stating that we have only combined 6 constructs, that this likely covers many of the applications needed for this system, and that expansion beyond this number may be possible.

      (3) There is no data about any potential side effect on their endogenous function of promoter/protein of interest with the ImPaqT method.

      Absolutely. We have added new text (lines 457-470) to our discussion describing potential side effects on protein function, for instance, the need to consider whether the N- or C-terminus of a protein can be modified, and the potential for disrupting or creating ectopic transcription factor binding sites.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The data presented in the manuscript are robust and well supported. However, to fully demonstrate the broad applicability of the toolkit and strengthen its impact, a few additional experiments could be beneficial. Specific suggestions for these experiments and areas of improvement are outlined in the 'Weaknesses' section of the Public Review. Additionally, Figures 2-4 illustrate the same concept (cloning three fragments from entry vectors), which comes across as repetitive. Incorporating a more diverse range of use cases would better highlight the versatility of the toolkit.

      As described in our replies to your public points above, we have added the new Figure 6 and the new Supplementary Figure 6, addressing the cloning of PCR fragments and short fragments, as well as an approach to domestication. We have also included the mpeg1.1 promoter in the toolkit. In addition, your point about the repetitive assays is fair; in the new Figure 6, we instead expressed wild-type and dominant-negative Rac2 and assessed the failure of macrophage recruitment to the tail wound.

      Reviewer #2 (Recommendations for the authors):

      Hurst et al. developed ImPaqT, a new Tol2-based transgenesis system. It is interesting and potentially efficient, but I have a few concerns:

      (1) The authors claim that the ImPaqT system is more efficient than other existing systems. They should provide data to support this claim.

      We would not argue that the ImPaqT system is, strictly speaking, more efficient. The main point of the method is the combination of minimal added sequence and the ability to expand or contract the number of fragments used. It also offers the ability, shown in our new Figure 6, to use PCR products and dsDNA fragments directly. All of this comes while retaining the ability to build constructs combinatorially from a suite of existing sequences. We now state explicitly in the text (lines 534-537) that Golden Gate cloning is not more efficient than existing techniques. Rather, the particular strengths of the method are its flexibility and minimal added sequence.

      (2) The ImPaqT system is theoretically less prone to off-target effects than existing systems; the authors should provide data to validate this claim.

      Good point. We have now searched the zebrafish genome for PaqCI sites, as well as for BsaI and BsmBI sites. BsaI and BsmBI are the 6-base cutters most commonly used for Golden Gate cloning, whereas PaqCI recognizes a 7-base site. We found that PaqCI cuts every ~17 kb in the zebrafish genome, whereas BsaI and BsmBI cut every ~9 kb and ~13 kb, respectively. This further supports that PaqCI sites are rarer in the genome and should generally require domestication less often. We have added new text describing this in lines 129-132.

      (3) The authors should mention any potential side effects of this system on the endogenous function of the promoter/protein of interest, at least in the Discussion.

      Yes, this should absolutely be expanded. As we said in reply to your public comments above, we have now added new text describing potential pitfalls that this method may pose for promoter function or gene expression.

      (4) We suggest that the authors provide a balanced discussion of how this system could be used beyond the immune system.

      We agree, this is also a good point that we should have emphasized more. We’ve added new text (lines 537-541) recognizing that in principle, many of the components we’ve derived should be useful in non-immune systems, but we also recognize that adapting this to new tissues will require the development of new promoters within the Golden Gate system which can be combined with these already developed tools.
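      The genome scan described in the reply to point (2) amounts to counting recognition sites on both strands of the genome. Below is a minimal sketch, not the authors' script. It assumes the recognition sequences CACCTGC (PaqCI), GGTCTC (BsaI) and CGTCTC (BsmBI), and it does not show reading the zebrafish genome from a FASTA file.

      None of these sites is palindromic, so each must also be searched as its reverse complement. The zero-width lookahead in the regular expression lets overlapping matches be counted.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Top-strand recognition sequences (assumed); none is palindromic, so the
# reverse complement must be searched as well to cover the other strand.
SITES = {"PaqCI": "CACCTGC", "BsaI": "GGTCTC", "BsmBI": "CGTCTC"}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(seq, site):
    """Count (possibly overlapping) matches of a site on either strand."""
    seq = seq.upper()  # soft-masked (lowercase) bases are treated like any other
    total = 0
    for motif in {site, revcomp(site)}:  # a set avoids double counting palindromes
        # A zero-width lookahead lets re.finditer report overlapping matches.
        total += sum(1 for _ in re.finditer(f"(?={motif})", seq))
    return total


def mean_spacing(seq, site):
    """Average distance in bp between sites: sequence length / site count."""
    n = count_sites(seq, site)
    return len(seq) / n if n else float("inf")
```

      To get a whole-genome figure of the "one site every N kb" kind quoted above, sum site counts and sequence lengths over all chromosomes before dividing. Assembly gaps (runs of N) contain no sites, so subtracting their length first gives a slightly fairer spacing estimate.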

      References

      Potapov, V., Ong, J.L., Kucera, R.B., Langhorst, B.W., Bilotti, K., Pryor, J.M., Cantor, E.J., Canton, B., Knight, T.F., Evans, T.C., Jr., et al. (2018). Comprehensive Profiling of Four Base Overhang Ligation Fidelity by T4 DNA Ligase and Application to DNA Assembly. ACS Synth Biol 7, 2665-2674.

      Pryor, J.M., Potapov, V., Kucera, R.B., Bilotti, K., Cantor, E.J., and Lohman, G.J.S. (2020). Enabling one-pot Golden Gate assemblies of unprecedented complexity using data-optimized assembly design. PLoS One 15, e0238592.

      Sikkema, A.P., Tabatabaei, S.K., Lee, Y.J., Lund, S., and Lohman, G.J.S. (2023). High-Complexity One-Pot Golden Gate Assembly. Curr Protoc 3, e882.

      Wang, Y., Hsu, A.Y., Walton, E.M., Park, S.J., Syahirah, R., Wang, T., Zhou, W., Ding, C., Lemke, A.P., Zhang, G., et al. (2021). A robust and flexible CRISPR/Cas9-based system for neutrophil-specific gene inactivation in zebrafish. J Cell Sci 134.

      Zhou, W., Cao, L., Jeffries, J., Zhu, X., Staiger, C.J., and Deng, Q. (2018). Neutrophil-specific knockout demonstrates a role for mitochondria in regulating neutrophil motility in zebrafish. Dis Model Mech 11.

    1. eLife Assessment

      This study maps the genotype-phenotype landscapes of three E. coli transcription factors and the topographical features of these landscapes. It shows that ruggedness and epistasis do not hinder the evolution of strong transcription factor binding sites. These convincing findings contribute important insights into fitness landscape theories and highlight the role of chance, contingency, and evolutionary biases in gene regulation.

    2. Reviewer #1 (Public review):

      Summary:

      For each of three key transcription factor (TF) proteins in E. coli, the authors generate a large library of TF binding site (TFBS) sequences on plasmids, such that each TFBS is coupled to the expression of a fluorescence reporter. By sorting the fluorescence of individual cells and sequencing their plasmids to identify each cell's TFBS sequence (sort-seq), they are able to map the landscape of these TFBSs to the gene expression level they regulate. The authors then study the topographical features of these landscapes, especially the number and distribution of local maxima, as well as the statistical properties of evolutionary paths on these landscapes. They find the landscapes to be highly rugged, with about as many local peaks as a random landscape would have, and with those peaks distributed approximately randomly in sequence space. This is quite different from previous work on landscapes for eukaryotic TFBSs, which tend to be rather smooth. The authors find that there are a number of peaks that produce regulation stronger than that of the wild-type sequence for each TF, and that it is not too unlikely to reach one of those "high peaks" from a random starting sequence. Nevertheless, the basins of attraction for different peaks have significant overlap, which means that chance plays a major role in determining which peak a population will evolve to.

      Strengths:

      (1) The apparent differences in landscape topography between prokaryotic TFBSs and other molecular landscapes are a fascinating discovery to add to the field of genotype-phenotype maps. I am really excited to learn about the underlying molecular mechanisms in the future.

      (2) The experiments and analysis of this paper are very well-executed and, by and large, very thorough. I appreciated the systematic nature of the project, both the large-scale experiments done on three TFs with replicates, and the systematic analysis of the resulting landscapes. This not only makes the paper easy to follow, but also inspires confidence in their results since there is so much data and so many different ways of analyzing it. It's a great recipe for other studies of genotype-phenotype landscapes to follow.

      (3) Considering how technical the project was, I am really impressed at how easy to read I found the paper, and the authors deserve a lot of credit for making it so. They do a great job of building up the experiments and analyses step-by-step, and explaining enough of the basics of the experimental design and essence of each analysis in the main text without getting too complicated with details that can be left to the Methods or SI.

      Weaknesses:

      (1) Regarding the effect of measurement uncertainties, one way in which they attempt to test their effect is to simulate dynamics on noisy and noise-free versions of the landscape and measure visitation frequencies. While they show that visitation frequencies are highly correlated between these cases, I'd prefer a more direct test of epistasis or navigability (e.g., number of local peaks), since that's how they are characterizing the landscapes, and the connection between that and visitation frequency of individual states is unclear.

      (2) I am still a little concerned about the fraction of sequences missing from the data due to filtering, although I appreciate the difficulties in testing the importance of this (requiring additional assumptions) and the authors' good-faith efforts to do their best with the data they have.

    3. Reviewer #2 (Public review):

      The authors aim to investigate the ability of evolution to create strong transcription factor binding sites (TFBSs) de novo in E. coli. They focus on three global transcriptional regulators: CRP, Fis, and IHF, using a massively parallel reporter assay to evaluate the regulatory effects of over 30,000 TFBS variants. By analyzing the resulting genotype-phenotype landscapes, they explore the ruggedness, accessibility, and evolutionary dynamics of regulatory landscapes, providing insights into the evolutionary feasibility of strong gene regulation. Their experiments show that de novo adaptive evolution of new gene regulation is feasible. It is also subject to a blend of chance, historical contingency, and evolutionary biases that favor some peaks and evolutionary paths.

      (1) Strengths of the methods and results:

      The authors successfully employed a well-designed sort-seq assay combined with high-throughput sequencing to map regulatory landscapes. The experimental design ensures reliable measurement of regulation strengths. Their system accounts for gene expression noise and normalizes measurements using appropriate controls.

      Comprehensive Landscape Mapping:

      The study examines ~30,000 TFBS variants per transcription factor, providing statistically robust and thorough maps of the regulatory landscapes for CRP, Fis, and IHF. The landscapes are rigorously analyzed for ruggedness (e.g., number of peaks) and epistasis, revealing parallels with theoretical uncorrelated random landscapes.

      Evolutionary Dynamics Simulations:

      Through simulations of adaptive walks under varying population dynamics, the authors demonstrate that high peaks in regulatory landscapes are accessible despite ruggedness. They identify key evolutionary phenomena, such as contingency (multiple paths to peaks) and biases toward specific evolutionary outcomes.

      Biological Relevance and Novelty:

      The authors' work is novel in focusing on global regulators, which differ from previously studied local regulators (e.g., TetR). They provide compelling evidence that rugged landscapes are navigable, facilitating de novo evolution of regulatory interactions. The comparison of landscapes for CRP, Fis, and IHF underscores shared topographical features, suggesting general principles of global transcriptional regulation in bacteria.

      (2) Weaknesses of the methods and results:

      Undersampling of Genotype Space:

      Approximately 40% of the theoretical TFBS genotype space remains uncharacterized after quality filtering. The authors now discuss this limitation more explicitly and provide analyses suggesting that undersampling does not strongly bias their conclusions at the landscape level. Nevertheless, predictive modeling approaches could further extend these landscapes in future work.

      Simplified Regulatory Architecture:

      The study considers a minimal system consisting of a single TFBS upstream of a reporter gene. While this simplification allows clean interpretation and high-throughput measurement, natural promoters often involve combinatorial regulation and chromosomal context effects that may alter landscape topography.

      Lack of Experimental Evolution Validation:

      The evolutionary conclusions are based on simulations rather than direct experimental evolution. The authors provide a reasonable justification for this choice and frame their conclusions at the statistical level rather than for specific trajectories, but experimental validation would be a valuable future extension.

      Impact on the Field:

      This study advances our understanding of adaptive landscapes in gene regulation and offers a critical step toward deciphering how global regulators evolve de novo binding sites. The findings provide foundational insights for synthetic biology, evolutionary genetics, and systems biology by highlighting the evolutionary accessibility of strong regulation in bacteria.

      Utility of Methods and Data:

      The sort-seq approach, combined with landscape analysis, provides a robust framework that can be extended to other transcription factors and systems. If made publicly available, the study's data and code would be valuable for researchers modeling transcriptional regulation or studying evolutionary dynamics.

      Additional Context:

      The study builds on a growing body of work exploring regulatory evolution. For instance, recent studies on local regulators like TetR and AraC have revealed high ruggedness and epistasis in TFBS landscapes. This study distinguishes itself by focusing on global regulators, which are more complex biologically and more influential in bacterial gene networks. The observed evolutionary contingency aligns with findings in other biological systems, such as protein evolution and RNA folding landscapes, underscoring the generality of these evolutionary principles.

      Conclusion:

      The authors successfully mapped the genotype-phenotype landscapes for three global regulators and simulated evolutionary dynamics to assess the feasibility of strong TFBS evolution. They convincingly demonstrate that ruggedness and epistasis, while prominent, do not preclude the evolution of strong regulation. Their results support the notion that gene regulation evolves through a blend of chance, contingency, and evolutionary biases.

      This paper makes a significant contribution to the understanding of regulatory evolution in bacteria. While minor limitations exist, the authors' methods are robust, and their findings are well-supported. The work will likely be of broad interest to researchers in molecular evolution, synthetic biology, and gene regulation.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The main weakness of this paper, in my view, is that it felt disconnected from the larger body of work on fitness and genotype-phenotype landscapes, including previous data on TFBSs in E. coli, genotype-phenotype maps of TFBSs in other systems, protein sequence landscapes (e.g., from mutational scans or combinatorially-complete libraries), and fitness landscapes of genomic mutations (e.g., combinatorially-complete landscapes of antibiotic resistance alleles). I have no doubt the authors are experts in this literature, and they probably cite most of it already given the enormous number of references. But they don't systematically introduce and summarize what was already known from all that work, and how their present study builds on it, in the Abstract and Introduction, which left me wondering for most of the paper why this project was necessary. Eventually, the authors do address most of these points, but not until the end, in the Discussion. Readers who have no familiarity with this literature might read this paper thinking that it's the first paper ever to study topography and evolutionary paths on genotype-phenotype landscapes, which is not true.

      There were two points that made this especially confusing for me. First, in order to choose which nucleotides in the binding sites to vary, the authors invoke existing data on the diversity of these sequences (position-weight matrices from RegulonDB). But since those PWMs can imply a genotype-phenotype map themselves, an obvious question I think the authors needed to have answered right away in the Introduction is why it is insufficient for their question. They only make a brief remark much later in the Results that the PWM data is just observed sequence diversity and doesn't directly reflect the regulation strength of every possible TFBS sequence. But that is too subtle in my opinion, and such a critical motivation for their study that it should be a major point in the Introduction.

      The second point where the lack of motivation in the Introduction created confusion for me was that they report enormous levels of sign epistasis in their data, to the point where these landscapes look like random uncorrelated landscapes. That was really surprising to me since it contrasts with other empirical landscape data I'm familiar with. It was only in the Discussion that I found some significant explanation of this - namely that this could be a difference between prokaryotic TFBSs, as this paper studies, and the eukaryotic TFBSs that have been the focus of many (almost all?) previous work. If that is in fact the case - that almost all previous studies have focused on eukaryotic TFBSs or other kinds of landscapes, and this is the first to do a systematic test of prokaryotic TFBS, then that should be a clear point made in the Abstract and Introduction. (I find a comparable statement only in the very last paragraph of the Discussion.) If that's the case, then I would also find that point to be a much stronger, more specific conclusion of this paper to emphasize than the more general result of observing epistasis and contingency (as is currently emphasized in the Abstract), which has been discussed in tons of other papers. This raises all sorts of exciting questions for future studies - why do the landscapes of prokaryotic TFBSs differ so dramatically from almost all the other landscapes we've observed in biology? What does that mean for the evolutionary dynamics of these different systems?

      We thank the reviewer for this thoughtful and detailed critique. We agree that the original version of the manuscript did not sufficiently motivate the study early on, nor did it clearly position our work within the broader literature on genotype–phenotype (GP) and fitness landscapes. We also agree that two specific issues, the role of PWMs and the unexpectedly high levels of sign epistasis, were insufficiently explained early on, which could lead to confusion for readers not already familiar with this field.

      Positioning within the broader landscape literature

      In response, we have substantially revised the Abstract and Introduction to explicitly situate our work within existing empirical studies of GP and fitness landscapes, including TFBS landscapes in bacteria, eukaryotic TFBS genotype–phenotype maps, in vitro TF–DNA binding studies, deep mutational scans of proteins, and combinatorially complete fitness landscapes such as antibiotic resistance alleles (Abstract; Introduction, lines 64–85). We now make clear that our study builds directly on this extensive body of work, rather than introducing the landscape framework itself. For example, we write in the introduction:

      “Over the last decade, genotype–phenotype (GP) maps and fitness landscapes have become central tools for understanding how molecular systems evolve under mutation and selection[22–25]. Such maps and landscapes have been experimentally studied for DNA[6,8,18,19,26,27], protein[28–32] and RNA[33–35] molecules, revealing key topographical properties that shape evolutionary outcomes, including epistasis[24,36]—the non-additive effects of multiple mutations on phenotype—landscape ruggedness, reflected in the number and distribution of fitness peaks, and constraints on adaptive evolution.”

      At the same time, we clarify what remains rare in the literature: large-scale, in vivo genotype–phenotype landscapes for bacterial transcription factor binding sites that are sufficiently dense to support explicit evolutionary analyses. While numerous high-throughput studies have characterized bacterial regulatory elements, these datasets typically do not provide quantitative regulatory phenotypes across large genotype spaces, nor do they analyze evolutionary accessibility. To our knowledge, only one such in vivo TFBS landscape had previously been characterized at comparable resolution for a bacterial local regulator (TetR). Our work extends this approach to three global regulators, enabling systematic comparisons across prokaryotic systems (Abstract, Introduction, lines 64–85). For example, we write in the introduction:

      “For transcription factor binding sites, most pertinent large-scale studies are based on in vitro binding assays, such as protein-binding microarrays (PBMs), and they focus predominantly on eukaryotic transcription factors[6]. While these studies have been instrumental in characterizing transcription factor binding preferences, they typically do not measure regulatory output in a native cellular context. In contrast, comprehensive in vivo data for bacterial TFBSs remain extremely rare. To our knowledge, only two high-resolution in vivo landscapes have been previously mapped for bacterial regulators, those of the local regulators TetR[18] and LacI[27]. As a result, it remains unclear whether principles inferred from protein landscapes, eukaryotic TFBSs, or in vitro binding assays generalize to transcriptional regulation in bacteria, particularly for global regulators[11] that integrate multiple physiological signals.”

      Why PWMs are insufficient for our question.

      We agree with the reviewer that our original explanation of the role of PWMs was too cursory and should have been addressed explicitly in the Introduction. We have now revised the Introduction to clearly explain why PWMs derived from RegulonDB cannot substitute for empirical GP landscapes in our study (Introduction, lines 102–113).

      In this passage we now explain that, first, PWMs are inferred from a limited number of naturally occurring binding sites—typically on the order of hundreds of sequences—whose diversity reflects evolutionary history and genomic context rather than systematic exploration of sequence space. As a result, PWMs sample only a small and biased subset of the possible TFBS variants, whereas our libraries probe tens of thousands of sequences in a controlled manner, providing substantially broader and more uniform coverage of genotype space (Introduction, lines 102–113).

      Second, PWM scores are not direct measurements of regulatory strength. Instead, they represent probabilistic or heuristic scores that are primarily used for identifying candidate binding sites in genomes. Numerous studies have shown that PWM scores often correlate weakly with in vivo binding affinity or regulatory output, where DNA shape, cooperative interactions, and chromosomal context play important roles. As such, PWMs do not provide quantitative genotype–phenotype relationships for regulation strength (Introduction, lines 102–113).

      Third, PWMs assume independent and additive contributions of individual nucleotide positions. This assumption excludes epistatic interactions by construction. Because epistasis is central to landscape ruggedness, peak structure, and evolutionary accessibility, PWM-based models are fundamentally unsuited to address the evolutionary questions we study here (Introduction, lines 102–113). We now explicitly state this limitation early in the manuscript, rather than only alluding to it later in the Results.
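      The third point can be made concrete. For any score that is a sum of per-position terms, the pairwise epistasis eps = f(ab) - f(a) - f(b) + f(wt) cancels to zero for every double mutant. The sketch below illustrates this with a made-up three-position log-odds matrix, not a RegulonDB PWM.

```python
import itertools

# Hypothetical 3-position log-odds matrix (made-up values, one dict per position).
PWM = [
    {"A": 1.2, "C": -0.5, "G": -0.8, "T": 0.1},
    {"A": -1.0, "C": 0.9, "G": 0.3, "T": -0.4},
    {"A": 0.2, "C": -0.7, "G": 1.1, "T": -0.6},
]


def pwm_score(seq):
    """Additive PWM score: a sum of independent per-position contributions."""
    return sum(PWM[i][base] for i, base in enumerate(seq))


def mutate(seq, changes):
    """Apply (position, base) substitutions to seq."""
    s = list(seq)
    for pos, base in changes:
        s[pos] = base
    return "".join(s)


def pairwise_epistasis(score, wt, i, a, j, b):
    """eps = f(ab) - f(a) - f(b) + f(wt) for mutation a at i and b at j."""
    return (score(mutate(wt, [(i, a), (j, b)]))
            - score(mutate(wt, [(i, a)]))
            - score(mutate(wt, [(j, b)]))
            + score(wt))


def max_abs_epistasis(wt):
    """Largest |eps| over all double mutants of wt under the PWM score."""
    worst = 0.0
    for i, j in itertools.combinations(range(len(wt)), 2):
        for a in "ACGT":
            for b in "ACGT":
                if a != wt[i] and b != wt[j]:
                    eps = pairwise_epistasis(pwm_score, wt, i, a, j, b)
                    worst = max(worst, abs(eps))
    return worst
```

      `max_abs_epistasis` returns zero up to floating-point rounding for any wild type. Measured regulation strengths, by contrast, can give nonzero eps, and that is precisely the information a PWM discards.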

      Sign epistasis and contrast with prior TFBS landscapes.

      We also agree with the reviewer that the extensive sign epistasis we observe—approaching levels expected for uncorrelated random landscapes—is surprising in light of much of the existing empirical landscape literature. Importantly, as the reviewer notes, most previous TFBS landscape studies have focused on in vitro binding systems or on eukaryotic transcription factors, which tend to exhibit smoother and more additive landscapes.

      To address this concern, we have revised the Abstract and Introduction to explicitly frame this contrast as a central result of the study (Abstract; Introduction, lines 151-153, Discussion, lines 652–668). For example, we write in the discussion:

      “We showed that the regulatory landscapes of all three TFs are highly rugged and have multiple peaks. The ruggedness of all three landscapes is also supported by the prevalence of epistasis between pairs of TFBS mutations (Supplementary Table S5). A particularly important form of epistasis is sign epistasis[24,93,94], because it can lead to multiple adaptive peaks [24,93,94] (see Supplementary Methods 7.5). Our landscapes contain up to 65% of mutation pairs with sign epistasis, a value that is especially high compared to the almost exclusively additive interactions of mutations in eukaryotic TFs[6,125].”

      We now emphasize that prokaryotic TFBS landscapes, particularly for global regulators, appear to be substantially more rugged and epistatic than most previously characterized TFBS landscapes, and that this difference likely reflects fundamental biological distinctions between regulatory systems.
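      For readers less familiar with the terminology, a mutation pair can be classified from four measured values: the wild type, each single mutant and the double mutant. The sketch follows a common convention:

      - sign epistasis: at least one mutation's effect changes sign depending on the genetic background;
      - reciprocal sign epistasis: both mutations' effects do.

      The authors' exact definitions are in their Supplementary Methods 7.5. The `tol` parameter is a hypothetical stand-in for how measurement noise might be handled.

```python
def classify_epistasis(f00, f10, f01, f11, tol=0.0):
    """Classify pairwise epistasis from four phenotype values.

    f00: wild type; f10: mutation 1 only; f01: mutation 2 only;
    f11: double mutant. tol: hypothetical noise tolerance for the
    magnitude test.
    """
    # Effect of each mutation on the wild-type background vs. on the
    # background carrying the other mutation.
    d1_wt, d1_bg = f10 - f00, f11 - f01
    d2_wt, d2_bg = f01 - f00, f11 - f10
    flip1 = d1_wt * d1_bg < 0  # mutation 1 changes sign of effect
    flip2 = d2_wt * d2_bg < 0  # mutation 2 changes sign of effect
    if flip1 and flip2:
        return "reciprocal sign"
    if flip1 or flip2:
        return "simple sign"
    if abs(f11 - f10 - f01 + f00) > tol:
        return "magnitude"
    return "none"
```

      Under this convention, the fraction of pairs with sign epistasis would be the fraction of double mutants classified as either "simple sign" or "reciprocal sign".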

      Revised emphasis and conclusions.

      Following the reviewer’s suggestion, we have adjusted the emphasis of the manuscript accordingly. Rather than highlighting epistasis and contingency as generic evolutionary phenomena, we now present the extreme ruggedness of prokaryotic TFBS landscapes as a system-specific finding with important implications for the evolution of gene regulation. We explicitly note that this raises new questions for future work—such as why prokaryotic regulatory landscapes differ so markedly from eukaryotic ones, and how these differences shape evolutionary dynamics—which we now highlight in the Introduction and Discussion (Abstract; Introduction, lines 151-153, Discussion, lines 652–668). For example, we write in the discussion:

      “… A possible reason for this greater incidence of epistasis lies in the nature of prokaryotic TFBSs. Specifically, prokaryotic TFBSs are at approximately 20bps twice as long as eukaryotic TFBSs[80,128] and exhibit symmetries that reflect the dimeric state of their cognate TFs[129–131]. These factors may increase the likelihood of intramolecular epistasis. Our observations raise important questions for future work, such as why the landscapes of prokaryotic TFBSs differ so dramatically from those of eukaryotic ones. And what do these differences imply for the evolutionary dynamics of gene regulation?”

      We believe that these revisions substantially improve the clarity, motivation, and positioning of the manuscript, and directly address the reviewer’s concerns by making both the necessity and the novelty of the study clear from the outset.

      (2) I am a bit concerned about the lack of uncertainties incorporated into the results. The authors acknowledge several key limitations of their approach, including the discreteness of the sort-seq bins in determining possible values of regulation strength, the existence of a large number of unsampled sequences in their genotype space, as well as measurement noise in the fluorescence readouts and sequencing. While the authors acknowledge the existence of these factors, I do not see much attempt to actually incorporate the effect of these uncertainties into their conclusions, which I suspect may be important. For example, given the bin size for the fluorescence in sort-seq, how confident are they that every sequence that appears to be a peak is actually a peak? Is it possible that many of the peak sequences have regulation strengths above all their neighbors but within the uncertainty of the fluorescence, making it possible that it's not really a peak? Perhaps such issues would average out and not change the statistical nature of their results, which are not about claiming that specific sequences are peaks, just how many peaks there are. Nevertheless, I think the lack of this robustness analysis makes the results less convincing than they otherwise would be.

      We thank the reviewer for raising this important concern. We fully agree that uncertainties arising from experimental resolution, measurement noise in fluorescence and sequencing, and incomplete sampling of genotype space should be incorporated explicitly into the analysis. While these limitations were acknowledged qualitatively in the original manuscript, we recognize that a direct, quantitative assessment of their impact on our conclusions is essential to strengthen the robustness of the study.

      We first clarify that regulation strength is not discretized in our analysis. For each TFBS, regulation strength is calculated as a continuous weighted average of fluorescence across all sorting bins, based on the sequencing read-count distribution of each sequence across bins. We clarified this information in the main text (Results, lines 201-203). Nevertheless, finite binning resolution and experimental noise introduce uncertainty in these estimates, which could in principle affect the identification of local peaks.
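      As a rough illustration of the read-weighted average described above, each variant's reads across bins act as weights on the bins' fluorescence values. This is a sketch under stated assumptions, not the authors' pipeline. The optional `bin_scale` factors are a hypothetical hook for per-bin normalization, for example by sequencing depth or by the number of cells sorted into each bin. Some sort-seq pipelines average log-fluorescence instead.

```python
def regulation_strength(read_counts, bin_fluorescence, bin_scale=None):
    """Read-weighted mean fluorescence of one TFBS variant across sort bins.

    read_counts[k]: reads of this variant recovered from bin k.
    bin_fluorescence[k]: representative fluorescence of bin k (e.g. its mean).
    bin_scale[k]: optional, hypothetical per-bin normalization factor.
    """
    if bin_scale is None:
        bin_scale = [1.0] * len(read_counts)
    weights = [n * s for n, s in zip(read_counts, bin_scale)]
    total = sum(weights)
    if total == 0:
        raise ValueError("variant has no reads in any bin")
    # Continuous value: not restricted to the discrete bin fluorescences.
    return sum(w * f for w, f in zip(weights, bin_fluorescence)) / total
```

      Because the result is a weighted mean, a variant spread across neighboring bins receives a value between those bins, rather than being assigned to a single bin.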

      Importantly, our study does not aim to assert that specific TFBS sequences are definitively peaks. Rather, our focus is on landscape-level statistical and topological properties—such as ruggedness, the abundance and distribution of peaks, and the evolutionary accessibility of strong regulation. We therefore centered our new analyses on testing whether these conclusions are robust to experimentally plausible sources of uncertainty, rather than on the identity of individual peaks.

      To address the reviewer’s concern, we performed two complementary analyses. The first evaluates whether the observed ruggedness of the landscapes could arise as an artifact of incomplete sampling. It addressed the effects of missing genotypes and the possibility of spurious peak identification due to unsampled neighbors. Sparse sampling can introduce opposing biases: true peaks may be missed, while other genotypes may be falsely classified as peaks because fitter neighbors are absent. As shown for uncorrelated random (House-of-Cards) landscapes (Kauffman & Levin, 1987), these effects can partially cancel.

      In this analysis, we constructed a null model by randomly permuting regulation strengths across the mapped genotype network while preserving its topology. The number of peaks in these randomized landscapes is only modestly higher than in the empirical data, indicating that the measured landscapes are close to the maximal ruggedness compatible with the sampled network (Results, lines 308–320).
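      The peak definition and the permutation null model can be sketched as follows. The sketch assumes that regulation strengths are stored in a dictionary keyed by TFBS sequence, containing only the sampled genotypes. A genotype counts as a peak if it exceeds every measured single-mutant neighbor.

      Genotypes with no measured neighbor are skipped here. This is our assumption, not a documented choice of the authors. Shuffling the values over the same set of sequences keeps the network topology fixed while destroying any correlation between neighbors.

```python
import random


def neighbors(seq, alphabet="ACGT"):
    """All single-point mutants of seq."""
    for i, base in enumerate(seq):
        for b in alphabet:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def count_peaks(strength):
    """Count genotypes stronger than every *measured* single-mutant neighbor.

    strength: dict mapping sequence -> regulation strength (sampled only).
    Genotypes without any measured neighbor are skipped (assumption).
    """
    peaks = 0
    for seq, value in strength.items():
        measured = [strength[n] for n in neighbors(seq) if n in strength]
        if measured and all(value > v for v in measured):
            peaks += 1
    return peaks


def permutation_peaks(strength, n_perm=100, seed=0):
    """Peak counts after shuffling strengths over the same sampled genotypes."""
    rng = random.Random(seed)
    seqs = list(strength)
    values = list(strength.values())
    counts = []
    for _ in range(n_perm):
        rng.shuffle(values)  # same network, randomly reassigned values
        counts.append(count_peaks(dict(zip(seqs, values))))
    return counts
```

      Comparing `count_peaks(strength)` with the distribution returned by `permutation_peaks(strength)` gives the empirical-versus-randomized comparison described above.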

      In addition, we quantified potential sampling bias by analyzing genotype connectivity. Here we defined the relative connectivity of a genotype as the fraction of possible single-mutant neighbors for which we had measured regulation strength. We observed only a very weak correlation between connectivity and regulation strength (R=-0.1, -0.1, 0.01 for the CRP, Fis, and IHF landscapes, Figures S13-S15). Similarly, the relative connectivity of peak genotypes is only weakly correlated with their regulation strength (R=-0.05, -0.04, 0.06 for the CRP, Fis, and IHF landscapes). This indicates that strongly regulating genotypes are not preferentially oversampled or undersampled (Results, lines 321–330).

      The second, and most important, analysis directly addresses the reviewer’s concern that experimental uncertainty could affect peak classification and, consequently, landscape navigability. We explicitly incorporated experimentally measured, genotype-specific noise estimates from biological replicates when comparing fitness values between neighboring genotypes. Using these uncertainty-aware comparisons, we then recomputed adaptive-walk dynamics and genotype visitation frequencies on the resulting noisy landscapes.

      We observe strong correlations between visitation frequencies in the noise-free and noisy landscapes across all three transcription factors (new Supplementary Figure S35), indicating that evolutionary accessibility patterns are robust to realistic levels of experimental uncertainty. These analyses are described in the revised Results (lines 622–636) and in a new Supplementary Methods section (“Incorporation of experimental uncertainty into adaptive walks”).
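      One simple way to make neighbor comparisons uncertainty-aware is sketched below. It is an assumption, not the authors' documented procedure:

      - a neighbor counts as uphill only if its mean strength exceeds the current genotype's by more than `z` combined standard deviations, where `sd` holds per-genotype replicate standard deviations;
      - walkers move to a uniformly chosen uphill neighbor, rather than following the varying population dynamics simulated in the paper.

      Setting every `sd` to zero recovers a noise-free walk, so the two sets of visitation frequencies can then be correlated.

```python
import math
import random
from collections import Counter


def neighbors(seq, alphabet="ACGT"):
    """All single-point mutants of seq."""
    for i, base in enumerate(seq):
        for b in alphabet:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def clearly_better(a, b, strength, sd, z):
    """True if b beats a by more than z combined standard deviations."""
    margin = z * math.sqrt(sd[a] ** 2 + sd[b] ** 2)
    return strength[b] - strength[a] > margin


def visitation_frequencies(strength, sd, z=2.0, n_walks=1000, seed=0):
    """Fraction of all visits landing on each genotype over random uphill walks.

    Each walk starts at a random measured genotype and repeatedly moves to a
    uniformly chosen measured neighbor that is clearly better; it stops when
    no such neighbor exists. Strength strictly increases, so walks terminate.
    """
    rng = random.Random(seed)
    seqs = list(strength)
    visits = Counter()
    for _ in range(n_walks):
        current = rng.choice(seqs)
        visits[current] += 1
        while True:
            uphill = [n for n in neighbors(current)
                      if n in strength and clearly_better(current, n, strength, sd, z)]
            if not uphill:
                break
            current = rng.choice(uphill)
            visits[current] += 1
    total = sum(visits.values())
    return {s: visits[s] / total for s in seqs}
```

      In this sketch, larger `sd` values make fewer moves count as clearly uphill. Walks therefore become shorter and visits stay closer to the starting distribution, which is the effect the noise-free versus noisy comparison is meant to quantify.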

      Reviewer #2 (Public review):

      The authors aim to investigate the ability of evolution to create strong transcription factor binding sites (TFBSs) de novo in E. coli. They focus on three global transcriptional regulators: CRP, Fis, and IHF, using a massively parallel reporter assay to evaluate the regulatory effects of over 30,000 TFBS variants. By analyzing the resulting genotype-phenotype landscapes, they explore the ruggedness, accessibility, and evolutionary dynamics of regulatory landscapes, providing insights into the evolutionary feasibility of strong gene regulation. Their experiments show that de novo adaptive evolution of new gene regulation is feasible. It is also subject to a blend of chance, historical contingency, and evolutionary biases that favor some peaks and evolutionary paths.

      (1) Strengths of the methods and results:

      The authors successfully employed a well-designed sort-seq assay combined with high-throughput sequencing to map regulatory landscapes. The experimental design ensures reliable measurement of regulation strengths. Their system accounts for gene expression noise and normalizes measurements using appropriate controls.

      Comprehensive Landscape Mapping:

      The study examines ~30,000 TFBS variants per transcription factor, providing statistically robust and thorough maps of the regulatory landscapes for CRP, Fis, and IHF. The landscapes are rigorously analyzed for ruggedness (e.g., number of peaks) and epistasis, revealing parallels with theoretical uncorrelated random landscapes.

      Evolutionary Dynamics Simulations:

      Through simulations of adaptive walks under varying population dynamics, the authors demonstrate that high peaks in regulatory landscapes are accessible despite ruggedness. They identify key evolutionary phenomena, such as contingency (multiple paths to peaks) and biases toward specific evolutionary outcomes.

      Biological Relevance and Novelty:

      The authors' work is novel in focusing on global regulators, which differ from previously studied local regulators (e.g., TetR). They provide compelling evidence that rugged landscapes are navigable, facilitating de novo evolution of regulatory interactions. The comparison of landscapes for CRP, Fis, and IHF underscores shared topographical features, suggesting general principles of global transcriptional regulation in bacteria.

      (2) Weaknesses of the methods and results:

      Undersampling of Genotype Space:

      While the quality filtering of the data ensures robustness, ~40% of the TFBS space remains uncharacterized. The authors acknowledge this limitation but could improve the analysis by employing subsampling or predictive modeling.

      We thank the reviewer for raising this point. We agree that undersampling of genotype space is an important limitation of our dataset and that, in principle, subsampling or predictive modeling approaches could be used to address missing genotypes. We have now clarified in the manuscript why these approaches are not straightforward in the context of our analyses and why we did not pursue them here.

      Although approximately 40% of TFBS genotypes were removed during the filtering step due to lack of reliable measurements, this filtering step was necessary to ensure robust estimation of regulation strength from sort-seq data. Importantly, random subsampling of the genotypes in our data set would not alleviate this limitation, because many of our key analyses—such as peak identification, quantification of epistasis, and assessment of evolutionary accessibility—require combinatorially complete local neighborhoods in genotype space. Subsampling would remove mutational neighbors from many neighborhoods, and thus further limit our ability to characterize landscape topology.

      Predictive modeling approaches could, in principle, be used to infer missing genotypes and reconstruct more complete landscapes. However, developing, experimentally validating, and benchmarking such models would not only substantially expand the scope of an already long paper, it would also require additional assumptions about genotype–phenotype relationships that entail their own limitations. Our primary goal in this work was to provide the first large-scale empirical in vivo regulatory landscapes for global bacterial transcription factors, comprising tens of thousands of experimentally measured variants. We view these empirical landscapes as a necessary foundation upon which predictive modeling and landscape completion can be built in future, complementary studies.

      We have now revised the Discussion (lines 760-770) to explicitly articulate these points and to clarify that, while undersampling remains a limitation, it does not invalidate the landscape-level conclusions we draw from the combinatorially complete neighborhoods present in our data. There we also outline predictive modeling as an important direction for future work.

      For a more detailed answer regarding subsampling and peak classification, please also see our response to comment (2) of Reviewer #1.

      Simplified Regulatory Architecture:

      The study considers a minimal system of a single TFBS upstream of a reporter gene. While this may have been necessary for clarity, this simplification may not reflect the combinatorial complexity of transcriptional regulation in vivo.

      Point well taken. We have added a paragraph stating explicitly that the system we use to study gene regulation is much simpler than most in vivo regulatory circuits (Discussion, lines 797-802).

      Lack of Experimental Validation of Simulations:

      The adaptive walks are based on simulated dynamics rather than experimental evolution. Incorporating in vivo experimental evolution studies would strengthen the conclusions. Although this is a large request for the paper, that would not prevent publication.

      We thank the reviewer for this important point. We fully agree that in vivo experimental evolution would provide a valuable and complementary way to validate the evolutionary dynamics inferred from our simulations. However, we ask for the reviewer's understanding that adding experimental evolution to an (already long) paper would go far beyond the scope of our study.

      Also, the goal of our study was not to reproduce evolutionary trajectories experimentally, but to characterize the structure of large empirical regulatory landscapes, and to use these landscapes as a data-driven basis for exploring evolutionary accessibility under well-defined population-genetic assumptions. The adaptive walks we employ are parameterized directly from experimentally measured genotype–phenotype maps, and incorporate established fixation probabilities. Such walks have been widely used to study evolutionary dynamics on empirical landscapes when experimental evolution is not tractable, because it would involve tens of thousands of genotypes that represent small mutational targets and would thus take a long time to evolve.
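      For readers unfamiliar with adaptive walks on empirical landscapes, the following minimal sketch shows one common formulation in the strong-selection, weak-mutation regime, using a haploid form of Kimura's fixation probability, (1 - e^(-2s)) / (1 - e^(-2Ns)). The assumptions are ours and not necessarily the authors': fitness equals regulation strength, s is the fitness difference between mutant and resident, and only measured single-mutant neighbors can be visited. All names and toy values are hypothetical.

```python
import math
import random

# Hypothetical sketch of an adaptive walk on a measured landscape.
# Assumptions: fitness == regulation strength, s == fitness difference,
# and unmeasured genotypes are treated as inaccessible.

NUCLEOTIDES = "ACGT"


def single_mutant_neighbors(genotype):
    """All sequences that differ from `genotype` at exactly one position."""
    return [genotype[:i] + alt + genotype[i + 1:]
            for i, base in enumerate(genotype)
            for alt in NUCLEOTIDES if alt != base]


def fixation_probability(s, n):
    """Haploid Kimura formula (1 - e^-2s) / (1 - e^-2ns) for s > 0.

    expm1 keeps precision when s is small; both terms are negative,
    so their ratio is positive.
    """
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


def adaptive_walk(start, fitness, n, rng):
    """Step to fitter measured neighbors, weighted by fixation probability."""
    path = [start]
    while True:  # fitness strictly increases, so the walk must terminate
        current = path[-1]
        fitter = [m for m in single_mutant_neighbors(current)
                  if m in fitness and fitness[m] > fitness[current]]
        if not fitter:  # no fitter measured neighbor: the walk ends on a peak
            return path
        weights = [fixation_probability(fitness[m] - fitness[current], n)
                   for m in fitter]
        path.append(rng.choices(fitter, weights=weights, k=1)[0])


def peak_reach_frequencies(starts, fitness, n, rng):
    """Fraction of walks (one per start) that end on each peak genotype."""
    ends = [adaptive_walk(s, fitness, n, rng)[-1] for s in starts]
    return {g: ends.count(g) / len(ends) for g in set(ends)}
```

      Because every step depends only on measured values and a fixation rule, such walks can be repeated many times on landscapes too large for experimental evolution; changing the fitness mapping or N would change the resulting frequencies.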

      An additional issue related to the feasibility of experimental evolution is that performing in vivo experimental evolution for the regulatory landscapes analyzed here would require tracking large populations across a combinatorially vast TFBS space, while simultaneously measuring regulatory phenotypes for thousands of evolving lineages, which is currently not experimentally feasible. This is another reason why simulation-based approaches have been the standard method for linking large-scale empirical landscapes to evolutionary dynamics in both theoretical and experimental studies.

      Furthermore, our conclusions are intentionally framed at the level of statistical and landscape-wide properties (e.g., accessibility of high peaks, contingency, and evolutionary bias), rather than at the level of specific mutational trajectories. As such, they do not rely on the precise reproduction of any single evolutionary path, but on aggregate patterns that are robust to reasonable variation in population-genetic parameters.

      In sum, we do not view experimental evolution as essential for the conclusions we draw, but as an important and exciting direction for future work that may be enabled by the landscapes we have experimentally mapped.

      Impact on the Field:

      This study advances our understanding of adaptive landscapes in gene regulation and offers a critical step toward deciphering how global regulators evolve de novo binding sites. The findings provide foundational insights for synthetic biology, evolutionary genetics, and systems biology by highlighting the evolutionary accessibility of strong regulation in bacteria.

      Utility of Methods and Data:

      The sort-seq approach, combined with landscape analysis, provides a robust framework that can be extended to other transcription factors and systems. If made publicly available, the study's data and code would be valuable for researchers modeling transcriptional regulation or studying evolutionary dynamics.

      Additional Context:

      The study builds on a growing body of work exploring regulatory evolution. For instance, recent studies on local regulators like TetR and AraC have revealed high ruggedness and epistasis in TFBS landscapes. This study distinguishes itself by focusing on global regulators, which are more biologically complex and influential in bacterial gene networks. The observed evolutionary contingency aligns with findings in other biological systems, such as protein evolution and RNA folding landscapes, underscoring the generality of these evolutionary principles.

      Conclusion:

      The authors successfully mapped the genotype-phenotype landscapes for three global regulators and simulated evolutionary dynamics to assess the feasibility of strong TFBS evolution. They convincingly demonstrate that ruggedness and epistasis, while prominent, do not preclude the evolution of strong regulation. Their results support the notion that gene regulation evolves through a blend of chance, contingency, and evolutionary biases.

      This paper makes a significant contribution to the understanding of regulatory evolution in bacteria. While minor limitations exist, the authors' methods are robust, and their findings are well-supported. The work will likely be of broad interest to researchers in molecular evolution, synthetic biology, and gene regulation.

      We thank the reviewer for their thorough evaluation and for their supportive opinion of this paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28 (Abstract): "Landscape ruggedness does not prevent the evolution of strong regulation, because more than 10% of evolving populations can attain one of the highest peaks." I did not find this interpretation very convincing; only 10% of populations being able to achieve strong regulation sounds to me like ruggedness DOES impede adaptation in the vast majority of cases.

      We thank the reviewer for this thoughtful comment and agree that our original phrasing in the Abstract overstated this conclusion. We did not intend to imply that landscape ruggedness has only a minor effect on adaptation. On the contrary, our results clearly show that ruggedness strongly constrains evolutionary outcomes and prevents the majority of evolving populations from reaching the globally highest regulatory peaks. We have therefore toned down the wording in both the Abstract and the Discussion (lines 670-679) to reflect this more accurately. For example, in the abstract we now state:

      “Nonetheless, evolutionary simulations show that ~10% of evolving populations can reach a peak of strong regulation, a proportion that is significantly greater than in comparable random landscapes.”

      In the discussion we state:

      “… Specifically, our evolutionary simulations show that 10% of populations with a size typical of E. coli reach one of the highest peaks. This percentage is significantly higher than in randomized landscapes (Supplementary Methods 9; Supplementary Figure S30)”

      Our intended interpretation was more limited: namely, that ruggedness does not fully preclude the evolution of strong regulation. In highly rugged landscapes with extensive sign epistasis—whose topological properties approach those of uncorrelated random landscapes—the a priori expectation is that access to the strongest peaks could be vanishingly rare or effectively impossible under Darwinian evolution. In this context, observing that a non-negligible fraction of populations (on the order of 10%) can reach one of the highest peaks suggests that strong regulation remains evolutionarily attainable, even though it is far from guaranteed.

      Motivated by the reviewer’s suggestion, we also added a null-model analysis that makes this point more explicitly and quantitatively. Specifically, we constructed randomized landscapes by permuting regulation-strength values across genotypes while preserving the experimentally sampled genotype network topology and all parameters of the evolutionary simulations (Supplementary Methods 9, “Randomized landscape null model for peak accessibility”). We then repeated the adaptive-walk simulations on these shuffled landscapes. This null model provides an expectation for peak accessibility in landscapes with identical sampling, neighborhood structure, and evolutionary dynamics, but without genotype–phenotype correlations.
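      The shuffling null model can be sketched as a generic permutation test. This is an illustrative sketch under our own assumptions: `statistic` is a placeholder for any landscape-level summary (for instance, the fraction of simulated walks that end on a high peak), and the one-sided p-value with a +1 correction is a standard choice, not necessarily the one used in Supplementary Methods 9. Because only values are permuted, the set of measured genotypes, and hence the neighbor network, stays fixed.

```python
import random

# Hypothetical sketch of a permutation null model: regulation-strength values
# are shuffled among the measured genotypes, preserving which genotypes (and
# therefore which neighbor links) exist, while genotype-phenotype
# correlations are destroyed.


def shuffled_landscape(strength, rng):
    """Randomly reassign the measured values to the measured genotypes."""
    genotypes = list(strength)
    values = list(strength.values())
    rng.shuffle(values)
    return dict(zip(genotypes, values))


def permutation_test(strength, statistic, n_perm, rng):
    """Compare statistic(empirical landscape) with shuffled landscapes.

    Returns the observed value, the null values, and a one-sided p-value
    for 'the empirical value is at least as large as the null'.
    """
    observed = statistic(strength)
    null = [statistic(shuffled_landscape(strength, rng)) for _ in range(n_perm)]
    p = (1 + sum(v >= observed for v in null)) / (n_perm + 1)
    return observed, null, p
```

      A ratio such as observed divided by the mean of `null` would correspond to the fold difference in accessibility between empirical and shuffled landscapes described below.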

      Using this null model, we find that the fraction of populations that reach high peaks in the empirical landscapes is substantially higher than expected by chance alone (new Supplementary Figure S30; Results, lines 504–516). Specifically, across the three transcription factors, empirical landscapes exhibit on average a ~3-fold higher accessibility of high regulatory peaks than shuffled landscapes. This comparison does not weaken the conclusion that ruggedness strongly impedes adaptation; rather, it shows that the structure of the measured genotype–phenotype landscapes enables greater accessibility of strong regulation than would be expected in equally rugged but unstructured landscapes.

      In response to the reviewer’s concern, we have revised the abstract and main text to avoid the phrase “does not prevent” and to more accurately convey this balance between constraint and accessibility. We now emphasize that ruggedness strongly constrains adaptation, while still allowing access to strong regulatory peaks at rates that exceed null expectations (Discussion, lines 512-516). For example, in the discussion we state:

      “… In sum, rugged regulatory landscapes strongly constrain evolutionary trajectories, yet do not render the evolution of strong regulation vanishingly rare. Instead, strong regulatory phenotypes remain evolutionarily attainable at levels that exceed null expectations, even though they are reached by only a minority of evolving populations.”

      We believe that the revised wording, together with the added null-model analysis, more faithfully represents our results and strengthens the quantitative interpretation of accessibility in these landscapes.

      (2) Line 123: I found the explanation of the plasmid system and the accompanying SI figures (Figures S1 and S2) confusing in terms of how many plasmids there were. In particular, the Figure S1 graphics show the plasmid specifically with CRP but the text in the graphic and in the caption refers to the plasmid pCAW-Sort-Seq-V2 (which, according to Table S1, isn't that just the base plasmid without any TF?). Figure S2 also shows the plasmid with CRP and does specify pCAW-Sort-Seq-V2-CRP-CRP0 in the graphic, but then the caption refers again only to the base plasmid pCAW-Sort-Seq-V2. I recommend the authors clarify these items for readers who might want to reproduce or build upon their system. In particular, I recommend the main text explain more explicitly that they generate three versions of this plasmid (one for each TF), and then on the backgrounds of each of those three plasmids, a whole library with all the binding site variants.

      We thank the reviewer for pointing out this lack of clarity. We agree that the original description of the plasmid system and the accompanying Supplementary Figures S1 and S2 could be confusing with respect to how many plasmids were used and how they differ.

      To clarify the experimental design, we start from a common backbone plasmid, pCAW-Sort-Seq-V2, which contains all shared regulatory and reporter elements but does not encode any transcription factor. From this backbone, we generated three distinct TF-specific plasmids, each carrying one of the transcription factors studied here—CRP, Fis, or IHF—resulting in pCAW-Sort-Seq-V2-CRP, pCAW-Sort-Seq-V2-Fis, and pCAW-Sort-Seq-V2-IHF. On the background of each TF-specific plasmid, we then constructed a complete library of plasmids containing all variants of the corresponding TF binding site cloned upstream of the reporter gene.

      We have revised the main text to explicitly describe this plasmid hierarchy and library construction strategy and to clarify that three TF-specific plasmids were generated prior to TFBS library construction (Results, Landscape mapping section; lines 159–193). In addition, we have redesigned Supplementary Figures S1 and S2 to facilitate understanding of the plasmid system. Specifically, these figures now clearly distinguish between the base plasmid backbone and the TF-specific plasmid derivatives. Also, the plasmid names shown in the graphics and captions are now consistent with those listed in Supplementary Table S1. Upon final publication, we will also deposit the sequences of all plasmids in Addgene to further facilitate reproducibility.

      (3) Line 135: Can the authors clarify whether these TFs are essential in these media conditions and, if not, why? I was expecting them to be so given the core functions of these TFs as described in the Introduction, but then Figure S3 appears to show that all knockouts are viable.

      We thank the reviewer for raising this important point and apologize for the lack of clarity in the original version of the manuscript. The transcription factors CRP, Fis, and IHF are not essential for viability under the growth conditions used in this study, but they are important for optimal growth and cellular fitness, consistent with their roles as global regulators.

      Under our experimental conditions, single-gene knockout strains (Δcrp, Δfis, and Δihf) are viable but exhibit slower growth dynamics compared to the wild-type strain, reflecting impaired regulation of core cellular processes (Supplementary Figure S3). This behavior is consistent with previous work showing that many global transcriptional regulators in E. coli are conditionally essential or strongly fitness-affecting, rather than absolutely essential under standard laboratory conditions.

      Importantly, while single knockouts remain viable, double mutants involving these global regulators are not viable, indicating substantial functional redundancy and network-level essentiality among global transcription factors. This explains why each TF can be studied individually in isolation, while combinations of deletions cannot be maintained.

      We have now clarified this point in the Results section by explicitly stating that the knockout strains show reduced growth rates but reach comparable cell densities during late exponential or early stationary phase, the growth phase at which all measurements were performed (Results, Landscape mapping section; lines 185–193). This clarification reconciles the apparent discrepancy between the biological importance of these transcription factors discussed in the Introduction and the viability of the single-knockout strains shown in Supplementary Figure S3.

      (4) Lines 141 and 227: The authors appear to refer to two different citations for different versions of RegulonDB (refs. 47 and 66). Did they actually use both versions for different purposes (if so, why?), or is this a typo?

      We thank the reviewer for noticing this inconsistency. We did not use two different versions of RegulonDB. The two separate references were an error. We have now corrected this by using a single, consistent RegulonDB citation in both locations.

      (5) Line 166 (Figure 1 caption): I think 2^8 here should be 4^8.

      Thank you. We have corrected “2<sup>8</sup>” to “4<sup>8</sup>” in the Figure 1 caption.

      (6) Figure 2: Are the distributions in Figure 2a (regulation strengths across all TFBSs in the libraries) equivalent to the distributions in Figures S4-S6 (direct fluorescence readout from cell sorting), just transformed from fluorescence to regulation strength? If so I think that would be helpful to clarify, perhaps in the captions to Figures S4-S6 so that it's clear these contain the same information.

      No. Figures S4–S6 and Figure 2a do not show the same distributions. Figures S4–S6 display the raw fluorescence distributions obtained from cell sorting, whereas Figure 2a shows regulation strengths (S), which are derived quantities computed from these fluorescence data. Specifically, regulation strength is calculated as a weighted average over fluorescence bins using the sequencing read distribution for each TFBS (see Methods, “Regulation strengths”).
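      The read-weighted average mentioned above can be illustrated as follows. This is a hypothetical sketch: `bin_values` could be, for example, the mean log fluorescence of each sorting bin, and any additional normalization described in the Methods section “Regulation strengths” (such as scaling by sorted cell counts or by controls) is deliberately left out.

```python
# Hypothetical sketch: a read-weighted average over sort-seq bins.
# reads[b]      = sequencing reads of one TFBS variant recovered from bin b
# bin_values[b] = a number characterizing bin b (assumed, e.g. mean log
#                 fluorescence); further normalization is not modeled here.


def read_weighted_score(reads, bin_values):
    """Return sum_b p_b * x_b, where p_b is the fraction of reads in bin b."""
    total = sum(reads)
    if total == 0:
        raise ValueError("variant has no reads in any bin")
    return sum(r / total * x for r, x in zip(reads, bin_values))
```

      For instance, a variant whose reads are split evenly between the two brightest of three bins with values 1, 2, and 3 scores 2.5.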

      To clarify this relationship, we have revised the main text (lines 201-203 and Figure 1b-c) to explicitly state how regulation strengths (S) were calculated.

      (7) Figure 2b: Can the authors label each logo/frequency matrix with its corresponding TF name in the graphic itself? I think this is only implied in the caption.

      We have updated Figure 2b to label each sequence logo / frequency matrix directly in the graphic with its corresponding transcription factor name (CRP, Fis, or IHF), in addition to mentioning these names in the caption. This change clarifies the figure and makes the TF identity immediately apparent to the reader.

      (8) Lines 290 and 298 (Figure 2 caption): The labels for panels b and c appear to be swapped in the caption.

      We thank the reviewer for pointing this out. The labels for panels b and c in the Figure 2 caption were indeed swapped. This has now been corrected.

      (9) Line 379: There is a missing period at the end of this line.

      We have added the missing period at the end of this line.

      (10) Line 400 (Figure 3 caption): There is a missing subtitle for panel c in the caption for this figure (all other panels seem to have bolded subtitles in their captions).

      We have added the missing subtitle for panel c in the Figure 3 caption to match the formatting of the other panels.

      (11) Line 583: There is a missing period after "Methods 7.5)".

      We have added the missing period after “Methods 7.5)”.

      (12) Line 641: "All three landscapes highly rugged" should probably be "All three landscapes are highly rugged".

      We have corrected the sentence to read “All three landscapes are highly rugged.”

    1. eLife Assessment

      Findings from this study are considered fundamental because they identify amino acid uptake, cholesterol synthesis, and protein prenylation as key metabolic regulators of B cell activation, proliferation, and survival, advancing understanding of T-independent immune responses. The study links metabolic reprogramming directly to B cell function, highlighting how cellular metabolism supports immune fitness. The evidence is compelling, combining unbiased proteomic profiling with genetic and pharmacological validation to demonstrate causal roles for these pathways.

    2. Reviewer #1 (Public review):

      The work presented by Cheung et al. used a quantitative proteomics method to capture molecular changes in B cells exposed to LPS and IL-4, a combination of stimuli activating naive B cells. Amino acid transporters, cholesterol biosynthetic enzymes, ribosomal components, and other proteins involved in cell proliferation were found to increase in stimulated B cells. Experiments involving genetic loss-of-function (SLC7A5), pharmacological inhibition (HMGCR, SQLE, prenylation), and functional rescue by metabolites (mevalonate, GGPP) validated the proteomics data and revealed that amino acid uptake, cholesterol/mevalonate biosynthesis, and cholesterol uptake played a crucial role in B cell proliferation, survival, biogenesis, and immunoglobulin class switching. Experiments involving cholesterol-free medium showed that both biosynthesis and LDLR-mediated uptake catered to the cholesterol demand of LPS/IL-4-stimulated B cells. A role for protein prenylation in LDLR-mediated cholesterol uptake was postulated and backed by divergent effects of GGPP rescue in the presence and absence of cholesterol in culture medium.

      Strengths:

      The discovery was made by proteome-wide profiling and unbiased computational analysis. The discovered proteins were functionally validated using appropriate tools and approaches. The metabolic processes identified and prioritized from this comprehensive survey and systematic validation very likely represent mechanisms of high importance and influence. Analysis of immune cell metabolism at the protein level is relatively uncommon compared to transcriptomic and metabolomic analysis.

      The conclusions from functional validation experiments were supported by clear data and based on rational interpretations. This was enabled by well-established readouts/analytical methods used to determine cell proliferation, viability, size, cholesterol content, and transporter/enzyme function. The data generated from these experiments strongly support the conclusions.

      This work reveals a complex, yet intriguing, relationship between cholesterol metabolism and protein prenylation as they serve to promote B cell activation. The effects of pharmacological inhibition and metabolite replenishment on the cholesterol content and activation of B cells were determined and logically interpreted.

      Weaknesses:

      The findings of this study were obtained almost exclusively from ex vivo B cell stimulation experiments. Their contribution to B cell state and B cell-mediated immune responses in vivo was not explored. Even without in vivo data, the study provides valuable mechanistic information and insights, but how the identified mechanisms may play out in B cell immunity remains unknown and is not discussed.

      The role of HMGCR, SQLE, and prenylation in B cell activation was assessed using pharmacological inhibitors. Evidence from other loss-of-function approaches, which could strengthen the conclusions, does not exist. This is a moderate weakness and somewhat offset by other data, including those obtained from the tests involving multiple distinct pharmacological inhibitors and the metabolite replenishment experiments.

    3. Reviewer #2 (Public review):

      This study uses mass spectrometry to quantify how LPS + IL-4 modify the mouse B cell proteome as naïve cells undergo blastogenesis and enter the cell cycle. This analysis revealed changes in key proteins involved in amino acid transport and cholesterol biosynthesis. Genetic and pharmacological experiments indicated important roles for these metabolic processes in B cell proliferation.

      This work provides new information about the regulation of TI B cell responses by changes in cell metabolism and also a comprehensive mass spectrometry dataset which will be an important general resource for future studies. The experiments are thorough and carefully carried out. The majority of conclusions are backed up by data that is shown to be highly significant statistically.

      After revision, the study now includes new data showing that the upregulation of amino acid uptake and cholesterol metabolism is not restricted to LPS + IL-4 (TLR4 + IL4R) stimulation but is also observed after stimulation of TLR7, TLR9, CD40 and the BCR. This increases the impact of this work and shows that this metabolic rewiring is a common feature of B cell activation. The inclusion of inhibitor data showing important roles for MTOR and ERK/p38a MAP kinases in the identified metabolic changes also provides preliminary insights into the mechanisms involved.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We agree with the reviewer that a limitation of our study is its focus on cell-based assays rather than in vivo experiments. We did consider evaluating the effects of statins on B cell responses in vivo; however, this approach is complicated by findings that statins can influence antigen presentation by dendritic cells, thereby impacting antibody responses (Xia et al, 2018). We have revised the discussion section to acknowledge this point.

      The reviewer also noted that our study assessed the roles of HMGCR, SQLE, and prenylation in B cell activation using pharmacological inhibitors rather than genetic knockdown/knockout approaches. Loss-of-function techniques such as RNAi, siRNA, and CRISPR can be challenging to apply to primary B cells, but we are exploring their feasibility for future revisions. While we acknowledge the limitations of using pharmacological inhibitors, we have taken several steps to mitigate these, including targeting multiple steps in the cholesterol biosynthetic pathway using structurally distinct inhibitors and conducting rescue experiments by supplementing downstream metabolites. To strengthen the results on prenylation further, we have added data using two further distinct prenylation inhibitors (revised Figure 6). To further investigate potential off-target effects of statins, we performed proteomic analysis of B cells treated with and without fluvastatin. The data suggest that fluvastatin primarily affects cholesterol metabolism and does not cause widespread off-target effects (new Supplementary Figure 9).

      Reviewer #1 (Recommendations for the authors):

      What signalling mechanisms link LPS sensing to proteomic and metabolic changes? Do these changes depend on specific signalling modules downstream of TLR4 (e.g., MyD88, TRIF, NF-kappaB, MAPKs)? Other receptors found to produce similar effects (TLR7, TLR9, CD40) may share these modules. This information could strengthen the conclusion by showing the chain of molecular events through which immune stimuli reprogram B cell metabolism.

      Signalling through most TLRs, including TLR4, TLR7 and TLR9, requires the adaptor protein MyD88. To determine if MyD88 is required for LPS-induced signalling, we carried out immunoblotting to compare signalling in B cells between WT mice and MyD88-deficient mice. We found that phosphorylation of key downstream proteins, including p38 and ERK1/2 (MAPK signalling), Akt, p70S6K and S6 (mTOR signalling) was diminished in B cells from MyD88-deficient mice (Figure S11). These results have been added to the manuscript as Supplementary Figure 11.

      We assessed the requirement of these signalling pathways for LPS-induced proliferation by treating B cells with rapamycin to block mTORC1, PD184352 for MEK1/MEK2 (the upstream activators of ERK1/2), VX745 for p38 or a combination of PD184352 and VX745. These results have been added to the manuscript as the new Figure 9. Rapamycin demonstrated the strongest inhibitory effect on proliferation, and combinatorial blocking of MAPK signalling mildly reduced proliferation (Figure 9A-B). In terms of cholesterol metabolism, treatment with all of these inhibitors reduced cholesterol levels; however, treatment with PD184352 and VX745 reduced cholesterol to the same level as naïve B cells (Figure 9F).

      Other activating stimuli appear to have similar effects: we originally showed that TLR7 and TLR9 activation had a similar effect on proliferation and cholesterol to TLR4, as did activation of CD40 and the BCR (Figure 10). We have now expanded this and shown that these other receptors can also promote protein synthesis (new Supplementary Figure 4).

      There seem to be errors in the manuscript text.

      (1) Page 6, line 232: ssRNAseq?

      We thank the reviewer for spotting these issues. This has been amended to scRNAseq.

      (2) Page 13, line 490: SC7A5?

      This has been amended to SLC7A5.

      (3) The abbreviation CF (cholesterol-free?) is not defined when it first appears.

      This has been amended to cholesterol-free (CF) on page 9, line 411.

      Reviewer #2 (Public review):

      The reviewer suggested that the study would be strengthened by determining whether the observed changes are specific to LPS + IL-4 stimulation or represent a more general B cell response to mitogenic signals. We believe that these effects are not specific to LPS and also occur with other mitogenic stimuli. We have expanded on the data in the original draft showing that other TLR agonists as well as CD40 and BCR stimulation increase both B cell proliferation and cholesterol levels and also looked at the effects of these stimuli on protein synthesis.

      Reviewer #2 (Recommendations for the authors):

      (1) One of the most highly enriched processes is 'response to interferon alpha'. This stands out as most of the other processes identified involve more general cellular processes (i.e., cell proliferation, cell metabolism, etc...). Minimally, interferon alpha should be discussed. It would also be interesting to test whether type I interferons regulate any of the metabolic changes identified.

      Response to interferon alpha has the highest fold enrichment (6.78). To investigate this further, we compiled a list of proteins upregulated by IFN-α stimulation in murine B cells, derived from Mostafavi et al (2016), and compared these with our proteome. We found that most of the IFN-α-regulated genes were not significantly upregulated following LPS + IL-4 stimulation compared to naïve B cells (Figure S3A). We also measured phosphorylation of the transcription factor STAT1, which is induced by IFN-α and IFN-β signalling, and found that LPS stimulation did not induce p-STAT1 (Figure S3B-C). These results have been added to the manuscript as Supplementary Figure 3. Nevertheless, as discussed further in the manuscript, we cannot rule out a weak interferon response in the proteomic data.

      (2) The proteome of BCR-stimulated B cells has been analyzed by mass spectrometry. This dataset should be compared with the LPS + IL-4 dataset of the current study. This may reveal whether these two stimulations have similar or different effects on B-cell function. In particular, it is interesting to know whether BCR stimulation induces SLC7A5 expression and whether proteins involved in cholesterol metabolism are altered by BCR stimulation.

      A similar study using anti-IgM and anti-CD40 to activate murine B cells found an upregulation of amino acid transporters, including SLC7A5, in their proteomic data, suggesting that this is not a stimulus-specific effect. This has been added to the subsection “Protein synthesis in LPS + IL-4 stimulated B cells is dependent on the uptake of amino acids.” In line with this, we have also shown that stimulation of the BCR upregulates protein synthesis (new Supplementary Figure 4). We have added data on HMGCR, SQLE and LDLR from the BCR proteomics experiments to the new Supplementary Figure 13. As the BCR proteome published as a preprint (James et al 2024) is about to be resubmitted as a distinct paper that does not deal with cholesterol metabolism, we have not expanded on this dataset further.

      (3) A role for mTORC1 has been shown for proteome remodelling following BCR stimulation of naïve B cells, regulating the expression of amino acid transporters. Is mTORC1 involved in any of the changes detected following LPS + IL-4 stimulation? (i.e., cell proliferation, ribosome biogenesis, amino acid transport, cholesterol biogenesis).

      To determine the importance of mTORC1 for B cell function, we treated B cells with rapamycin. We found that rapamycin treatment slightly reduced protein synthesis (Figure S12A) and amino acid uptake (Figure S12B). These results have been added to the manuscript as Supplementary Figure 12. Rapamycin reduced cholesterol almost to the level seen in naïve B cells (new Figure 9F) and had a significant inhibitory effect on proliferation (new Figure 9A-B).

      (4) Analysis of Slc7a5 knockout B cells showed that SLC7A5 is required for LPS-induced proliferation (Figure 4G). Is SLC7A5 required for B cell growth following LPS + IL-4 stimulation? Is SLC7A5 required for BCR-induced B cell proliferation/growth?

      There appears to be a misunderstanding, as Figure 4G compares proliferation between WT and SLC7A5 KO B cells following LPS + IL-4 stimulation and not LPS stimulation alone.

      Unfortunately, we no longer have access to Slc7a5fl/fl/Vav-iCre+/- mice and will not be able to measure CTV staining for proliferation following BCR stimulation. However, a similar study using anti-IgM and anti-CD40 to activate murine B cells found that B cells from Slc7a5fl/fl/Vav-iCre+/- mice were significantly smaller, had reduced expression of the chaperone protein CD98 and impaired expression of the transferrin receptor CD71, which is required for iron uptake, compared to WT B cells (James et al, 2024).

      (5) The expression of several key proteins (regulating proliferation/amino acid transport/cholesterol metabolism) is shown to be significantly upregulated by LPS + IL-4 stimulation of naïve B cells. It would be interesting to determine whether these increases result from induced transcription of the relevant genes. This could initially be assessed by qRT-PCR analysis of LPS + IL-4 stimulated primary B cells, or alternatively, mining of online RNAseq datasets.

      We mined RNA-Seq data from C57BL/6 mice (Tesi et al, 2019), which compared naïve B cells with B cells after 2, 4, or 8 hours of LPS stimulation. We found that transcription of the genes coding for the amino acid transporter SLC7A5/SLC3A2 (Figure S6A-B) and of key genes involved in cholesterol metabolism followed the same pattern of upregulation as our proteomic data (Figure S6C-F). These results have been added to the manuscript as new Supplementary Figure 6.

      (6) Cholesterol levels are shown to be increased following resiquimod, CpG, anti-IgM, and CD40L stimulation (Figure 9). What effect do these agonists have on levels of HMGCR, SQLE, and LDLR in B cells? Is B-cell growth induced by these agonists impaired by Fluvastatin?

      We found that stimulation of murine B cells with IL-4, anti-IgM or anti-CD40 increased levels of HMGCR, SQLE and LDLR, with the largest increase seen with a combination of these stimuli (Figure S13A-D) (James et al, 2024). These results have been added to the manuscript as Supplementary Figure 13.

      Figures 10C-E show that B cell growth, survival and proliferation are impaired by Fluvastatin after Resiquimod, CpG, anti-IgM, and CD40L stimulation, although we do not have proteomic data from these stimuli to confirm the levels of HMGCR, SQLE and LDLR.

      We carried out proteomics after 24 hours of LPS + IL-4 stimulation in normal/CF media, with or without Fluvastatin. We found that Fluvastatin treatment in normal media increased the expression of HMGCR, SQLE and LDLR. Fluvastatin treatment in CF media had the highest increase in the expression of these key proteins (Figure S9G-J). These results have been added to the manuscript as Supplementary Figure 9.

      (7) Do Fluvastatin or FGTI-2734 affect early activation of signaling pathways by LPS + IL-4 stimulation of B cells? (eg. MAPKs, STATs, PI3K/AKT).

      This is an interesting question; we will pursue it in our future work.

      References:

      James O, Sinclair LV, Lefter N, Salerno F, Brenes A & Howden AJM (2024) A proteomic map of B cell activation and its shaping by mTORC1, MYC and iron. bioRxiv 2024.12.19.629506 doi:10.1101/2024.12.19.629506

      Xia Y, Xie Y, Yu Z, Xiao H, Jiang G, Zhou X, Yang Y, Li X, Zhao M, Li L, et al (2018) The Mevalonate Pathway Is a Druggable Target for Vaccine Adjuvant Discovery. Cell 175: 1059-1073.e21

    1. eLife Assessment

      This study represents a useful finding on the social modulation of the complex repertoire of vocalizations made across a variety of strains of lab mice. The evidence supporting the claims is, at present, incomplete, as numerous concerns regarding the appropriate categorization of vocalizations, the averaging of data points with disparate levels of occurrence, the interpretation of the function of noisy calls, and a general lack of adequate analyses of experimental data were raised. With these issues addressed, the work will be of importance to scientists studying rodent vocal communication.

    2. Reviewer #1 (Public review):

      Summary:

      Adult laboratory mice produce ultrasonic vocalizations during free social interactions, as well as lower-frequency, voiced calls (squeaks) during aversive contexts. The question of whether mice possess a more complex repertoire of vocalizations has been of great interest to scientists studying rodent vocal behavior. In the current study, the authors analyze the rates and acoustic features of vocalizations produced by pairs of mice that are allowed to interact across a barrier, which prevents direct physical interaction. In this context, they find that same-sex (but not opposite-sex) pairs of mice produce vocalizations that are lower in frequency than the typical 70 kHz ultrasonic vocalizations produced during free interactions and that are also distinct from squeaks. These lower frequency vocalizations were observed in both male-male and female-female pairs, as well as in same-sex pairs from multiple mouse strains. The authors also report that call rates and acoustic features are not affected in male-male pairs that have been treated with the anxiolytic drug buspirone, suggesting that anxiety is not a major driver of vocalization in this behavioral context.

      Strengths:

      (1) The observation that same-sex pairs of mice produce lower frequency (<70 kHz) vocalizations in this behavioral context is novel.

      (2) The consideration of multiple types of pairs (female-female, male-male, and female-male), as well as the inclusion of multiple strains of mice and barriers with different hole diameters, are all strengths of the study.

      (3) The authors include detailed analyses of vocalization acoustic features, as well as detailed tracking of mouse positions relative to the barrier.

      Weaknesses:

      The categorization applied to vocalizations based on their mean frequencies is poorly supported and ignores the distinction in laryngeal production mechanism between voiced and ultrasonic vocalizations. Specifically, the authors are likely lumping together voiced and ultrasonic vocalizations into their "low frequency" (< 30 kHz) category, while they reserve the term "ultrasonic" exclusively for the subset of ultrasonic vocalizations with the highest mean frequencies (> 50 kHz). This categorization scheme also does not align well with past work on lower frequency rodent vocalizations, which complicates the comparison of the present findings to that past work.

      In some analyses, the authors report that different groups of mice produce different relative proportions of vocalization types (as defined by mean frequency) but then compare acoustic features of vocalizations between groups after pooling all vocalizations together. The analyses of acoustic features conducted in this way may be confounded by the different proportions of vocalization types across groups.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine vocal communication during same-sex dyadic interactions in mice, comparing periods of physical separation (with limited sensory access) to direct social contact. They report that separation dramatically alters the vocal repertoire, shifting it away from canonical ultrasonic vocalizations (USVs) toward low-frequency vocalizations (LFVs) and broadband "noisy" calls. While LFVs and noisy calls have been described previously, largely in aversive contexts, this study provides a detailed, systematic characterization of these vocalizations during social interactions, thereby extending prior work.

      The authors explore several experimental manipulations and analyses, including divider hole size, strain and sex differences, anxiolytic drug treatment, and correlations with spatial proximity, to infer potential functions of these call types. Although the dataset is rich, the results are largely descriptive, and many conclusions remain tentative. Several experimental variables are not fully controlled, and in some cases, the interpretation exceeds what the data can clearly support. Nonetheless, with improved experimental framing, additional analyses of existing data, and a clearer discussion of limitations, this work has the potential to make a valuable contribution by broadening the field's focus beyond USVs to understand a wider vocal repertoire relevant to social context.

      Strengths:

      Much work on mouse vocal communication focuses almost exclusively on USVs. This manuscript convincingly demonstrates that non-USV vocalizations (LFVs and noisy calls) are prominent and systematically modulated by social context, highlighting an underappreciated dimension of mouse communication. Furthermore, the authors employ several experimental manipulations, including sensory access, strain, sex, and pharmacological treatment, to assess changes in vocalization repertoire. This provides a valuable resource for the field and reveals robust context dependence of vocalization. The discussion is thoughtful and integrative, particularly in its consideration of potential communicative roles of LFVs and noisy calls and their relationship to sensory constraints and signal propagation, although these ideas will require further experimental validation.

      Weaknesses:

      There are several concerns regarding experimental design and data interpretation that could be addressed to strengthen the manuscript.

      (1) The terminology used for vocalization types is confusing and needs better clarification. The authors refer to Grimsley et al. (2016) multiple times, yet they use the same names for their vocalizations while applying different definitions. This makes it very difficult to compare the two papers. Since this study and Grimsley et al. use different mouse strains (FVB vs CBA), a direct comparison of absolute frequencies may also not be appropriate. Please explicitly clarify the definitions of the call types (e.g., frequency range, voiced vs. USV) and explain how they relate to those in the previous study earlier in the manuscript.

      (2) In the initial experiment, mice always experience separation first (15 minutes), followed by unification (5 minutes), using novel same-sex dyads. Multiple factors besides physical contact could influence vocalization across this sequence, including habituation to the arena, reduced anxiety over time, or increasing familiarity with the partner despite physical separation. It is unclear whether the authors have tested the reverse order (unification first, followed by separation). If not, this limitation should be explicitly acknowledged. In addition, examining whether vocalizations or behaviors change over the course of the 15-minute separation period, for example, by comparing early vs late phases, could help disentangle effects of habituation from those of physical separation per se.

      (3) The conclusion that separation-induced LFVs are unlikely to be anxiety-driven may overinterpret the buspirone experiment (Figure 8). Vehicle injections themselves produced large changes in call rate and call-type distribution, raising concerns about stress or arousal induced by the injection procedure. Comparisons between buspirone-treated animals and untreated animals are therefore problematic, as these groups differ in their experimental histories, including the number of exposures. The manuscript would benefit from independent measures confirming the anxiolytic efficacy of buspirone compared to vehicle injection in this paradigm, such as behavioral readouts of anxiety. In addition, the experimental design requires a clearer description. It is not always clear whether the same dyads were tested twice, or how social familiarity, contextual familiarity, and habituation to injections were handled. Male data comparing first and second exposures should also be included as supplementary figures to allow direct comparison with the excluded female dataset.

      (4) The idea that noisy calls function to attract conspecific attention is intriguing. However, in Figure 5, all call types, including LFVs and USVs, are most likely to occur when mice are already in close proximity during separation, which seems inconsistent with a long-distance signaling role. Analyses of the temporal relationship between vocalizations and behavior would strengthen this claim. For example, it would be informative to test whether bouts of noisy calls precede approach behavior or a reduction in inter-animal distance. Examining whether calls occur before, during, or after orientation toward the partner could further clarify whether these vocalizations actively modulate social behavior.

      (5) The effects of divider hole size on vocal repertoire are striking but difficult to interpret. Unexpectedly, small holes and no holes yield similar call distributions, whereas large holes produce a markedly different profile dominated by LFVs, which also differs from free interactions. If large holes allow greater tactile or close-range interaction, the reduction in USVs and MFVs is counterintuitive. Incorporating behavioral metrics such as distance, orientation, or specific interaction types alongside call classification would greatly aid interpretation and help link vocal output to interaction quality rather than divider type alone.

      (6) Throughout the study, vocalizations are pooled across both animals in the dyad. Because the arena is neutral rather than a home cage, either animal could be initiating vocalization. Assigning calls to individuals, where possible, using spatial or acoustic cues, would substantially strengthen functional interpretations. Even limited analyses, e.g., identifying which animal vocalizes first or whether calls precede approach by the partner, could provide important insight into the communicative role of different call types.

    4. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Adult laboratory mice produce ultrasonic vocalizations during free social interactions, as well as lower-frequency, voiced calls (squeaks) during aversive contexts. The question of whether mice possess a more complex repertoire of vocalizations has been of great interest to scientists studying rodent vocal behavior. In the current study, the authors analyze the rates and acoustic features of vocalizations produced by pairs of mice that are allowed to interact across a barrier, which prevents direct physical interaction. In this context, they find that same-sex (but not opposite-sex) pairs of mice produce vocalizations that are lower in frequency than the typical 70 kHz ultrasonic vocalizations produced during free interactions and that are also distinct from squeaks. These lower frequency vocalizations were observed in both male-male and female-female pairs, as well as in same-sex pairs from multiple mouse strains. The authors also report that call rates and acoustic features are not affected in male-male pairs that have been treated with the anxiolytic drug buspirone, suggesting that anxiety is not a major driver of vocalization in this behavioral context.

      Strengths:

      (1) The observation that same-sex pairs of mice produce lower frequency (<70 kHz) vocalizations in this behavioral context is novel.

      (2) The consideration of multiple types of pairs (female-female, male-male, and female-male), as well as the inclusion of multiple strains of mice and barriers with different hole diameters, are all strengths of the study.

      (3) The authors include detailed analyses of vocalization acoustic features, as well as detailed tracking of mouse positions relative to the barrier.

      Weaknesses:

      The categorization applied to vocalizations based on their mean frequencies is poorly supported and ignores the distinction in laryngeal production mechanism between voiced and ultrasonic vocalizations. Specifically, the authors are likely lumping together voiced and ultrasonic vocalizations into their "low frequency" (< 30 kHz) category, while they reserve the term "ultrasonic" exclusively for the subset of ultrasonic vocalizations with the highest mean frequencies (> 50 kHz). This categorization scheme also does not align well with past work on lower frequency rodent vocalizations, which complicates the comparison of the present findings to that past work.

      We thank the reviewer for their assessment. First, we did not classify calls by mean frequency but by the peak frequency of each individual call.

      Distinguishing ‘voiced’ from ‘whistled’ vocalizations on the basis of their spectral-temporal features is hardly possible. While audio recordings made from deer mice and grasshopper mice in helium-enriched air suggest that vocalizations with lower fundamental frequencies are ‘voiced’ (Pasch et al., 2017; Riede et al., 2022), a computational model based on the laryngeal anatomy of Mus musculus estimates fundamental frequencies in the range of 1 – 5 kHz at the subglottal phonation threshold pressures typical of USVs, approaching 10 kHz at the higher subglottal pressures usually found in the production of ‘voiced’ vocalizations (Pasch et al., 2017). Furthermore, a recent study in the singing mouse (Scotinomys teguina) found minimum fundamental frequencies of about 4 kHz for single song notes produced by a whistle mechanism (Zheng et al., 2025). Thus, low fundamental (peak) frequencies alone appear to be insufficient for deducing the production mechanism of mouse vocalizations.

      We did not observe differences in acoustic features that clearly separated our ‘LFV’ calls into two groups suggestive of different production mechanisms. Thus, we cannot rule out that our ‘LFV’ class contains vocalizations produced by different mechanisms. However, we did not observe any squeaks in our experiments and can therefore rule out that this prominent type of ‘voiced’ call is lumped together with other calls in the ‘LFV’ class.

      While questions regarding the production mechanism, the neurocircuitry involved, and the context-dependent choice of mechanism are intriguing, the distinction between ‘voiced’ and ‘whistled’ vocalizations lies beyond the scope of our manuscript. The neurocircuitry underlying mouse vocal production, particularly of USVs and squeaks, has been investigated by other laboratories. Optogenetic activation of RAm Nts neurons elicited both audible vocalizations (fundamental frequencies of 10 kHz and below) and USVs in awake mice in a stimulus-dependent manner (Veerakumar et al., 2023). Furthermore, optogenetic activation of RAm vocalization neurons led to immediate, measurable adduction of the vocal folds and emission of canonical USVs (Park et al., 2024). While different populations of PAG neurons are responsible for the production of squeaks and USVs (Ziobro et al., 2024), the two input streams appear to converge on RAm vocalization neurons, as silencing the output of these neurons completely abolished both squeak and USV emission (Park et al., 2024). Thus, while near-complete closure of the vocal folds is necessary for the production of canonical USVs (Mahrt et al., 2016; Park et al., 2024), it is not clear which degree of vocal fold opening would result in which fundamental frequencies.

      We will add a paragraph on this issue to the discussion in the next version of the manuscript.

      In some analyses, the authors report that different groups of mice produce different relative proportions of vocalization types (as defined by mean frequency) but then compare acoustic features of vocalizations between groups after pooling all vocalizations together. The analyses of acoustic features conducted in this way may be confounded by the different proportions of vocalization types across groups.

      We displayed the relative distribution of the different call classes, demonstrating that noisy calls and ‘LFVs’ made up 80% of the call repertoire during separation. Thus, acoustic features averaged per individual (e.g. peak frequency) would be predominantly shaped by the features of these two call classes. However, we agree with the reviewer’s criticism and will provide a more detailed display and analysis of the acoustic features of each call class.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine vocal communication during same-sex dyadic interactions in mice, comparing periods of physical separation (with limited sensory access) to direct social contact. They report that separation dramatically alters the vocal repertoire, shifting it away from canonical ultrasonic vocalizations (USVs) toward low-frequency vocalizations (LFVs) and broadband "noisy" calls. While LFVs and noisy calls have been described previously, largely in aversive contexts, this study provides a detailed, systematic characterization of these vocalizations during social interactions, thereby extending prior work.

      The authors explore several experimental manipulations and analyses, including divider hole size, strain and sex differences, anxiolytic drug treatment, and correlations with spatial proximity, to infer potential functions of these call types. Although the dataset is rich, the results are largely descriptive, and many conclusions remain tentative. Several experimental variables are not fully controlled, and in some cases, the interpretation exceeds what the data can clearly support. Nonetheless, with improved experimental framing, additional analyses of existing data, and a clearer discussion of limitations, this work has the potential to make a valuable contribution by broadening the field's focus beyond USVs to understand a wider vocal repertoire relevant to social context.

      Strengths:

      Much work on mouse vocal communication focuses almost exclusively on USVs. This manuscript convincingly demonstrates that non-USV vocalizations (LFVs and noisy calls) are prominent and systematically modulated by social context, highlighting an underappreciated dimension of mouse communication. Furthermore, the authors employ several experimental manipulations, including sensory access, strain, sex, and pharmacological treatment, to assess changes in vocalization repertoire. This provides a valuable resource for the field and reveals robust context dependence of vocalization. The discussion is thoughtful and integrative, particularly in its consideration of potential communicative roles of LFVs and noisy calls and their relationship to sensory constraints and signal propagation, although these ideas will require further experimental validation.

      Weaknesses:

      There are several concerns regarding experimental design and data interpretation that could be addressed to strengthen the manuscript.

      (1) The terminology used for vocalization types is confusing and needs better clarification. The authors refer to Grimsley et al. (2016) multiple times, yet they use the same names for their vocalizations while applying different definitions. This makes it very difficult to compare the two papers. Since this study and Grimsley et al. use different mouse strains (FVB vs CBA), a direct comparison of absolute frequencies may also not be appropriate. Please explicitly clarify the definitions of the call types (e.g., frequency range, voiced vs. USV) and explain how they relate to those in the previous study earlier in the manuscript.

      The existence and use of various distinct classification systems for mouse vocalizations is well known, and there is consensus in the field that a common classification system is needed. It was therefore not our intention to complicate mouse call classification further.

      Grimsley et al. (2016) reserve the ‘low frequency’ band solely for squeaks (or “low frequency harmonics”). Accordingly, they name mouse calls whose “mean dominant frequencies” fall between those of squeaks and USVs “mid-frequency tonal vocalizations (MFVs)” (Grimsley et al., 2016). We did not observe the emission of squeaks in our experiments; instead, we observed tonal vocalizations with peak frequencies spanning the ranges of both squeaks and Grimsley and colleagues’ ‘MFVs’, representing the lowest peak frequencies we observed (< 32 kHz). Furthermore, we observed vocalizations in the range of 32 – 50 kHz (which were not low-frequency components of canonical USVs) and above 50 kHz (corresponding to canonical USVs). Drawing on the terminology of Grimsley and colleagues (2016), we considered it straightforward to name these call classes according to their position on the frequency spectrum: low frequency vocalizations (LFVs; up to 32 kHz), a range encompassing both squeaks and Grimsley and colleagues’ MFVs; middle frequency vocalizations (MFVs; 32 – 50 kHz); and canonical USVs (> 50 kHz). Admittedly, using the term ‘MFVs’ for mouse calls with acoustic features different from those described by Grimsley and colleagues (2016) has caused unnecessary confusion. We are therefore considering adapting our classification scheme for the next version of the manuscript.

      Regarding the comparison of call classes between different mouse strains, strain differences in the spectral-temporal features of call classes have been described for canonical USVs (e.g. Scattoni et al., 2008). However, the acoustic features and call repertoires remain quite comparable. Furthermore, we additionally tested both CBA/J and C57BL/6J mice in our study, confirming the presence of noisy calls, ‘LFVs’, ‘MFVs’, and ‘USVs’ in the vocal repertoire of both strains.

      We will provide a more detailed display and analysis of the acoustic features of the call classes with the next version of the manuscript.

      (2) In the initial experiment, mice always experience separation first (15 minutes), followed by unification (5 minutes), using novel same-sex dyads. Multiple factors besides physical contact could influence vocalization across this sequence, including habituation to the arena, reduced anxiety over time, or increasing familiarity with the partner despite physical separation. It is unclear whether the authors have tested the reverse order (unification first, followed by separation). If not, this limitation should be explicitly acknowledged. In addition, examining whether vocalizations or behaviors change over the course of the 15-minute separation period, for example, by comparing early vs late phases, could help disentangle effects of habituation from those of physical separation per se.

      We did not test mice in the reverse order (5 minutes of unification followed by 15 minutes of separation). We therefore acknowledge this limitation of our study and will address it explicitly in the next version of our manuscript. We appreciate the reviewer’s suggestion to analyze how vocalizations change over time and aim to provide this analysis in the next version of the manuscript.

      (3) The conclusion that separation-induced LFVs are unlikely to be anxiety-driven may overinterpret the buspirone experiment (Figure 8). Vehicle injections themselves produced large changes in call rate and call-type distribution, raising concerns about stress or arousal induced by the injection procedure. Comparisons between buspirone-treated animals and untreated animals are therefore problematic, as these groups differ in their experimental histories, including the number of exposures. The manuscript would benefit from independent measures confirming the anxiolytic efficacy of buspirone compared to vehicle injection in this paradigm, such as behavioral readouts of anxiety. In addition, the experimental design requires a clearer description. It is not always clear whether the same dyads were tested twice, or how social familiarity, contextual familiarity, and habituation to injections were handled. Male data comparing first and second exposures should also be included as supplementary figures to allow direct comparison with the excluded female dataset.

      We agree with the reviewer that the injection procedure itself appeared to affect vocalization behavior. In fact, we had included the ‘untreated’ cohort in Fig. 8, despite its different experimental history, to illustrate the potential impact of injection on vocal behavior.

      Furthermore, we appreciate the reviewer’s suggestion to confirm the anxiolytic effect of buspirone treatment with additional behavioral readouts, and we aim to provide such an analysis in the next version of the manuscript.

      Regarding the reviewer’s request for a clearer description of the experimental design: the same dyads were tested twice. All mice were group-housed in their home cages; however, before the first experiment they had not met the individual they would face during testing. We will improve the description of the experimental design to address the reviewer’s points in the next version of the manuscript.

      (4) The idea that noisy calls function to attract conspecific attention is intriguing. However, in Figure 5, all call types, including LFVs and USVs, are most likely to occur when mice are already in close proximity during separation, which seems inconsistent with a long-distance signaling role. Analyses of the temporal relationship between vocalizations and behavior would strengthen this claim. For example, it would be informative to test whether bouts of noisy calls precede approach behavior or a reduction in inter-animal distance. Examining whether calls occur before, during, or after orientation toward the partner could further clarify whether these vocalizations actively modulate social behavior.

      We appreciate the reviewer’s remarks on the apparent inconsistency between a role for noisy calls in attracting conspecifics and their occurrence when the mice are in close proximity. We concede that the size of our testing arena limited the maximum distance the mice could achieve. We therefore aim to provide a more extensive analysis, including approach behavior and changes in inter-animal distance, in the resubmitted manuscript, as suggested by the reviewer.

      (5) The effects of divider hole size on vocal repertoire are striking but difficult to interpret. Unexpectedly, small holes and no holes yield similar call distributions, whereas large holes produce a markedly different profile dominated by LFVs, which also differs from free interactions. If large holes allow greater tactile or close-range interaction, the reduction in USVs and MFVs is counterintuitive. Incorporating behavioral metrics such as distance, orientation, or specific interaction types alongside call classification would greatly aid interpretation and help link vocal output to interaction quality rather than divider type alone.

      We agree with the reviewer that the results of the divider-hole-size experiment are difficult to interpret. Following the reviewer’s input, we aim to provide additional behavioral analyses of the effect of divider hole size in the next version of the manuscript.

      (6) Throughout the study, vocalizations are pooled across both animals in the dyad. Because the arena is neutral rather than a home cage, either animal could be initiating vocalization. Assigning calls to individuals, where possible, using spatial or acoustic cues, would substantially strengthen functional interpretations. Even limited analyses, e.g., identifying which animal vocalizes first or whether calls precede approach by the partner, could provide important insight into the communicative role of different call types.

      We agree with the reviewer that assigning recorded calls to the individual that emitted them is important for deciphering the communicative role of different call types. Unfortunately, our system was equipped with only one condenser microphone; therefore, we are not able to assign calls to individual mice.

      Literature:

      Grimsley, J. M. S., Sheth, S., Vallabh, N., Grimsley, C. A., Bhattal, J., Latsko, M., Jasnow, A., & Wenstrup, J. J. (2016). Contextual Modulation of Vocal Behavior in Mouse: Newly Identified 12 kHz “Mid-Frequency” Vocalization Emitted during Restraint. Frontiers in Behavioral Neuroscience, 10, 38. https://doi.org/10.3389/fnbeh.2016.00038

      Mahrt, E., Agarwal, A., Perkel, D., Portfors, C., & Elemans, C. P. H. (2016). Mice produce ultrasonic vocalizations by intra-laryngeal planar impinging jets. Current Biology: CB, 26(19), R880–R881. https://doi.org/10.1016/j.cub.2016.08.032

      Park, J., Choi, S., Takatoh, J., Zhao, S., Harrahill, A., Han, B.-X., & Wang, F. (2024). Brainstem control of vocalization and its coordination with respiration. Science (New York, N.Y.), 383(6687), eadi8081. https://doi.org/10.1126/science.adi8081

      Pasch, B., Tokuda, I. T., & Riede, T. (2017). Grasshopper mice employ distinct vocal production mechanisms in different social contexts. Proceedings. Biological Sciences, 284(1859), 20171158. https://doi.org/10.1098/rspb.2017.1158

      Riede, T., Kobrina, A., Bone, L., Darwaiz, T., & Pasch, B. (2022). Mechanisms of sound production in deer mice (Peromyscus spp.). The Journal of Experimental Biology, 225(9), jeb243695. https://doi.org/10.1242/jeb.243695

      Scattoni, M. L., Gandhy, S. U., Ricceri, L., & Crawley, J. N. (2008). Unusual repertoire of vocalizations in the BTBR T+tf/J mouse model of autism. PloS One, 3(8), e3067. https://doi.org/10.1371/journal.pone.0003067

      Veerakumar, A., Head, J. P., & Krasnow, M. A. (2023). A brainstem circuit for phonation and volume control in mice. Nature Neuroscience, 26(12), 2122–2130. https://doi.org/10.1038/s41593-023-01478-2

      Zheng, X. M., Harpole, C. E., Davis, M. B., & Banerjee, A. (2025). Vocal repertoire expansion in singing mice by co-opting a conserved midbrain circuit node. Current Biology: CB, 35(23), 5762-5778.e6. https://doi.org/10.1016/j.cub.2025.10.036

      Ziobro, P., Woo, Y., He, Z., & Tschida, K. (2024). Midbrain neurons important for the production of mouse ultrasonic vocalizations are not required for distress calls. Current Biology: CB, 34(5), 1107-1113.e3. https://doi.org/10.1016/j.cub.2024.01.016

    1. eLife Assessment

      This important paper substantially advances our understanding of how Molidustat may work beyond its canonical role by identifying its therapeutic targets in cancer. This study presents a compelling and well-structured investigation into the therapeutic vulnerabilities of APC-mutant colorectal cancer. This work will be of broad interest to cancer researchers studying small molecules and their therapeutic targets.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to uncover novel therapeutic vulnerabilities in APC-mutant colorectal cancer (CRC), which constitutes the majority of CRC cases. They hypothesized that modulating oxygen-sensing pathways (via PHD inhibition) could disrupt adaptive stress responses in these tumours.

      Strengths:

      The study employs a powerful, two-pronged approach to identify Molidustat's targets. By using both Thermal Proteome Profiling (TPP) and an orthogonal chemical proteomic competition assay, the authors provide compelling evidence that GSTP1 is a genuine, direct off-target, effectively addressing the common limitation of indirect effects in proteomic screens.

      Weaknesses:

      (1) In Figure 1, the current data rely on a single guide RNA (sgRNA). To make the data solid, at least two independent sgRNAs targeting different regions of PHD2 should be used.

      (2) Figure 3E: The Asn205 site should be mutated to test whether Molidustat inhibits GSTP1 activity via Asn205.

      (3) Figure 5B and 5C: The metabolic imbalance phenotype observed upon dual knockout of PHD2 and GSTP1 requires rescue experiments to confirm on-target specificity.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to determine Molidustat’s targets and the potential utility of these findings. They clearly demonstrate that Molidustat interferes with GSTP1 and some other proteins in addition to PHD2. They also demonstrate that PHD2 deletion is not sufficient to recapitulate Molidustat’s effects in cells and proteomes. Finally, they demonstrate synthetic lethality between Molidustat and APC deletion in organoids.

      Strengths:

      The data on Molidustat proteomes, GSTP1 binding and inhibition, and the metabolic health of organoids are very clear. All biochemical, docking and omic data are very strong. The potential impact of these findings could be the use of Molidustat in APC-null tumours and awareness of potential off-target effects.

      Weaknesses:

      A main but minor weakness is that Molidustat also inhibits other PHDs, although these are expressed at lower levels. PHD1 has been shown to control the cell cycle and to be expressed in the colon, where it is needed for viability. Although this does not explain the lack of effect of other PHD inhibitors, it does warrant some discussion. MTT is a poor readout of viability because it measures metabolic activity; this also needs to be discussed and perhaps supplemented with colony-formation or cell-number measurements.

      Reviewer #3 (Public review):

      In this paper, the authors revealed that Molidustat induces a dose-dependent increase in Caspase-3/7 activity in the HT29 cell line, an APC-mutant colorectal cancer cell line. More importantly, they found that targeting PHD2 alone does not cause cell death. Using thermal proteome profiling (TPP) and orthogonal chemical proteomic competition assays, they identified GSTP1 as a previously undiscovered off-target of Molidustat. They also revealed that combined PHD2 and GSTP1 loss leads to an increase in intracellular ROS and apoptosis. Moreover, they evaluated the effects of Molidustat in colonic organoids and showed that Molidustat is highly selective for colonic organoids with activated WNT signaling and/or KRAS pathway alterations, and that this effect is not reproduced by hydroxylase inhibition alone. This suggests a potential new approach to treating APC-mutant CRC by targeting both PHD2 and GSTP1.

      Specific comments:

      (1) What is the possible molecular mechanism by which dual GSTP1/PHD2 loss induces cell death?

      (2) Can the authors mutate the binding site of Molidustat on GSTP1 to verify the in silico docking results?

      (3) Evidence for Molidustat inhibiting PHD2 activity or stabilising HIF-1α should be provided.

    1. eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modelling, and model-based fMRI analyses provides solid support for the main claims. The addition of new model-comparison figures in revision effectively addresses the previously noted potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, while the computational model captures the pattern of "system neglect" well, qualitatively distinct mechanisms, such as hyper-prior attraction toward experiment-wise mean parameters, reporting biases, or probability-outlier underweighting, could produce similar behavioural signatures and cannot be fully disambiguated with the current design alone; however, converging evidence from the authors' prior work partially mitigates this concern.

    2. Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experience. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

    3. Reviewer #3 (Public review):

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls is drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a percentage probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly informed of the prior probability of a regime shift and the proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants overestimate the prior probability of regime change when it is low and underestimate it when it is high, and they overestimate the strength of evidence when it is low and underestimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to the data.

      Functional MRI results show that the strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual-differences effects, such that people who were more sensitive to strength of evidence and to prior probability show more activity in the frontal-parietal and vmPFC-linked networks, respectively.
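
      The Bayes-optimal observer against which 'system neglect' is measured can be sketched as a two-state forward filter. This is a minimal sketch under assumed task rules, not the authors' model code: the urn starts mostly red, switches to mostly blue at most once, with probability q before each draw, and each ball matches the current urn's majority colour with probability p (the signal diagnosticity). The function name posterior_switch and the "R"/"B" encoding are hypothetical.

```python
from typing import List, Sequence


def posterior_switch(draws: Sequence[str], q: float, p: float) -> List[float]:
    """Bayes-optimal P(urn has switched to mostly blue) after each draw.

    Hypothetical sketch under assumed rules (not the authors' code):
    the urn starts mostly red, switches to mostly blue at most once,
    with probability q before each draw, and a ball matches the current
    urn's majority colour with probability p.
    """
    posteriors = []
    post = 0.0  # P(switched) before the first draw
    for ball in draws:
        # Transition step: an urn that has not yet switched does so with probability q.
        prior = post + (1.0 - post) * q
        # Likelihood of the observed ball under each regime.
        like_blue = p if ball == "B" else 1.0 - p
        like_red = (1.0 - p) if ball == "B" else p
        # Bayes' rule gives the posterior after this draw.
        num = prior * like_blue
        post = num / (num + (1.0 - prior) * like_red)
        posteriors.append(post)
    return posteriors


# Example: transition probability 0.05, diagnosticity 0.8.
print([round(x, 3) for x in posterior_switch("RRBBB", q=0.05, p=0.8)])
```

      Under these assumptions, setting p = 0.5 makes every ball uninformative, and the posterior then reduces to the probability that a switch has occurred by draw t, 1 - (1 - q)^t. 'System neglect' corresponds to reports that respond less to changes in q and p than this benchmark does.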

      Strengths

      (1) The study provides a different task for looking at change detection and how it depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation, as in most previous studies.

      (2) Participants directly provide belief estimates as probabilities, rather than experimenters inferring them from choice behaviour as in most previous studies.

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network, whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In their response to this comment, the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring, but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias, whereby their reported beliefs that a regime change had occurred tended to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020).

      In summary, I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this, of which the above are only examples.
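
      The hyper-prior account in point (2) can be checked with simple arithmetic. Below is a minimal sketch that assumes one particular form of hyper-prior: a linear pull of each instructed value towards a common centre m, with weight w on the instructed value. This functional form and the function names are illustrative assumptions, not the model of the authors or the reviewer. When fitted to the three ground-truth and subjective values quoted above ([0.01, 0.05, 0.10] mapped to [0.037, 0.052, 0.069]), it gives a weight of roughly a third on the instructed value and a centre near 0.05, with residuals below 0.001.

```python
from typing import List, Sequence, Tuple


def fit_linear_shrinkage(true_vals: Sequence[float],
                         subj_vals: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of subjective = w * true + (1 - w) * m.

    Returns (w, m): w is the weight on the instructed value and m is the
    hyper-prior centre towards which estimates are pulled (requires w != 1).
    """
    n = len(true_vals)
    mx = sum(true_vals) / n
    my = sum(subj_vals) / n
    sxx = sum((x - mx) ** 2 for x in true_vals)
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_vals, subj_vals))
    w = sxy / sxx            # slope of the fitted line
    intercept = my - w * mx  # equals (1 - w) * m
    m = intercept / (1.0 - w)
    return w, m


def shrink(true_vals: Sequence[float], w: float, m: float) -> List[float]:
    """Subjective values predicted by linear shrinkage towards m."""
    return [w * x + (1.0 - w) * m for x in true_vals]


truth = [0.01, 0.05, 0.10]          # instructed switch probabilities
subjective = [0.037, 0.052, 0.069]  # subjective values quoted in the review
w, m = fit_linear_shrinkage(truth, subjective)
print(round(w, 2), round(m, 3))
print([round(v, 3) for v in shrink(truth, w, m)])
```

      A close fit shows only that the quoted values are compatible with this kind of shrinkage. On its own, the fit does not discriminate between this account and the system-neglect model.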

    4. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modeling, and model-based fMRI analyses provides a solid foundation for the main claims; however, major interpretational limitations remain, particularly a potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, reliance on explicitly instructed conditional probabilities leaves open alternative explanations that complicate attribution to a single computational mechanism, such that clearer disambiguation between competing accounts and stronger control of temporal and representational confounds would further strengthen the evidence.

      Thank you. In this revision, we addressed Reviewer 3’s remaining concern about the potential confound between posterior probability and time in the neuroimaging results. First, as suggested by the reviewer, we provided images of activations for the effects of Pt and delta Pt after controlling for the intertemporal prior in GLM-2. Second, we compared the effects of Pt and delta Pt between GLM-1 (without the intertemporal prior) and GLM-2 (with the intertemporal prior) and show the results in a new figure (Figure 4).

      Regarding the issue of reliance on explicitly instructed probabilities, we wish to point out that most of the concerns, such as response mode and regression to the mean, were addressed in the original behavioral paper by Massey and Wu (2005). Please see our detailed response to Weakness (2) raised by Reviewer 3.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experience. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

      Thank you for reviewing our paper and providing constructive comments that helped us improve our paper.

      Reviewer #3 (Public review):

      Thank you again for reviewing the manuscript. In this revision, we focused on addressing your concern about the potential confound between posterior probability and time in the neuroimaging results. First, we presented whole-brain results for subjects’ probability estimates (Pt, their subjective posterior probability of a switch) after controlling for the effect of time on the probability of a switch (the intertemporal prior). Second, we compared the effect of probability estimates (Pt) on vmPFC and ventral striatum activity, which we found to correlate with Pt, with and without the intertemporal prior in the GLM. These results will be summarized in a new figure (Figure 4) in the revised manuscript.

      As suggested by the reviewer, we also added slice-by-slice images of the whole-brain results for Pt and delta Pt to the supplement, in addition to the Tables of Activation, so that the activated brain regions can be seen clearly.

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls is drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a percentage probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly informed of the prior probability of a regime shift and the proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants overestimate the prior probability of regime change when it is low and underestimate it when it is high, and they overestimate the strength of evidence when it is low and underestimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to the data.

      Functional MRI results show that the strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual-differences effects, such that people who were more sensitive to strength of evidence and to prior probability show more activity in the frontal-parietal and vmPFC-linked networks, respectively.

      Strengths

      (1) The study provides a different task for looking at change detection and how it depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation, as in most previous studies.

      (2) Participants directly provide belief estimates as probabilities, rather than experimenters inferring them from choice behaviour as in most previous studies.

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network, whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      Thank you. Yes, people do struggle with conditional probabilities in many studies. However, as our previous work suggested (Massey and Wu, 2005), system neglect was likely not due to response mode (e.g., entering probability estimates versus making binary predictions).

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias, whereby their reported beliefs that a regime change had occurred tended to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020).

      In summary, I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this (the above are only examples); hence, I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      We thank the reviewer for pointing out that there are alternative models that can describe the over- and underreaction seen in the dataset. Massey and Wu (2005) dealt with this possibility in their original paper. Their concern was less about alternative ways of modeling their results than about alternative psychological processes. For example, asymmetric noise accounts have been posited in the judgment and decision-making literature as possible explanations for phenomena such as overconfidence. They addressed what might crudely be called “regression/attraction to the mean” in two ways. First, they examined median as well as mean responses (because medians are less affected by the regressive effect) and found the same patterns of over- and underreaction. Second, they generated sequences that matched particular posterior probabilities (so that over- and underreaction could not be explained by regression to the mean) and still found under- and overreaction.

      We also wish to point out that the judgment and decision-making literature, starting with Edwards (1968), has a long history of using the normative Bayesian model as a starting point and subsequently developing quasi-Bayesian models (like the system-neglect model) to describe systematic deviations from the normative Bayesian benchmark.

      Finally, we want to clarify that our primary goal is not to engage in a model-fitting exercise that examines different possible models. To us, what matters more is that system neglect is a psychologically motivated hypothesis. It is built on the idea that the lack of sensitivity to the system parameters arises because people focus primarily on the signals and only secondarily on the system parameters that generate them. Massey and Wu (2005) dealt with a host of other potential explanations through experimental manipulations and data analysis. In this paper, we built on Massey and Wu to examine the neurocomputational basis that gives rise to over- and underreactions.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, doesn't Pt always increase with sample number (since by the time of later samples there have been more opportunities for a regime shift)? To control for this, the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI, it is very difficult to tell how the results change with the inclusion of the control regressors.

      Thank you. In response, we added a new figure, now Figure 4, showing the results of Pt and delta Pt from GLM-2, in which we added the intertemporal prior as a regressor to control for temporal confounds. We compared the Pt and delta Pt results in vmPFC and ventral striatum between GLM-1 and GLM-2. We also showed the results for the intertemporal prior in vmPFC and ventral striatum from GLM-2.

      On the other hand, two additional fMRI experiments were conducted as control experiments, and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in Experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was: if subjects saw a similar visual layout and were simply instructed to press buttons to enter two-digit numbers, would we observe the same vmPFC, ventral striatum, and frontoparietal network activity that we observed in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect probability estimates that the current regime is the blue regime, and therefore not be specific to change detection. In Experiment 2, we tested whether what we found about Pt was unique to change detection. Subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1), except that no regime shifts were involved. In other words, it is possible that the regions we identified were generally associated with probability estimation rather than specifically with probability estimates of change. We used Experiment 2 to examine whether this was true.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (n = 30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 and 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, P<sub>t</sub>, on brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of P<sub>t</sub>. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to the probability of regime change), the effect of P<sub>t</sub> might not be specific to change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards’s (1968) classic “bookbag-and-poker-chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimate corresponded to the estimate of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of P<sub>t</sub> can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task in which they were presented with two-digit numbers and were instructed to enter the numbers with button presses. By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime shifts.”
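      To illustrate why the stationary task in Experiment 2 isolates probability estimation from change detection, here is a minimal sketch of the normative benchmark for a bookbag-and-poker-chip task. It assumes two regimes with symmetric signal diagnosticity `d` (the chance of a blue ball under the blue regime equals the chance of a red ball under the red regime) and a prior `p0` on the blue regime; the function name, parameters, and default values are hypothetical and are not taken from the study.

```python
import math


def stationary_posterior_blue(signals, d=0.6, p0=0.5):
    """Normative P(current regime is blue) when no regime shift can occur.

    signals: sequence of "blue"/"red" ball draws.
    d: hypothetical diagnosticity, P(blue ball | blue) = P(red ball | red).
    p0: hypothetical prior probability of the blue regime.
    """
    log_odds = math.log(p0 / (1 - p0))
    llr = math.log(d / (1 - d))  # log likelihood ratio carried by one blue ball
    for s in signals:
        if s == "blue":
            log_odds += llr
        elif s == "red":
            log_odds -= llr
        else:
            raise ValueError(f"unknown signal: {s!r}")
    return 1 / (1 + math.exp(-log_odds))


# In a stationary environment only the difference between blue and red counts
# matters; once regime shifts are possible, the transition probability also enters.
print(stationary_posterior_blue(["blue", "blue", "red"]))
```

      Because the posterior here never depends on a transition probability, activity tracking this quantity alone would not indicate a change-detection signal, which is the contrast the two experiments are meant to draw.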

      To further ensure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as described on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, namely that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
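      The coding of the action-handedness regressor can be sketched as follows, as an illustration only. The quoted examples imply some mapping from digit keys to hands; the mapping used below (digits 1-5 on the left hand, 6-9 and 0 on the right) is an assumption that happens to reproduce all three examples, and the function names are hypothetical.

```python
# Hypothetical key-to-hand mapping: digits 1-5 on the left hand, 6-9 and 0 on
# the right. It reproduces the "23", "75" and "90" examples in the text but is
# an assumption, not the study's documented button layout.
LEFT_HAND_DIGITS = set("12345")


def hand_code(digit):
    """-1 for a left-hand key, +1 for a right-hand key."""
    return -1 if digit in LEFT_HAND_DIGITS else 1


def action_handedness(entry):
    """Parametric regressor for a two-digit entry: -1 both left, 0 mixed, 1 both right."""
    if len(entry) != 2 or not entry.isdigit():
        raise ValueError(f"expected a two-digit string, got {entry!r}")
    # The sum of the two key codes is -2, 0 or 2, so halving it gives -1, 0 or 1.
    return (hand_code(entry[0]) + hand_code(entry[1])) // 2


for entry in ["23", "75", "90"]:
    print(entry, action_handedness(entry))
```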

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of the Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environment and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices, in accordance with the computational roles they play in change detection. By introducing the framework of system neglect and providing evidence for its neural implementation, this study offers both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Thank you for pointing out the inclusion of the intertemporal prior in GLM-2; this seems like an important control that would address my criticism. Why not present this better-controlled analysis in the main figure, rather than the results for GLM-1, which has no effective control for the increasing posterior probability of a reversal with time?

      Thank you for this suggestion. We added a new figure (Figure 4) that shows the results of Pt and delta Pt from GLM-2. We also compared the effects of Pt and delta Pt between GLM-1 and GLM-2 and found that they did not differ. GLM-1 and GLM-2 differed in whether various task-related regressors contributing to Pt, including the intertemporal prior, were included in the model. In GLM-1, those task-related regressors were not included. In GLM-2, they were included in addition to Pt and delta Pt.

      We kept the results from GLM-1 (Figure 3) primarily because we wanted to compare the effect of Pt between experiments under an identical GLM. In other words, the regressors in GLM-1 were identical across all 3 experiments. In Experiments 1 and 2, Pt and delta Pt were, respectively, the probability estimates and belief updates that the current regime was the Blue regime. In Experiment 3, Pt and delta Pt were simply the number subjects were instructed to press (Pt) and the change in that number between successive periods (delta Pt).

      Here is the section in the main text where we discuss the new Figure 4, on pages 19-22:

      We further examined the robustness of P<sub>t</sub> and ∆P<sub>t</sub> representations in vmPFC and ventral striatum in three follow-up analyses. In the first analysis, we implemented a GLM (GLM-2 in Methods) that, in addition to P<sub>t</sub> and ∆P<sub>t</sub>, included various task-related variables contributing to P<sub>t</sub> as regressors. Specifically, to account for the fact that the probability of regime change increased over time, we included the intertemporal prior as a regressor in GLM-2. The intertemporal prior is the natural logarithm of the odds in favor of a regime shift in the t-th period (Eq. 1 in Methods), where q is the transition probability and t = 1, …, 10 is the period. It describes normatively how the prior probability of change increases over time regardless of the signals (blue and red balls) the subjects saw during a trial. Including it along with P<sub>t</sub> clarifies whether any effect of P<sub>t</sub> could otherwise be attributed to the intertemporal prior. We found that the results of P<sub>t</sub> and ∆P<sub>t</sub> in the vmPFC and ventral striatum in GLM-2 were identical to those in GLM-1 (Fig. 4). Fig. 4A depicts the results in slices identical to those shown in Fig. 3B for GLM-1. For slice-by-slice results, see Fig. S7 in SI for GLM-1 and Fig. S9 for GLM-2. For tables of activations, see Tables S1-S3 in SI for GLM-1 and Tables S7-S9 for GLM-2. In a separate, independent region-of-interest (ROI) analysis of vmPFC and ventral striatum (Fig. 4B, C; see Independent regions-of-interest (ROIs) analysis in Methods for details), we further compared the effects of both P<sub>t</sub> and ∆P<sub>t</sub> between GLM-1 and GLM-2.
For P<sub>t</sub>, the difference between GLM-1 and GLM-2 was not significant (paired t-test, t(58) = −0.72, p = 0.47 in vmPFC; t(58) = −0.21, p = 0.83 in ventral striatum), while the effect of P<sub>t</sub> was significant both in GLM-1 (one-sample t-test, t(29) = −3.82, p < .01 in vmPFC; t(29) = −3.06, p < .01 in ventral striatum) and in GLM-2 (one-sample t-test, t(29) = −2.69, p = .01 in vmPFC; t(29) = −2.50, p = .02 in ventral striatum). For ∆P<sub>t</sub>, the difference between GLM-1 and GLM-2 was not significant (paired t-test, t(58) = −0.07, p = 0.94 in vmPFC; t(58) = −0.14, p = 0.88 in ventral striatum), while the effect of ∆P<sub>t</sub> was significant both in GLM-1 (one-sample t-test, t(29) = −3.12, p < .01 in vmPFC; t(29) = −4.14, p < .01 in ventral striatum) and in GLM-2 (one-sample t-test, t(29) = −2.92, p < .01 in vmPFC; t(29) = −3.59, p < .01 in ventral striatum). For the intertemporal prior, activity in neither vmPFC nor ventral striatum correlated significantly with it (one-sample t-test, t(29) = −0.07, p = 0.95 in vmPFC; t(29) = −0.53, p = 0.60 in ventral striatum). All the t-tests described above were two-tailed. Taken together, these results suggest that vmPFC and ventral striatum represented P<sub>t</sub> and ∆P<sub>t</sub> regardless of whether the intertemporal prior and other task-related regressors contributing to P<sub>t</sub> were included in the GLM. We also did not find that vmPFC and ventral striatum represented the intertemporal prior. In the second analysis, we implemented a GLM that replaced P<sub>t</sub> with the log odds of P<sub>t</sub>, ln(P<sub>t</sub>/(1 − P<sub>t</sub>)) (Fig. S10 in SI). In the third analysis, we implemented a GLM that examined P<sub>t</sub> separately for periods in which change-consistent (blue balls) and change-inconsistent (red balls) signals appeared (Fig. S11 in SI).
Each of these analyses showed significant correlation with P<sub>t</sub> in vmPFC and ventral striatum, further establishing the robustness of the P<sub>t</sub> findings.
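      The exact form of the intertemporal prior is given as Eq. 1 in Methods and is not reproduced here. Purely as an illustration of why it can confound a P<sub>t</sub> regressor, the sketch below assumes that a shift can occur independently with probability q in each period and that, once it occurs, the regime stays blue, so the prior probability of a shift by period t is 1 − (1 − q)<sup>t</sup>. This functional form, the value q = 0.05, and the function names are assumptions, not the authors' equation or code; the log-odds transform used in the second follow-up analysis is included alongside.

```python
import math


def intertemporal_prior(q, t):
    """Log prior odds that a regime shift has occurred by period t.

    Assumes (hypothetically) an independent shift probability q per period
    and an absorbing blue regime, so P(shift by t) = 1 - (1 - q) ** t.
    """
    if not (0 < q < 1) or t < 1:
        raise ValueError("need 0 < q < 1 and t >= 1")
    p_no_shift = (1 - q) ** t
    return math.log((1 - p_no_shift) / p_no_shift)


def log_odds(p):
    """ln(p / (1 - p)), the transform applied to P_t in the second analysis."""
    return math.log(p / (1 - p))


# For any fixed q the prior rises with t regardless of the signals seen, so it
# is correlated with period and can masquerade as a P_t effect if omitted.
priors = [intertemporal_prior(0.05, t) for t in range(1, 11)]
print([round(x, 2) for x in priors])
```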

      As a further point, I could not navigate the tables of fMRI activations in SI and recommend replacing or supplementing them with images. For example, I cannot actually find a vmPFC or ventral striatum cluster listed for the effect of Pt in GLM1 (version in table S1), which I thought were the main results? Beyond that, comparing how much weaker (or not) those results are when additional confound regressors are included in GLM2 seems impossible.

      As suggested by the reviewer, we added slice-by-slice images showing the effects of Pt and delta Pt (Figure S9 in SI for GLM-2 and Figure S7 for GLM-1). The clusters in blue represent the Pt effect, and the clusters in orange represent the delta Pt effect. As can be seen, both Pt and delta Pt are represented in the vmPFC and ventral striatum.

    1. eLife Assessment

      This study represents an important advance in our understanding of how certain inhibitors affect the behavior of voltage gated potassium channels. Robust molecular dynamics simulation and analysis methods lead to a new proposed inhibition mechanism with convincing strength of support. This study has considerable significance for the fields of ion channel physiology and pharmacology and could aid in the development of selective inhibitors for protein targets.

    2. Reviewer #3 (Public review):

      Summary

      In this manuscript, Zhang et al. investigate the conduction and inhibition mechanisms of the Kv2.1 channel, with a particular focus on the distinct effects of TEA and RY785 on Kv2 potassium channels. Using microsecond-scale molecular dynamics simulations, the authors characterize K⁺ ion permeation and RY785-mediated inhibition within the central pore. Their results reveal an inhibition mechanism that differs from those described for other Kv channel inhibitors.

      Strengths

      The study identifies a distinctive inhibitory mode for RY785, which binds along the channel walls in the open-state structure while still permitting a reduced level of K⁺ conduction. In addition, the authors propose a long-range allosteric coupling between RY785 binding in the central pore and changes in the structural dynamics of Kv2.1. Overall, this is a well-organized and carefully executed study, employing robust simulation and analysis methodologies. The work provides novel mechanistic insights into voltage-gated potassium channel inhibition and may offer useful guidance for future structure-based drug design efforts.

      Weaknesses:

      As noted in the Discussion, this study focuses primarily on the major binding site within the central pore and was not designed to systematically assess other potential allosteric binding sites for RY785. A more comprehensive structural and biophysical evaluation of possible additional binding sites would be a valuable direction for future investigations.

      Comments on revisions:

      The authors have addressed my comments.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors were seeking to identify a molecular mechanism whereby the small molecule RY785 selectively inhibits Kv2.1 channels. Specifically, the authors sought to explain some of the functional differences that RY785 exhibits in electrophysiology experiments compared with other Kv inhibitors, namely the charged and non-specific inhibitor tetraethylammonium (TEA). The authors used a recently published cryo-EM Kv2.1 channel structure in the open activated state and performed a series of multi-microsecond-long all-atom molecular dynamics simulations to study Kv2.1 channel conduction under applied membrane voltage with and without RY785 or TEA present. They observed that while TEA directly blocks K+ permeation by occluding the ion permeation pathway, RY785 binds to multiple non-polar residues near the hydrophobic gate of the channel, driving it to a semi-closed non-conductive state. They confirmed this mechanism using an additional set of simulations and used it to explain experimental electrophysiology data.

      Strengths:

      The total length of simulation time is impressive, totaling many tens of microseconds. The authors develop their own forcefield parameters for the RY785 molecule based on extensive QM based parameterization. The computed permeation rate of K+ ions through the channel observed under applied voltage conditions is in reasonable agreement with experimental estimates of the single channel conductance. The authors have performed extensive simulations with the apo channel as well as both TEA and RY785. The simulations with TEA reasonably demonstrate that TEA directly blocks K+ permeation by binding in the center of the Kv2.1 channel cavity, preventing K+ ions from reaching the SCav site. The authors conclude that RY785 likely stabilizes a partially closed conformation of the Kv2.1 channel and thereby inhibits K+ current. This conclusion is plausible given that RY785 makes stable contacts with multiple hydrophobic residues in the S6 helix, which they can also validate using a recently published closed-state Kv2.1 channel cryo-EM structure. This further provides a possible mechanism for the experimental observations that RY785 speeds up the deactivation kinetics of Kv2 channels from a previous experimental electrophysiology study.

      Weaknesses:

      The authors, however, did not directly observe this semi-closed channel conformation and in fact acknowledge that more direct simulation evidence would require extensive enhanced-sampling simulations beyond the scope of this study. They have not estimated the effect of RY785 binding on the protein-based hydrophobic pore constriction, which may further substantiate their proposed mechanism. And while the authors quantified K+ permeation, they have not made any estimates of the ligand binding affinities or rates, which could have been potentially compared to experiment and used to validate their models.

      However, despite those relatively minor weaknesses, the conclusions of the study are convincing, and overall this is a solid study helping us to understand two distinct molecular mechanisms of the voltage-gated potassium channel Kv2.1 inhibition by TEA and RY785, respectively.

      Reviewer #2 (Public review):

      Summary

      In this manuscript, Zhang et al. investigate the conduction and inhibition mechanisms of the Kv2.1 channel, with a particular focus on the distinct effects of TEA and RY785 on Kv2 potassium channels. Using microsecond-scale molecular dynamics simulations, the authors characterize K⁺ ion permeation and RY785-mediated inhibition within the central pore. Their results reveal an inhibition mechanism that differs from those described for other Kv channel inhibitors.

      Strengths

      The study identifies a distinctive inhibitory mode for RY785, which binds along the channel walls in the open-state structure while still permitting a reduced level of K⁺ conduction. In addition, the authors propose a long-range allosteric coupling between RY785 binding in the central pore and changes in the structural dynamics of Kv2.1. Overall, this is a well-organized and carefully executed study, employing robust simulation and analysis methodologies. The work provides novel mechanistic insights into voltage-gated potassium channel inhibition and may offer useful guidance for future structure-based drug design efforts.

      Weaknesses:

      The study needs to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, the potential for allosteric binding sites in the voltage-sensing domain (VSD) should be assessed, as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019). Increasing structural and functional evidence supports the existence of multiple ligand-binding modes in voltage-gated ion channels. For example, polyunsaturated fatty acids have been shown to bind to KCNQ1 at both the voltage sensor domain and the pore domain (https://doi.org/10.1085/jgp.202012850). Similarly, cannabidiol has been structurally resolved in Nav1.7 at two distinct sites, one in a fenestration and another near the IFM-binding pocket (https://doi.org/10.1038/s41467-023-39307-6). These advances illustrate that ligand effects cannot always be interpreted based solely on a single binding site identified previously.

      Reviewing Editor: 

      The comments of the reviewers seem thoughtful and constructive. The weaknesses noted in reviews mainly concern mismatch between expectations, created by reading the Abstract, and data in the manuscript. The mismatch could be reconciled by either new simulations examining a semi-open state of the gate and additional RY785 binding sites, or by adjusting wording of the Abstract and Discussion to make it more clear that such simulations were not done. 

      The Abstract and Discussion have been revised to make clear that the computer simulations presented in our study were designed specifically to validate or refute the hypothesis that RY785 is recognized by the pore domain, not the voltage sensors.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors): 

      The authors addressed all the major issues in the original submission identified by the reviewers. I noticed a few minor issues, listed below, which can potentially fix small errors and further improve the readability of the manuscript. 

      p.3 tetramethyl-ammonium -> tetraethylammonium 

      p.7 "Snapshot of the final snapshot" -> "Snapshot of the final simulation coordinates" 

      p. 8 "sigma value" - please spell out what it is. 

      p. 9 "one or other subunit of the tetramer" -> "one or another subunit of the tetramer" or "one or more subunits of the tetramer" 

      p. 15 "(the net charge of these constructs is thus zero)." -> "(the net charge of these constructs is zero for these systems)." Please note that using ionizable amino acid residues in their default protonation state does not guarantee a net zero charge of the system, since the number of cationic and anionic residues is generally not the same.

      p. 15 "Two K+ ions were initially positioned in the selectivity filter, one coordinated by residues 373..." Please indicate at which ion binding sites (e.g., S_1, S_2) the K+ ions were located and what the residue names are.

      SI Figs. S3-S20. Please indicate in the figure captions that all those data are for RY785.

      SI Fig. S22 and SI Table S1 captions "shown in Fig. S20" -> "shown in Fig. S21" 

      We thank the Reviewer for this thorough proofreading. We have made the necessary corrections. 
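      Reviewer #1's remark on p. 15 about net charge can be illustrated with a back-of-the-envelope count. The sketch below assumes standard default protonation states (Lys and Arg +1, Asp and Glu −1, His neutral) and ignores termini, ions, lipids, and ligands; the sequence is a hypothetical toy segment, not Kv2.1, and the function name is illustrative.

```python
# Formal charges at default protonation states (an assumption: His neutral;
# termini, ions, lipids and ligands are ignored).
DEFAULT_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(sequence):
    """Approximate net charge of a one-letter amino-acid sequence."""
    return sum(DEFAULT_CHARGE.get(res, 0) for res in sequence.upper())


# Hypothetical toy segment: two basic and three acidic residues give -1,
# and a homotetramer multiplies that imbalance by four.
segment = "MKDLEERA"
print(net_charge(segment), 4 * net_charge(segment))
```

      A count like this only shows that default protonation alone need not yield a neutral protein; the actual charge of a simulation system also depends on the chosen terminal caps and any counter-ions added.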

      Reviewer #2 (Recommendations for the authors): 

      The authors have addressed most of my comments satisfactorily, with the exception of the first point. Below, I provide further clarification regarding my concern. 

      First, it appears that the authors may have misunderstood what is meant by the possibility of multiple binding sites for RY785. This does not imply that the central pore is excluded as a binding site. Rather, it refers to the possibility that, in addition to a pore-domain site, the ligand may interact with additional binding sites, either simultaneously or in a state-dependent manner. Increasing structural and functional evidence supports the existence of multiple ligand-binding modes in voltage-gated ion channels. For example, polyunsaturated fatty acids have been shown to bind to KCNQ1 at both the voltage sensor domain and the pore domain (https://doi.org/10.1085/jgp.202012850). Similarly, cannabidiol has been structurally resolved in Nav1.7 at two distinct sites, one in a fenestration and another near the IFM-binding pocket (https://doi.org/10.1038/s41467-023-39307-6). These advances illustrate that ligand effects cannot always be interpreted based solely on a single binding site identified previously. Therefore, even if one assumes that there is no precedent for a small-molecule inhibitor that simultaneously acts on both the voltage sensor and pore domain, this does not exclude the possibility that a ligand may bind to both regions in different functional states.

      The Reviewer’s opinion came across clearly in the previous version. We however disagree that a computational investigation of the possibility that RY785 binds to the voltage sensors is well-advised at this point, given that the model we propose seemingly offers a rationale for the inhibitory effects observed experimentally. Our opinion is also that there is no compelling precedent for the mechanism of inhibition envisaged by the Reviewer, and we would argue that neither of the two studies referenced above is a compelling example. As we stated in our previous response to the Reviewer, we believe that the logical next step in this research will be to validate or refute the computational prediction we have put forward, experimentally.

      In addition, the present computational study does not provide direct mechanistic evidence to explain the statement that RY785 accelerates voltage-sensor deactivation. Specifically, no simulations were performed to model pore-domain closure or voltage-sensor motion upon RY785 binding. Moreover, alternative binding sites were neither explored nor explicitly excluded, as the simulations only involved placing a single molecule of TEA or RY785 approximately 10 Å below the cytoplasmic gate. Under these conditions, conclusions regarding effects on voltage-sensor dynamics remain speculative.

      That is a fair characterization. 

      These concerns do not detract from the overall quality of this otherwise strong computational study. There are several straightforward ways to address this issue. For example: 

      (1) Perform molecular docking or related screening approaches to evaluate potential ligand-binding sites beyond the central pore, particularly in regions proximal to the voltage sensor. This should not impose a substantial additional computational burden for a computational chemistry group. 

      (2) Revise the abstract and discussion to clarify that the current work focuses exclusively on pore-domain binding and does not explore possible additional binding sites near the voltage sensor. Explicitly stating this limitation would help prevent potential overinterpretation by readers.

      We have opted for (2), as noted above.

    1. eLife Assessment

      In this manuscript, based on electron microscopy observations of C. elegans embryos, the authors make the bold claim that the plasma membrane ruptures during cell division and that closure of this opening by membrane extension contributes to cytokinesis. Although the findings are potentially valuable, the evidence in support of the authors' claims is inadequate.

    1. eLife Assessment

      This manuscript uncovers the importance of Vinculin in the maintenance of junctional integrity during neural tube closure in regions of increased mechanical stress, by using sophisticated methods such as laser ablation and live imaging. The manuscript also reports a novel application of an established embryonic stem cell protocol to efficiently generate mutant and transgenic embryos for analysis. The findings are fundamental in nature, significantly improve our understanding of a major research question, and are backed by compelling evidence. Whilst there is much to appreciate in this work, exactly how Vinculin mediates neural fold elevation remains unclear, and addressing this lacuna will significantly improve the strength of the manuscript; in addition, some rewriting for better clarity (including technical/methodological details) and inclusion of possible consequences of the increased number of tight junction gaps in the vinculin mutant would be pertinent.

    2. Reviewer #1 (Public review):

      Summary:

      In many vertebrates, the neural tube closes by folding, elevation, and fusion of bilateral neural folds. Loss of the actin-binding protein Vinculin causes failed cranial neural tube closure in mice and is associated with neural tube defects in human patients, but it was not known how Vinculin contributes to neural tube closure. Here, Prudhomme and colleagues find that neural fold elevation and the apical constriction that drives it initiate normally in Vinculin-deficient mouse embryos, but both arrest before the neural folds fuse. The time of failure coincides with increased mechanical tension within the cranial neural plate. They find that Vinculin localizes to areas of high mechanical stress in the WT neural plate, including multi-cellular junctions and dividing cells, and in the absence of Vinculin, recruitment of Myosin and apical junction proteins is reduced at these sites. These data support a model in which Vinculin recruits junctional proteins to high-stress areas to maintain junctional integrity during neural tube closure.

      Strengths:

      The data presented are thorough, rigorous, and convincing. The combination of live imaging and transgenic fluorescent reporters enables direct observation of junctional behaviors within the mouse cranial neural plate and detailed analysis of how these behaviors are disrupted upon loss of Vinculin. The authors make good use of an ESC transplant approach to efficiently generate mutant and transgenic embryos for analysis.

      Weaknesses:

      Although the loss of junctional integrity, especially at multi-cellular junctions, is clearly and convincingly demonstrated in Vinculin-deficient embryos, it is not clear precisely how this disrupts the elevation of the neural folds to cause exencephaly.

    3. Reviewer #2 (Public review):

      Summary

      Using mouse embryos early in development, this excellent paper from Prudhomme et al. shows that Vinculin's recruitment to adherens junctions during mammalian cranial neural tube closure is essential for maintaining junctional integrity in response to increased tension during this process. Previous work had shown that during neural tube elevation, planar polarity of Myosin II and mechanical forces in the tissue are increased. Additionally, mouse embryos lacking Vinculin were known to display neural tube closure failure, and mutations in human Vinculin had been associated with increased risk of neural tube defects, but the mechanism remained unclear. Here, the authors utilize a high-throughput embryonic stem cell (ESC)-based pipeline to generate Vinculin-depleted embryos, complemented by a conditional mutant lacking Vinculin in the embryonic lineages, to investigate this question. The authors show that Vinculin is not required for force generation, but Vinculin is recruited to cell-cell junctions in a tension-dependent manner and is needed to transmit actomyosin-mediated tension to junctions - particularly tricellular and higher-order multicellular junctions - so that apical constriction can happen during neural fold elevation. Furthermore, they find that Vinculin is required to maintain adhesion during high force events (e.g., rosette resolution and cell division) during neural tube closure. The research builds on previous studies about Vinculin's role in mechanotransduction at cell-cell junctions carried out in cultured epithelial cells, zebrafish cardiomyocytes, or early Xenopus embryos, and investigates how physiological forces required for mouse neural tube closure challenge junction integrity and the important role that Vinculin plays in maintenance of junction integrity and translation of mechanical forces into changes in tissue structure during this process.

      Strengths:

      This study stands out for its sophisticated use of laser ablation and live imaging in neurulating mouse embryos, enabling quantification of junctional tension, Vinculin recruitment to multicellular junctions, and assessment of junction integrity during neural tube elevation. The authors' use of both ESC-derived Vinculin mutant embryos complemented by a second conditional mutant of Vinculin convincingly demonstrates that their findings are specific to the loss of Vinculin. Additionally, the authors demonstrated proof-of-principle for their ESC-based pipeline with a Shroom3 mutant known to be important for neural tube closure. The Zallen lab's application of the genetically engineered ESC-derived mouse embryo pipeline to efficiently generate larger numbers of mutant mouse embryos exhibiting neural tube closure defects (compared with traditional genetic crossing strategies) that can be utilized for live imaging and mechanical perturbations like laser ablation will be valuable for future work in the field. The authors show that Vinculin depletion disrupts tricellular and multicellular junctions. Notably, over 75% of higher-order (5+) vertices in Vinculin mutant embryos display gaps, but interestingly, about one third of 5+ cell junctions in Control embryos also display gaps, indicating that transient vertex remodeling events are needed for normal neural tube closure. Overall, this is a well-written paper that places the authors' findings within the context of prior literature; their beautiful, robustly analyzed data and clear figure presentation will make the authors' exciting findings accessible to readers.

      Weaknesses:

      The criteria for selection of junctions targeted by laser ablation, including specifics of location, Myosin II intensity, and initial junction length, should be more clearly described in the Methods, especially given the use of different reporter strains (MyoIIB-GFP vs. GFP-Plekha7) across figures, which may influence junction selection for laser ablation. Analysis of Myosin II in Vinculin mutant embryos would benefit from staining for active Myosin II (pMRLC), and further examination of actomyosin organization at different stages of neural fold elevation in controls vs. Vinculin mutants would be informative. Although the authors note that ZO-1 gaps are limited to a subset of vertices where adherens junction gaps are detected, the increased frequency of tight junction gaps in Vinculin mutants could have functional significance that should be noted. Finally, inclusion of schematics to detail how the adherens and tight junction gaps were defined and measured at cell vertices, as well as how cell division completion was defined, would improve transparency and strengthen readers' understanding of how the data were quantified.

    4. Reviewer #3 (Public review):

      Summary:

      Prudhomme et al report a detailed analysis of the role of vinculin in maintaining neuroepithelial integrity during cranial neurulation.

      Strengths:

      The authors use complementary experiments involving super-resolution microscopy, laser ablation, and live imaging of conditional knockout and ESC-derived embryos to demonstrate that loss of vinculin produces wide gaps between the adherens junctions of neuroepithelial cells at later stages of cranial neural fold elevation. The data presented are of extremely high quality, logically presented in a compelling story, and represent a very substantial contribution.

      Weaknesses:

      The authors are invited to consider the largely minor questions recommended below.

      (1) The laser ablations reported are a correlate of cell border, or 'junctional' tension. Please avoid broad statements such as 'mechanical forces are upregulated' (abstract), which invoke gene-like regulation of tissue-level forces (in Newtons). Changes in junctional tension are likely to relate to changes in force generated, but their relationship is not simple: higher tensile stress withstood by the shorter length of junctions in cells with smaller apical surfaces does not necessarily translate into greater force being produced by that cell. The junctional tension readout measured is perfectly relevant to the paper, more so than tissue-level forces would have been.

      (2) What is the mechanical mechanism by which loss of vinculin prevents neural fold elevation? The authors present exciting findings about the cellular consequences of losing Vcl at the late elevation stages when the tissue is quantifiably dysmorphic. A clear argument of how Vcl loss could lead to this dysmorphology would strengthen the paper, particularly given that junctional tension defects are excluded and apical non-constriction at the late stage is only mild.

      (3) Can the authors comment on the likely impacts of Vcl deletion on the basal domain of the cell? For example, they could cite live-imaging of distinct behaviours in Williams et al Dev Cell 2014, and the NTD phenotypes of some integrin/focal adhesion mutant mice.

      (4) The apparent uncoupling of apical area (larger in Vcl KO) from junctional tension (equivalent) in this model is noteworthy. Can the authors speculate on its potential basis?

      (5) Live imaging in Figure 7C appears to show a marked reduction in apical area before cleavage furrow formation (T0-18min), suggesting a large apical constriction event (post-mitotic?), as previously reported (e.g., Ampartzidis et al Dev Biol 2023). Do junctional gaps appear during these constrictions?

      (6) The live imaging setup used is clearly sufficient to identify differences between genotypes, so this is only a minor point. The gassing conditions listed in the methods specify 5% CO2, but E8.5 embryos also need low O2 to complete cranial closure. Was the O2 level controlled? Was tissue-level shape change observed to be consistent with ongoing neurulation during live-imaging?

      (7) Neither the multi-cell laser ablations in the pre-print by De La O cited here, nor the narrower junctional ablations in Bocanegra-Moreno et al., Nat Phys, (2023), identified differences in recoil between developmental stages. Why might those results be different from the findings reported here (e.g., analysis region - not specified in the latter paper)? Limitations to interpreting junctional ablations between cells with different junction lengths include more of the recoil being dissipated by retraction of the longer ablated border.

      (8) Is a truncated Vcl expressed in the ESC model, which could bind catenin without an F-actin anchor? The very high-contrast western shown is cropped so it is not clear whether the catenin-binding N-terminus is present. Does the antibody used recognise the head domain (this reviewer could not readily find the information)?

  2. Apr 2026
    1. Reviewer #1 (Public review):

      Summary:

      Using electron microscopy, the authors report discontinuities in the plasma membrane of C. elegans embryos. They associate these discontinuities with cell division and speculate that membrane rupture and subsequent resealing contribute to cytokinesis. They further discuss the proximity of these sites to vesicles and propose a role for vesicle-mediated membrane extension.

      Weaknesses:

      (1) The possibility that the membrane discontinuity is an artifact

      Although the authors focus on discontinuities in the plasma membrane, similar discontinuities are also observed in mitochondria, the nuclear envelope, and yolk granules. This raises concerns about whether the electron micrographs presented are suitable for assessing membrane continuity.

      Electron micrographs result from a lengthy sample preparation process, including high-pressure freezing, freeze substitution in acetone containing OsO4, gradual warming, uranyl acetate staining, resin embedding, and ultrathin sectioning. In general, lipids are soluble in acetone at temperatures above −30 °C, and preservation of membrane structures relies heavily on efficient OsO4 fixation. Insufficient OsO4 treatment would be expected to reduce membrane contrast.

      C. elegans embryos are encapsulated by an eggshell that forms at fertilization and gradually develops during the first few cell divisions. It is unclear how efficiently OsO4 in acetone penetrates the eggshell during freeze substitution, raising further concern about plasma membrane preservation under the conditions used.

      (2) Lack of evidence linking membrane discontinuity to cell division

      The reported plasma membrane discontinuities are not specific to mitotic cells. If this were a physiological process playing an important role in cytokinesis, it should occur in a temporally and spatially coordinated manner with nuclear division. However, it remains unclear at what stage of the cell cycle the membrane rupture occurs and where it is located relative to chromosomes and the mitotic spindle.

      (3) Lack of evidence for extension of the separated membrane

      Although the authors speculate that resealing of the ruptured membrane occurs via extension of the separated membrane, no direct evidence supporting this mechanism is presented. Proximity to vesicles alone does not demonstrate that membrane extension occurs through vesicle fusion. More direct evidence is required to support this claim.

      (4) Inconsistency with published work

      Numerous studies have examined cell division in developing C. elegans embryos using the GFP::PH(PLC1δ1) marker expressed from the ltIs38 transgene [pAA1; pie-1::GFP::PH(PLC1δ1) + unc-119(+)], generated by the Oegema lab (https://wormbase.org/species/c_elegans/transgene/WBTransgene00000911#01--10 ). To date, no study has reported membrane ruptures of the magnitude described here. The complexity of cell surface morphology from the 8- to 12-cell stages onward has been well documented, for example, by Fu et al. (2016) using light-sheet microscopy and 3D reconstruction (doi:10.1038/ncomms11088).

      Supplementary Movies 5, 6, and 10 of this paper illustrate how single-plane images can easily produce apparent membrane discontinuities, for example, due to membrane orientations nearly parallel to the imaging plane.

      The three single-plane images from only three embryos presented in Figure 6 are insufficient to support the authors' strong conclusions. Raw 3D data should be provided.

    2. Reviewer #2 (Public review):

      Summary:

      Liang et al. explore an unusual observation of membrane discontinuities in dividing C. elegans embryonic cells. This report is the first to demonstrate that, instead of the classical invagination of membranes during cytokinesis, cells in the early embryos of C. elegans exhibit separation of sister membranes that extend independently. TEM images of high-pressure-frozen samples provide strong evidence for the presence of Membrane Openings (MOs) in cells at various stages of the cell cycle, predominantly during mitosis. High-resolution images (x 30,000) clearly show the wrinkled plasma membrane and smooth MOs. The electron microscopy data are supported by the live cell imaging of strains with fluorescently tagged membrane markers. This study opens up the possibility of tracking MOs at other stages of C. elegans development, and also asks if it might be a common phenomenon in other species that exhibit rapid embryonic growth and divisions.

      Strengths:

      (1) Thorough verification of Membrane Openings (MO) by several methods:

      (a) 4 independent sample batches.

      (b) Examined historical collections.

      (c) Analysed embryos at different stages of development. The absence of MOs in later stages (comma) serves as a negative control and gives confidence that MOs are genuine and not technical artifacts.

      (2) Live cell imaging of a strain with fluorescently labelled membranes provides real-time dynamics of membrane rupture.

      (3) After observing the membrane rupture, the next obvious question is: what prevents the cytosol from leaking out? The EM images showing the PBL and PEL, extracellular matrix layers serving as barriers for the cytosol, are convincing.

      Weaknesses:

      (1) The association of membrane discontinuities with cell division is not convincing: 159 of 425 cells show MOs, but the manuscript does not state clearly how many of these are undergoing cell division. It is also unclear whether the 20 dividing cells analysed for MOs are part of the 159 cells or a separate dataset. A graphical representation of the number of samples and observed frequencies would help readers understand the data collection workflow.

      (2) In Figures 3A and 3B, the resolution of the images is not enough to verify 3A as classical membrane invagination and 3B as detached sister membranes.

      (3) Figure 6 lacks controls. How does the classical invagination look in this strain? Also, adding a nuclear dye would help correlate nuclear division with membrane rupture, as claimed.

    3. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors challenge a dogma in cell biology, namely that cells are at every time point enclosed by a continuous plasma membrane. Liang et al. find that during C. elegans embryogenesis, a high number of cells are not entirely surrounded by a plasma membrane but show membrane openings (MOs). These openings are enriched at the embryo's periphery, towards the eggshell. The authors propose that plasma membrane discontinuities emerge during metaphase of mitosis and that independent extension of "sister membranes" encloses the daughter cells.

      Strengths:

      On the positive side, the authors find plasma membrane discontinuities not only by electron microscopy but also by fluorescence microscopy and provide information about the dynamics of membrane openings and their emergence. While this is reassuring, the authors conclude that MOs emerge during metaphase. From what the authors show, this particular information cannot be deduced, as there is no dynamic capture of a membrane scission event together with a chromatin marker that would indicate mitosis. The authors could, however, attempt to find such events in live movies, given the high incidence of MOs reported from their EM data.

      Weaknesses:

      In order to convincingly demonstrate the absence of any plasma membrane in the respective regions of the embryonic periphery or between cells of the embryo, the authors would have to show consecutive serial TEM sections in which MOs are detected across multiple z-planes, beyond the mere 3D reconstructions. Although the authors state in the methods section that continuous ultrathin sections were cut for the metaphase sample (page 21, line 472), consecutive sections are never shown in TEM. While we do see the 3D reconstructions, better documentation of the underlying TEM data is missing. It would be necessary to show a membrane opening in consecutive z sections. Alternatively, the authors could back up their claims with volume imaging by focused ion beam scanning EM (FIBSEM), in which cellular volumes can be sectioned at almost isotropic resolution.

      Another critical issue concerns the detection of the membrane discontinuities in electron micrographs, which, in my opinion, is ambiguous. How do the authors reliably discriminate in their TEM images whether there is a plasma membrane or not? The absence - or weak appearance - of the stain of the electron dense material at membranes, which seems to be their criterion for MOs, is also apparent at other, intracellular membranes, like at the NE or at the ER (for example, see Figure 1C). Also, the plasma membrane itself appears unevenly stained in regions that the authors delineate as intact (for example, Figure 1C, 2B/1).

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using electron microscopy, the authors report discontinuities in the plasma membrane of C. elegans embryos. They associate these discontinuities with cell division and speculate that membrane rupture and subsequent resealing contribute to cytokinesis. They further discuss the proximity of these sites to vesicles and propose a role for vesicle-mediated membrane extension. 

      Weaknesses:

      (1) The possibility that the membrane discontinuity is an artifact

      Although the authors focus on discontinuities in the plasma membrane, similar discontinuities are also observed in mitochondria, the nuclear envelope, and yolk granules. This raises concerns about whether the electron micrographs presented are suitable for assessing membrane continuity.

      Electron micrographs result from a lengthy sample preparation process, including high-pressure freezing, freeze substitution in acetone containing OsO4, gradual warming, uranyl acetate staining, resin embedding, and ultrathin sectioning. In general, lipids are soluble in acetone at temperatures above −30 °C, and preservation of membrane structures relies heavily on efficient OsO4 fixation. Insufficient OsO4 treatment would be expected to reduce membrane contrast.

      C. elegans embryos are encapsulated by an eggshell that forms at fertilization and gradually develops during the first few cell divisions. It is unclear how efficiently OsO4 in acetone penetrates the eggshell during freeze substitution, raising further concern about plasma membrane preservation under the conditions used.

      We thank the reviewer for raising this important technical concern. We have taken this question seriously since first observing membrane discontinuities six years ago, and we have since conducted extensive controls to rule out fixation artifacts. Below, we present multiple lines of evidence—ranging from technical reproducibility to orthogonal imaging approaches—that collectively demonstrate the biological reality of these structures.

      (1) Technical expertise and standard protocols

      Our laboratory has extensive experience with electron microscopy across diverse biological systems, including neurons, muscle cells, and hypodermis in C. elegans, as well as tissues from Drosophila, mouse, bacteria, and cultured cells (Chen et al., 2013; Ding et al., 2018; Guan et al., 2022; Y. Li et al., 2018; Miao et al., 2024; Qin et al., 2014; Wang et al., 2026; J. Xu et al., 2022; M. Xu et al., 2021; L. Yang et al., 2020; X. Yang et al., 2019; Zhu et al., 2022). Importantly, we did not introduce any novel or unconventional steps in our EM preparation; all protocols were standard and well-established. Thus, the observed membrane discontinuities are unlikely to stem from technical inexperience or idiosyncratic methods.

      In addition to membrane discontinuities, we would like to emphasize that a large number of single plasma membranes separating adjacent cytoplasmic domains were also detected under EM (see, for instance, Figures 1, 3, and 4). This observation is particularly significant because the invagination model cannot generate single plasma membrane barriers between adjacent cytoplasmic domains. Instead, independent extension of detached sister membranes could explain the formation of cytoplasm-enclosed membranes. Furthermore, because the morphology and continuity of these single cytoplasm-immersed membrane structures are well preserved, this indicates successful EM processing and argues against inefficient fixation or other technical issues.

      (2) Reproducibility across independent preparations and techniques

      To test whether the discontinuities were preparation-specific, we examined four independent sample batches collected in the lab over the years. Membrane discontinuities, as well as cytoplasm-immersed membranes, on embryonic cells were consistently observed across all batches, indicating that the phenomenon is not dependent on a single preparation method. Furthermore, we validated our findings using two EM techniques: transmission electron microscopy (HPF-TEM) and dual-beam scanning electron microscopy (SEM). Membrane discontinuities were clearly identifiable with both techniques, further supporting their robustness.

      (3) Validation using an independent public dataset

      We examined the publicly available C. elegans embryo EM collection (WormAtlas). In several instances, particularly at the embryonic periphery where plasma membrane discontinuities are more readily visualized (https://www.wormimage.org/image.php?id=140265&page=1), we identified similar structures. The presence of these features in an independent dataset generated by different researchers confirms that they are not artifacts unique to our sample preparation.

      (4) Developmental regulation of membrane discontinuities

      We analyzed embryos across multiple developmental stages. Membrane discontinuities were observed in both intrauterine and laid embryos at early stages. However, as embryos reached the comma stage (a period marked by the onset of elongation and reduced cell proliferation), the incidence of discontinuities dropped dramatically (0/13, 0/17, and 0/30 cells examined). This developmental specificity argues strongly against a general fixation artifact, which would be expected to occur randomly across stages. Additionally, because the eggshell is present throughout embryogenesis in C. elegans, the dramatic reduction of membrane discontinuities in comma-stage embryos argues against the possibility that the eggshell poses a fixation problem.

      (5) Rigorous criteria for identifying membrane discontinuities

      To ensure unbiased analysis, we systematically collected images from early embryonic cells using the following criteria:

      (1) Random section selection: For each sample, we randomly selected one section containing the largest number of embryos or cells (Sup Figure 2) for initial analysis. We found membrane discontinuities in 159 cells distributed across 57 embryos, representing 95% of the total sampled embryos. This portion of the data is summarized in Figure 1.

      (2) Whole-membrane examination: Each putative membrane discontinuity was identified only after examining the entire plasma membrane of the cell on a given section. Importantly, aside from the discontinuity, the remainder of the plasma membrane remained intact. Moreover, in most cells, only a single discontinuity was present per section, arguing against random, widespread membrane tearing during preparation.

      (3) Neighboring section verification: Because EM preparation yields serial sections, we verified nearly all membrane discontinuities by examining adjacent sections. Again, the same membrane discontinuity was confirmed only after inspecting the entire plasma membrane on those neighboring sections as well. We will include this verification protocol in the revised Methods, and additional images of consecutive sections can be provided if needed.

      (4) Serial section reconstruction: To further determine whether a dividing cell indeed contains one membrane rupture, we performed two serial reconstruction experiments.

      First, we used HPF-TEM to analyze 105 consecutive sections of a metaphase cell, reconstructing the entire plasma membrane and chromosome configuration. We found that one membrane rupture largely encircled the chromosomal disc (Figure 2 and Video S1), spatially aligning with the future segregation zone. Second, we used AutoCUTS-SEM to collect approximately 600 sections covering ~95% of a telophase cell containing three nuclei sharing a common cytoplasm. This tri-nucleated cell was enclosed by three distinct plasma membranes, each harboring a single rupture site. These three ruptures converged to form a Y-shaped exposed cytoplasmic region spanning >351 sections (Figure 5). Collectively, these reconstructions demonstrate that each cell contains only one discontinuity from a 3D point of view, further supporting that the phenomenon is not due to random sample preparation damage.

      (6) Orthogonal validation by live imaging: In addition to EM, we performed live imaging of plasma membrane dynamics. While live imaging provides important temporal context, we recognize its limitations in resolving membrane ultrastructure. The rapid kinetics of membrane extension (approximately 20–30 seconds for metaphase and less than 3 minutes for cytokinesis), combined with embryo motility, introduce spatiotemporal ambiguities. To capture dynamic membrane events, our live imaging using the GFP::PH membrane marker was performed at 4-second intervals, approaching the practical limit for single-section scanning of the embryo. Nevertheless, even with single-plane live imaging, both membrane ruptures and free-ended sister membrane structures could be detected (Figure 6), providing additional evidence that membrane rupture and independent extension of detached sister membranes underlie cytokinesis in C. elegans embryos. Notably, 3D membrane dynamics analysis using light-sheet microscopy (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088) revealed membrane ruptures in dividing early C. elegans embryonic cells, including during telophase or metaphase. Therefore, live imaging further validates the membrane rupture phenomenon in dividing embryonic cells in C. elegans.

      While future advances in imaging technology may enable real-time visualization at near-EM resolution, our extensive, multi-year effort to test the artifact hypothesis has convinced us that these membrane discontinuities are genuine biological features of dividing C. elegans embryonic cells.

      We are confident that the cumulative evidence presented here addresses the reviewer's concerns and demonstrates that the observed membrane discontinuities, as well as cytoplasm-immersed membranes, are not procedural artifacts but rather reflect a previously underappreciated aspect of plasma membrane dynamics during embryonic cell division.

      (2) Lack of evidence linking membrane discontinuity to cell division 

      The reported plasma membrane discontinuities are not specific to mitotic cells. If this were a physiological process playing an important role in cytokinesis, it should occur in a temporally and spatially coordinated manner with nuclear division. However, it remains unclear at what stage of the cell cycle the membrane rupture occurs and where it is located relative to chromosomes and the mitotic spindle.

      Thank you for this insightful comment. We agree that establishing a direct link between plasma membrane discontinuities and mitotic progression is critical, and we appreciate the opportunity to clarify this point.

      In C. elegans embryos, the early stages of development are characterized by rapid and extensive cell division. Within approximately 100 minutes, a two-cell embryo develops into an embryo containing nearly 30 cells. The majority of the electron microscopy analyses in our study were performed on embryos at stages with fewer than 30 cells, where most cells are actively dividing. Thus, it is reasonable to infer that the cells exhibiting membrane discontinuities are predominantly mitotic cells.

      Supporting this notion, as embryos reached the comma stage—a period marked by the onset of elongation and reduced cell proliferation—the incidence of membrane discontinuities dropped dramatically (0/13, 0/17, and 0/30 cells examined). This developmental specificity strongly suggests that membrane discontinuities are tightly linked to cell division.

      Importantly, mitotic features such as metaphase chromosomes aligned at the equatorial plane or two (or more) nuclei sharing common cytoplasm can be identified in EM images. In our single random EM section analysis, we captured membrane discontinuities in cells at metaphase, anaphase (characterized by fewer than 10 chromosomal clumps), and telophase (defined by two nuclei sharing cytoplasm). Hence, membrane discontinuities are indeed present on mitotic cells. In addition, a published work by Fu et al (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088) using light-sheet microscopy captured similar membrane discontinuities in cells displaying classical mitotic features, including anaphase or telophase.

      To further investigate the spatial relationship between membrane ruptures and chromosome organization, we performed three-dimensional reconstructions on a metaphase cell. As shown in Figure 2 and Video S1, the membrane discontinuities largely encircled the condensed chromosome disc and were spatially aligned with the future segregation zone, further revealing the relative location of membrane discontinuities to chromosomes, at least at metaphase.

      We further collected 3D information for a telophase cell containing three nuclei. This tri-nucleated cell was enclosed by three distinct plasma membranes, each harboring a single rupture site that merged to form a single rupture. The observation that membrane ruptures are present in a tri-nucleated cell is particularly informative. The tri-nucleated feature indicates that this cell underwent two rounds of cell division and that both divisions were at telophase. The presence of a single membrane rupture suggests that membrane discontinuities may persist throughout the cell cycle, as the second cell cycle began from a mother cell that still shared cytoplasm with its sister cell and already had one membrane rupture. Therefore, in addition to the mitotic phase, membrane discontinuities—at least in this context—also exist during the DNA synthesis stage.

      (3) Lack of evidence for extension of the separated membrane 

      Although the authors speculate that resealing of the ruptured membrane occurs via extension of the separated membrane, no direct evidence supporting this mechanism is presented. Proximity to vesicles alone does not demonstrate that membrane extension occurs through vesicle fusion. More direct evidence is required to support this claim.

      Thank you for raising this important point. We appreciate the opportunity to clarify our conclusion.

      In our study, EM analysis revealed the presence of cellular vesicles in close proximity to both free membrane edges and the already separated sister plasma membranes (Figure 4). However, we acknowledge that without advanced live-cell imaging, it is not possible to conclusively determine whether the extension of these separated sister membranes occurs through vesicle fusion.

      We realize that a statement in the Discussion section, “The expansion of the plasma membrane is generally driven by vesicle fusion” (page 16), may have inadvertently led the reviewer to interpret this as our own conclusion regarding the mechanism of membrane extension in this context. In fact, that statement was intended to reflect the current general understanding of membrane expansion, not to imply that we had demonstrated such a mechanism for the free-ended sister membranes. As we subsequently noted, “However, this remains speculative and requires further experimental validation.”

      To avoid any misunderstanding, we will revise this section to clearly state that the mechanism by which the separated sister membranes extend remains unknown and that further investigation is needed to determine how existing models of membrane expansion may apply to or be adapted for this novel context.

      We thank the reviewer again for their thoughtful comment, which has helped us improve the clarity of our manuscript.

      (4) Inconsistency with published work

      Numerous studies have examined cell division in developing C. elegans embryos using the GFP::PH(PLC1δ1) marker expressed from the ltIs38 transgene [pAA1; pie-1::GFP::PH(PLC1δ1) + unc-119(+)], generated by the Oegema lab (https://wormbase.org/species/c_elegans/transgene/WBTransgene00000911#01--10 ). To date, no study has reported membrane ruptures of the magnitude described here. The complexity of cell surface morphology from the 8- to 12-cell stages onward has been well documented, for example, by Fu et al. (2016) using light-sheet microscopy and 3D reconstruction (doi:10.1038/ncomms11088).

      Supplementary Movies 5, 6, and 10 of this paper illustrate how single-plane images can easily produce apparent membrane discontinuities, for example, due to membrane orientations nearly parallel to the imaging plane.

      The three single-plane images from only three embryos presented in Figure 6 are insufficient to support the authors' strong conclusions. Raw 3D data should be provided.

      Thank you for this important comment. We fully agree that the GFP::PH(PLC1δ1) marker, generated by the Oegema lab, has been widely and effectively used to study various aspects of C. elegans embryonic development. In fact, we also employed this same marker in our study to assess membrane integrity.

      However, while live imaging provides invaluable temporal resolution, its limitations in resolving membrane ultrastructure are substantial. In C. elegans embryos, early development is marked by rapid and extensive cell divisions. Within approximately 100 minutes, a two-cell embryo develops into one containing nearly 30 cells. During this fast-dividing stage, the rapid kinetics of membrane extension (approximately 20–30 seconds during metaphase and less than 3 minutes during cytokinesis), combined with embryo motility, introduce considerable spatiotemporal ambiguities. Furthermore, the longstanding invagination model of cytokinesis has shaped interpretations in the field, which may have led to ambiguous structures such as free-ended extensions being dismissed as potential artifacts rather than recognized as alternative morphological features. Theoretical and computational models have largely been built upon invagination-centric assumptions, which may have further constrained conceptual frameworks. Therefore, fluorescence protein-based live imaging analysis alone could not serve as a convincing approach to challenge the current dogma of cell division, nor did we intend it to.

      However, when reexamined in light of our findings, previous studies using this same GFP marker have in fact revealed membrane discontinuities that went unnoticed. For example, Fu et al. (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088), using light-sheet microscopy and 3D reconstruction, captured membrane discontinuities in cells in mitotic phases such as anaphase or telophase. Similarly, an earlier study by Harrell and Goldstein (Harrell and Goldstein. 2011. Internalization of multiple cells during C. elegans gastrulation depends on common cytoskeletal mechanisms but different cell polarity and cell fate regulators. Developmental Biology. DOI:10.1016/j.ydbio.2010.09.012) showed regions where the GFP::PH signal appeared fuzzy and discontinuous.

      Nevertheless, given the inherent limitations of fluorescence microscopy in resolving membrane ultrastructure, high-resolution electron microscopy—supported by rigorous controls and serial section analysis—remains the gold standard for definitively identifying such membrane discontinuities.

      We acknowledge that our findings are surprising. We did not set out to challenge the long-held view of membrane integrity during cell division. In fact, this study began when our dedicated EM technician, Jingjing Liang, first observed membrane discontinuity phenomena in control samples—wild-type embryos. Had she not come across this observation, we likely would never have pursued this line of inquiry.

      We appreciate the opportunity to clarify these points and thank the reviewer for their thoughtful engagement with our work.

      Reviewer #2 (Public review):

      Summary:

      Liang et al. explore an unusual observation of membrane discontinuities in dividing C. elegans embryonic cells. This report is the first to demonstrate that, instead of the classical invagination of membranes during cytokinesis, cells in the early embryos of C. elegans exhibit separation of sister membranes that extend independently. TEM images of high-pressure-frozen samples provide strong evidence for the presence of Membrane Openings (MOs) in cells at various stages of the cell cycle, predominantly during mitosis. High-resolution images (×30,000) clearly show the wrinkled plasma membrane and smooth MOs.

      The electron microscopy data are supported by the live cell imaging of strains with fluorescently tagged membrane markers. This study opens up the possibility of tracking MOs at other stages of C. elegans development, and also asks if it might be a common phenomenon in other species that exhibit rapid embryonic growth and divisions. 

      Strengths:

      (1) Thorough verification of Membrane Openings (MOs) by several methods:

      (a) 4 independent sample batches.

      (b) Examined historical collections.

      (c) Analysed embryos at different stages of development. The absence of MOs in later stages (comma) serves as a negative control and gives confidence that MOs are genuine and not technical artifacts. 

      (2) Live cell imaging of a strain with fluorescently labelled membranes provides the real-time dynamics of membrane rupture.

      (3) After observing the membrane rupture, the next obvious question is: what prevents the cytosol from leaking out? The EM images showing the PBL and PEL, extracellular matrix layers that serve as barriers retaining the cytosol, are convincing.

      We thank the reviewer for the encouragement; it is highly appreciated.

      Weakness:

      (1) The association of membrane discontinuities with cell division is not convincing: 159 of 425 cells show MOs, but it is not clearly stated how many of these are undergoing cell division. It is also unclear whether the 20 dividing cells analysed for MOs are part of the 159 cells or a separate dataset. A graphical representation of the number of samples and observed frequencies would help readers understand the data collection workflow.

      We sincerely thank the reviewer for raising this important question and appreciate the opportunity to clarify these points.

      (1) Relationship between membrane discontinuities and cell division

      In C. elegans embryos, early development is characterized by rapid and extensive cell division: within approximately 100 minutes, a two-cell embryo develops into one containing nearly 30 cells. Most of our electron microscopy (EM) analyses were performed on embryos at stages with fewer than 30 cells, in which the majority of cells are actively dividing. Therefore, it is reasonable to infer that the cells exhibiting membrane discontinuities (MOs) are predominantly mitotic. Supporting this, as embryos reached the comma stage, when cell proliferation declines and elongation begins, the incidence of MOs dropped sharply (0/13, 0/17, and 0/30 cells examined). This developmental specificity strongly links MOs to cell division.

      Moreover, in single random EM sections, we observed MOs in cells displaying clear mitotic features, such as metaphase chromosomes aligned at the equatorial plate, or anaphase/telophase configurations (fewer than 10 chromosomal clumps, or two nuclei sharing a common cytoplasm). Thus, MOs are indeed present in mitotic cells.

      From our 3D reconstruction (Figure 5), we identified a telophase cell containing three nuclei, each enclosed by its own plasma membrane; each membrane harbored a single rupture, and the three ruptures converged into a single opening. This tri-nucleated configuration indicates that the cell had undergone two rounds of division and was at telophase in both. The presence of a single membrane rupture in this context suggests that MOs can persist beyond mitosis, because the second cell cycle began in a mother cell that already shared cytoplasm with its sister and already contained a rupture. Thus, in this case, MOs were also present during the DNA synthesis stage.
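      As a back-of-the-envelope illustration of how sharp the comma-stage drop is, one can ask how likely it would be to see zero MOs among the comma-stage cells if they showed MOs at the early-embryo single-section frequency (159 of 425 cells, the figure quoted by the reviewer). This sketch assumes that cells are independent and that comma-stage cells were sampled with comparable single sections; neither assumption is established in this response, so the output indicates scale only and is not a test reported by the authors. The names below are hypothetical.

```python
# Hypothetical sketch: binomial chance of seeing no MOs if the early-embryo rate applied.
# Assumes independent cells and comparable single-section sampling at both stages.
EARLY_MO_RATE = 159 / 425  # early-embryo MO frequency from single random sections (Figure 1)
COMMA_STAGE_CELLS = (13, 17, 30)  # comma-stage cell sets examined, 0 MOs in each


def prob_no_mos(rate: float, n_cells: int) -> float:
    """Binomial probability of 0 MO-positive cells among n_cells independent cells."""
    return (1.0 - rate) ** n_cells


if __name__ == "__main__":
    for n in COMMA_STAGE_CELLS:
        print(f"0/{n}: P = {prob_no_mos(EARLY_MO_RATE, n):.2e}")
    total = sum(COMMA_STAGE_CELLS)
    # Pooling treats the three sets as one sample, valid only if sampling was comparable.
    print(f"0/{total} pooled: P = {prob_no_mos(EARLY_MO_RATE, total):.2e}")
```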

      (2) Clarification of sample numbers and datasets

      In Figure 1, we present results from a single EM section per embryonic cell, with sections randomly selected per embryo as detailed in Sup Figure 2. This initial dataset (425 cells) forms the basis of Figure 1.

      From the same pool of 425 cells, we used additional EM sections—distinct from those shown in Sup Figure 2—to locate 20 dividing cells for analysis of membrane discontinuities. Thus, while these 20 cells originated from the same set of embryos, they were not derived from the sections used in Figure 1 or Sup Figure 2.

      A graphical summary of sample numbers from the single-section analysis is already provided in Figure 1. Notably, cells with two clearly visible nuclei are more likely to be sectioned through or near their maximal diameter. In contrast, the randomly selected sections used for Figure 1 captured cells at variable planes, reducing the likelihood of observing MOs. Consistent with this, in the three embryos where no MOs were detected (one example is Sup Figure 2N), the sections likely passed through peripheral regions of the cells. Consequently, the frequency of MOs in randomly sectioned cells (Figure 1) is not directly comparable to that observed in the 20 dividing cells, which were analyzed using sections more likely to capture cells near their maximal diameter. These 20 dividing cells should therefore be considered a separate analysis. We will add detailed explanations in the Methods section to ensure this distinction is clearly understood.

      We are grateful for the reviewer’s thoughtful feedback and believe these clarifications will improve the clarity and rigor of the manuscript.

      (2) In Figures 3A and 3B, the resolution of the images is not enough to verify 3A as classical membrane invagination and 3B as detached sister membranes.

      Thank you for your valuable comment. In the revised manuscript, we will provide additional images at higher magnification to better illustrate the classical membrane invagination in Figure 3A and the detached sister membranes in Figure 3B.

      (3) Figure 6 lacks controls. What does classical invagination look like in this strain? Adding a nuclear dye would also be informative, to correlate nuclear division with membrane rupture, as claimed.

      Thank you for this important comment. Because we described how we correlated nuclear division with membrane rupture in our response to Weakness (1), here we focus on how classical invagination can be distinguished from membrane rupture.

      While live imaging provides invaluable temporal resolution, its limitations in resolving membrane ultrastructure are substantial. In C. elegans embryos, early development is marked by rapid and extensive cell divisions. Within approximately 100 minutes, a two-cell embryo develops into one containing nearly 30 cells. During this fast-dividing stage, the rapid kinetics of membrane extension (approximately 20–30 seconds during metaphase and less than 3 minutes during cytokinesis), combined with embryo motility, introduce considerable spatiotemporal ambiguities. Furthermore, the longstanding invagination model of cytokinesis has shaped interpretations in the field, which may have led to ambiguous structures, such as free-ended extensions, being dismissed as potential artifacts rather than recognized as alternative morphological features. Theoretical and computational models have largely been built upon invagination-centric assumptions, which may have further constrained conceptual frameworks. Therefore, fluorescent protein-based live imaging alone cannot serve as a convincing basis for challenging the current dogma of cell division, nor did we intend it to.

      However, when reexamined in light of our findings, previous studies using GFP::PH or similar markers have in fact revealed membrane discontinuities that went unnoticed. For example, using light-sheet microscopy and 3D reconstruction, Fu et al. captured membrane discontinuities in cells undergoing division, such as in anaphase or telophase (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088).

      Similarly, an earlier study by Harrell and Goldstein (Harrell and Goldstein. 2011. Internalization of multiple cells during C. elegans gastrulation depends on common cytoskeletal mechanisms but different cell polarity and cell fate regulators. Developmental Biology. DOI:10.1016/j.ydbio.2010.09.012) showed regions where the GFP::PH signal appeared fuzzy and discontinuous.

      Here, to capture dynamic membrane events, our live imaging using the GFP::PH membrane marker was performed at 4-second intervals, approaching the practical limit for single-section scanning of the embryo. With single-plane live imaging, both membrane ruptures and free-ended sister membrane structures (Figure 6) could be detected, providing additional evidence that membrane rupture and independent extension of detached sister membranes underlie cytokinesis in C. elegans embryos.

      However, given the inherent limitations of fluorescence microscopy in resolving membrane ultrastructure, high-resolution electron microscopy—supported by rigorous controls and serial section analysis—remains the gold standard for definitively distinguishing invagination from membrane discontinuities.

      While future advances in imaging technology may enable real-time visualization at near-EM resolution, our extensive, multi-year effort to test the artifact hypothesis has convinced us that these membrane discontinuities are genuine biological features of dividing C. elegans embryonic cells.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors challenge a dogma in cell biology, namely that cells are at any time point engulfed by a continuous plasma membrane. Liang et al. find that during C. elegans embryogenesis, many cells are not entirely surrounded by a plasma membrane but show membrane openings (MOs). These openings are enriched at the embryo's periphery, towards the eggshell. The authors propose that plasma membrane discontinuities emerge during metaphase of mitosis and that independent extension of "sister membranes" engulfs the daughter cells.

      Strengths:

      On the positive side, the authors find plasma membrane discontinuities not only by electron microscopy but also by fluorescence microscopy, and they provide information about the dynamics of membrane openings and their emergence. While this is reassuring, the authors also conclude that MOs emerge during metaphase, and this particular information cannot be deduced from what they show, as there is no dynamic capture of a membrane scission event together with a chromatin marker that would indicate mitosis. The authors could, however, attempt to find such events in live movies, given the high incidence of MOs reported from their EM data.

      We thank the reviewer for the encouragement; it is highly appreciated.

      Weaknesses:

      In order to convincingly demonstrate the absence of any plasma membrane in the respective regions of the embryonic periphery or between cells of the embryo, the authors would have to show consecutive serial TEM sections in which MOs are detected across multiple z-planes, beyond the 3D reconstructions alone. Although the authors state in the Methods section that continuous ultrathin sections were cut for the metaphase sample (page 21, line 472), consecutive TEM sections are never shown. While we do see the 3D reconstructions, documentation of the underlying TEM data is missing. It would be necessary to show a membrane opening in consecutive z-sections. Alternatively, the authors could support their claims with volume imaging by focused ion beam scanning EM (FIB-SEM), in which cellular volumes can be sectioned at almost isotropic resolution.

      We thank the reviewer for raising these important technical concerns. We have taken this question seriously since first observing membrane discontinuities six years ago, and we have since conducted extensive controls to rule out fixation artifacts.

      First of all, in addition to membrane discontinuities, we would like to highlight that a large number of single plasma membranes separating adjacent cytoplasmic domains were detected by EM (Figures 1, 3, and 4). This observation is particularly significant because the invagination model cannot account for the formation of single plasma membrane barriers between adjacent cytoplasmic domains. Instead, independent extension of detached sister membranes offers a plausible explanation for the generation of cytoplasm-immersed membranes. Furthermore, the morphology and continuity of these single cytoplasm-immersed membrane structures are well preserved, indicating successful EM processing and arguing against potential issues such as inadequate fixation or other technical limitations.

      Second, we applied rigorous criteria for identifying membrane discontinuities:

      (1) To test whether the discontinuities were preparation-specific, we examined four independent sample batches and validated our findings using two EM techniques: transmission electron microscopy of high-pressure-frozen samples (HPF-TEM) and dual-beam scanning electron microscopy (SEM).

      (2) We analyzed embryos across multiple developmental stages. Membrane discontinuities were observed in both intrauterine and laid embryos at early stages. However, as embryos reached the comma stage, a period marked by the onset of elongation and reduced cell proliferation, the incidence of discontinuities dropped dramatically (0/13, 0/17, and 0/30 cells examined). This developmental specificity argues strongly against a general fixation artifact, which would be expected to occur randomly across stages. Additionally, because the eggshell is present throughout C. elegans embryogenesis, the dramatic reduction of membrane discontinuities at the comma stage argues against the possibility that the eggshell causes a fixation problem.

      (3) Each putative membrane discontinuity was identified only after examining the entire plasma membrane of the cell on a given section. Importantly, apart from the discontinuity, the remainder of the plasma membrane was intact. Moreover, in most cells only a single discontinuity was present per section, arguing against random, widespread membrane tearing during preparation. Because EM preparation yields serial sections, we verified nearly all membrane discontinuities by examining adjacent sections; again, a discontinuity was confirmed only after inspecting the entire plasma membrane on those neighboring sections as well. We will include this verification protocol in the revised Methods, and additional images of consecutive sections will be provided if needed.

      To further determine whether a dividing cell indeed contains one membrane rupture, we performed two serial reconstruction experiments using consecutive sections, as the reviewer suggested. First, we used HPF-TEM to analyze 105 consecutive sections of a metaphase cell, reconstructing the entire plasma membrane and chromosome configuration. We found that one membrane rupture largely encircled the chromosomal disc (Figure 2 and Video S1), spatially aligning with the future segregation zone. Second, we used AutoCUTS-SEM to collect approximately 600 sections covering ~95% of a telophase cell containing three nuclei sharing a common cytoplasm. This tri-nucleated cell was enclosed by three distinct plasma membranes, each harboring a single rupture site. These three ruptures converged to form a Y-shaped exposed cytoplasmic region spanning >351 sections (Figure 5). Collectively, these reconstructions demonstrate that, in three dimensions, each cell contains only one discontinuity, further supporting that the phenomenon is not due to random damage during sample preparation.

      (4) In addition to EM, we performed live imaging of plasma membrane dynamics. While live imaging provides important temporal context, we recognize its limitations in resolving membrane ultrastructure. The rapid kinetics of membrane extension (approximately 20–30 seconds for metaphase and less than 3 minutes for cytokinesis), combined with embryo motility, introduce spatiotemporal ambiguities. To capture dynamic membrane events, our live imaging using the GFP::PH membrane marker was performed at 4-second intervals, approaching the practical limit for single-section scanning of the embryo. Nevertheless, with single-plane live imaging, both putative membrane ruptures (Figure 6A) and free-ended sister membrane structures (Figures 6B and 6C) could be detected, providing additional evidence that membrane rupture and independent extension of detached sister membranes underlie cytokinesis in C. elegans embryos. Notably, 3D membrane dynamics analysis using light-sheet microscopy (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088) revealed membrane ruptures in dividing early C. elegans embryonic cells, including during telophase and metaphase. Therefore, live imaging further validates the membrane rupture phenomena in dividing embryonic cells in C. elegans.

      We are confident that the cumulative evidence presented here addresses the reviewer's concerns and demonstrates that the observed membrane discontinuities, as well as cytoplasm-immersed membranes, are not procedural artifacts but rather reflect a previously underappreciated aspect of plasma membrane dynamics during embryonic cell division.

      Another critical issue concerns the detection of membrane discontinuities in electron micrographs, which, in my opinion, is ambiguous. How do the authors reliably determine in their TEM images whether a plasma membrane is present or not? The absence, or weak appearance, of electron-dense staining at membranes, which seems to be their criterion for MOs, is also apparent at other, intracellular membranes, such as the NE or the ER (for example, see Figure 1C). Also, the plasma membrane itself appears unevenly stained in regions that the authors delineate as intact (for example, Figure 1C, 2B/1).

      We thank the reviewer for raising this important concern.

      First, our laboratory has extensive experience with electron microscopy across diverse biological systems, including neurons, muscle cells, and hypodermis in C. elegans, as well as tissues from Drosophila, mouse, bacteria, and cultured cells (Chen et al., 2013; Ding et al., 2018; Guan et al., 2022; Y. Li et al., 2018; Miao et al., 2024; Qin et al., 2014; Wang et al., 2026; J. Xu et al., 2022; M. Xu et al., 2021; L. Yang et al., 2020; X. Yang et al., 2019; Zhu et al., 2022). Importantly, we did not introduce any novel or unconventional steps in our EM preparation; all protocols were standard and well established. Thus, the observed membrane discontinuities are unlikely to result from technical inexperience or idiosyncratic methods.

      Second, because EM preparation yields serial sections, we verified nearly all membrane discontinuities by examining adjacent sections. Specifically, a membrane discontinuity was confirmed only after inspecting the entirety of the plasma membrane in neighboring sections. We will include this verification protocol in the revised Methods section, and additional images of consecutive sections can be provided if needed.

      Third, in addition to membrane discontinuities, a large number of single plasma membranes separating adjacent cytoplasmic domains were detected by EM (Figures 1, 3, and 4). This observation is particularly significant because the invagination model cannot account for the formation of single plasma membrane barriers between adjacent cytoplasmic domains. Instead, independent extension of detached sister membranes offers a plausible explanation for the generation of cytoplasm-immersed membranes. Furthermore, the morphology and continuity of these single cytoplasm-immersed membrane structures are well preserved, indicating successful EM processing and arguing against potential issues such as inadequate fixation or other technical limitations.

      EM-related publications by Jingjing Liang:

      Chen D, Jian Y, Liu X, Zhang Y, Liang J, Qi X, Du H, Zou W, Chen L, Chai Y, Ou G, Miao L, Wang Y, Yang C. 2013. Clathrin and AP2 Are Required for Phagocytic Receptor-Mediated Apoptotic Cell Clearance in Caenorhabditis elegans. PLoS Genetics 9:e1003517. DOI: https://doi.org/10.1371/journal.pgen.1003517

      Ding L, Yang X, Tian H, Liang J, Zhang F, Wang G, Wang Y, Ding M, Shui G, Huang X. 2018. Seipin regulates lipid homeostasis by ensuring calcium‐dependent mitochondrial metabolism. The EMBO Journal 37:e97572. DOI: https://doi.org/10.15252/embj.201797572

      Guan L, Yang Y, Liang J, Miao Y, Shang A, Wang B, Wang Y, Ding M. 2022. ERGIC2 and ERGIC3 regulate the ER‐to‐Golgi transport of gap junction proteins in metazoans. Traffic 23:140–157. DOI: https://doi.org/10.1111/tra.12830

      Li Y, Zhang Y, Gan Q, Xu M, Ding X, Tang G, Liang J, Liu K, Liu X, Wang X, Guo L, Gao Z, Hao X, Yang C. 2018. C. elegans-based screen identifies lysosome-damaging alkaloids that induce STAT3-dependent lysosomal cell death. Protein & Cell 9:1013–1026. DOI: https://doi.org/10.1007/s13238-018-0520-0

      Miao Y, Du Y, Wang B, Liang J, Liang Y, Dang S, Liu J, Li D, He K, Ding M. 2024. Spatiotemporal recruitment of the ubiquitin-specific protease USP8 directs endosome maturation. eLife 13:RP96353. DOI: https://doi.org/10.7554/eLife.96353

      Qin J, Liang J, Ding M. 2014. Perlecan Antagonizes Collagen IV and ADAMTS9/GON-1 in Restricting the Growth of Presynaptic Boutons. Journal of Neuroscience 34:10311–10324. DOI: https://doi.org/10.1523/JNEUROSCI.5128-13.2014

      Wang Z, Zhang L, Zhou B, Liang J, Tian Y, Jiang Z, Tao J, Yin C, Chen S, Zhang W, Zhang J, Wei W. 2026. A single MYB transcription factor GmMYB331 regulates seed oil accumulation and seed size/weight in soybean. Journal of Integrative Plant Biology 68:470–485. DOI: https://doi.org/10.1111/jipb.70101

      Xu J, Chen S, Wang W, Man Lam S, Xu Y, Zhang S, Pan H, Liang J, Huang Xiahe, Wang Yu, Li T, Jiang Y, Wang Yingchun, Ding M, Shui G, Yang H, Huang Xun. 2022. Hepatic CDP-diacylglycerol synthase 2 deficiency causes mitochondrial dysfunction and promotes rapid progression of NASH and fibrosis. Science Bulletin 67:299–314. DOI: https://doi.org/10.1016/j.scib.2021.10.014

      Xu M, Ding L, Liang J, Yang X, Liu Y, Wang Y, Ding M, Huang X. 2021. NAD kinase sustains lipogenesis and mitochondrial metabolism through fatty acid synthesis. Cell Reports 37:110157. DOI: https://doi.org/10.1016/j.celrep.2021.110157

      Yang L, Liang J, Lam SM, Yavuz A, Shui G, Ding M, Huang X. 2020. Neuronal lipolysis participates in PUFA‐mediated neural function and neurodegeneration. EMBO reports 21:e50214. DOI: https://doi.org/10.15252/embr.202050214

      Yang X, Liang J, Ding L, Li X, Lam S-M, Shui G, Ding M, Huang X. 2019. Phosphatidylserine synthase regulates cellular homeostasis through distinct metabolic mechanisms. PLOS Genetics 15:e1008548. DOI: https://doi.org/10.1371/journal.pgen.1008548

      Zhu J, Lam SM, Yang L, Liang J, Ding M, Shui G, Huang X. 2022. Reduced phosphatidylcholine synthesis suppresses the embryonic lethality of seipin deficiency. Life Metabolism 1:175–189. DOI: https://doi.org/10.1093/lifemeta/loac02

    1. eLife Assessment

      This important study explores how the phase of neural oscillations in the alpha band affects visual perception, indicating that perceptual performance varies due to changes in sensory precision rather than decision bias. The evidence is convincing in its experimental design and analytical approach. This work should interest cognitive neuroscientists who study perception and decision-making.

    2. Reviewer #1 (Public review):

      [Editors' note: This version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      In their paper entitled "Alpha-Band Phase Modulates Perceptual Sensitivity by Changing Internal Noise and Sensory Tuning," Pilipenko et al. investigate how pre-stimulus alpha phase influences near-threshold visual perception. The authors aim to clarify whether alpha phase primarily shifts the criterion, multiplicatively amplifies signals, or changes the effective variance and tuning of sensory evidence. Six observers completed many thousands of trials in a double-pass Gabor-in-noise detection task while an EEG was recorded. The authors combine signal detection theory, phase-resolved analyses, and reverse correlation to test mechanistic predictions. The experimental design and analysis pipeline provide a clear conceptual scaffold, with SDT-based schematic models that make the empirical results accessible even for readers who are not specialists in classification-image methods.

      Strengths:

      The study presents a coherent and well-executed investigation with several notable strengths. First, the main behavioral and EEG results in Figure 2 demonstrate robust pre-stimulus coupling between alpha phase and d′ across a substantial portion of the pre-stimulus interval, with little evidence that the criterion is modulated to a comparable extent. The inverse phasic relationship between hit and false-alarm rates maps clearly onto the variance-reduction account, and the response-consistency analysis offers an intuitive behavioral complement: when two identical stimuli are both presented at the participant's optimal phase, responses are more consistent than when one or both occur at suboptimal phases. The frontal-occipital phase-difference result suggests a coordinated rather than purely local phase mechanism, supporting the central claim that alpha phase is linked to changes in sensitivity that behave like changes in internal variability rather than simple gain or criterion shifts. Supplementary analyses showing that alpha power has only a limited relationship with d′ and confidence reassure readers that the main effects are genuinely phase-linked rather than a recasting of amplitude differences.
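      The signal-detection logic summarized above can be made concrete with a short sketch. It assumes the textbook equal-variance Gaussian model (d′ = z(H) − z(FA), c = −[z(H) + z(FA)]/2) with the criterion fixed midway between the noise and signal means, and, for the double-pass part, shared external noise plus independent internal noise on each pass, with the criterion at the mean of the decision variable. The function names and parameter values are hypothetical illustrations, not the authors' analysis code. Under these assumptions, shrinking the internal-noise SD raises the hit rate and lowers the false-alarm rate together, so d′ increases while c stays at zero, and agreement between two passes of an identical stimulus rises.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def sdt_indices(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Equal-variance Gaussian SDT: return (d_prime, criterion_c)."""
    z_hit = STD_NORMAL.inv_cdf(hit_rate)
    z_fa = STD_NORMAL.inv_cdf(fa_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


def rates_from_noise(signal_mean: float, noise_sd: float, criterion: float) -> tuple[float, float]:
    """Hit and false-alarm rates when noise ~ N(0, noise_sd) and
    signal + noise ~ N(signal_mean, noise_sd), with a fixed criterion."""
    hit = 1.0 - NormalDist(signal_mean, noise_sd).cdf(criterion)
    fa = 1.0 - NormalDist(0.0, noise_sd).cdf(criterion)
    return hit, fa


def double_pass_agreement(external_sd: float, internal_sd: float) -> float:
    """Probability of the same response on two passes of an identical stimulus.

    Assumes a zero-mean decision variable with the criterion at 0: external
    (stimulus) noise is shared across passes, internal noise is drawn
    independently on each pass. The two decision variables are then bivariate
    normal with correlation rho, and P(same sign) = 1/2 + asin(rho) / pi.
    """
    rho = external_sd**2 / (external_sd**2 + internal_sd**2)
    return 0.5 + math.asin(rho) / math.pi


if __name__ == "__main__":
    # Hypothetical "suboptimal" (SD 1.0) vs "optimal" (SD 0.8) internal noise,
    # with the criterion placed midway between the noise and signal means.
    for sd in (1.0, 0.8):
        hit, fa = rates_from_noise(signal_mean=1.0, noise_sd=sd, criterion=0.5)
        d_prime, c = sdt_indices(hit, fa)
        agree = double_pass_agreement(external_sd=1.0, internal_sd=sd)
        print(f"SD={sd}: H={hit:.3f}, FA={fa:.3f}, d'={d_prime:.2f}, "
              f"c={c:.2f}, double-pass agreement={agree:.3f}")
```

If the criterion were not at the midpoint, c would also shift as the noise SD changed, so the unchanged criterion here is a consequence of that assumption rather than a general property of variance reduction.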

      Second, the reverse-correlation results in Figure 3 extend this story in a satisfying way. The classification images and their Gaussian fits show that at the optimal phase, the weighting of stimulus energy is more sharply concentrated around target-relevant spatial frequencies and orientations, and the bootstrapped parameter distributions indicate that the suboptimal phase is best described by broader tuning and a modest change in gain rather than a pure criterion account. The authors' interpretation that optimal-phase perception reflects both reduced effective internal noise and sharpened sensory tuning is reasonable and well-supported. Overall, the data and figures largely achieve the stated aims, and the work is likely to have an impact both by clarifying the interpretation of alpha-phase effects and by illustrating a useful analytic framework that other groups can adopt.
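      The classification-image idea behind the reverse-correlation analysis can be sketched for a hypothetical linear observer. This is not the authors' pipeline: the observer model, the Gaussian template over 15 feature bins (standing in for spatial-frequency or orientation channels), the noise-only trials, and all parameter values are assumptions chosen for illustration. The classification image is the mean noise on "yes" trials minus the mean noise on "no" trials; for a linear observer with Gaussian noise, its expected value is proportional to the observer's template, with an amplitude that shrinks as internal noise grows.

```python
import math
import random


def classification_image(template, internal_sd, n_trials, seed=0):
    """Simulate noise-only yes/no trials for a hypothetical linear observer.

    Each trial draws N(0, 1) external noise in every feature bin. The observer
    answers "yes" when the template-weighted sum of that noise plus
    N(0, internal_sd) internal noise exceeds 0. Returns the classification
    image: mean noise on "yes" trials minus mean noise on "no" trials.
    """
    rng = random.Random(seed)
    n_bins = len(template)
    sum_yes, sum_no = [0.0] * n_bins, [0.0] * n_bins
    n_yes = n_no = 0
    for _ in range(n_trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_bins)]
        decision = sum(w * x for w, x in zip(template, noise))
        decision += rng.gauss(0.0, internal_sd)
        if decision > 0.0:
            n_yes += 1
            sum_yes = [s + x for s, x in zip(sum_yes, noise)]
        else:
            n_no += 1
            sum_no = [s + x for s, x in zip(sum_no, noise)]
    return [sy / n_yes - sn / n_no for sy, sn in zip(sum_yes, sum_no)]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical Gaussian tuning template: 15 bins, peak at bin 7, SD of 2 bins.
TEMPLATE = [math.exp(-((i - 7) ** 2) / (2 * 2.0**2)) for i in range(15)]

if __name__ == "__main__":
    for sd in (1.0, 4.0):
        ci = classification_image(TEMPLATE, internal_sd=sd, n_trials=40_000)
        peak = max(range(len(ci)), key=ci.__getitem__)
        print(f"internal SD={sd}: peak bin={peak}, "
              f"peak amplitude={ci[peak]:.3f}, r(template)={pearson(ci, TEMPLATE):.3f}")
```

Fitting a Gaussian to such an image, as described in the review, would then summarize tuning width and gain; that fitting step is omitted here.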

    3. Reviewer #2 (Public review):

      Summary:

      The study by Pilipenko et al. evaluated the role of alpha phase in a visual perception paradigm using the framework of signal detection theory and reverse correlation. Their findings suggest that phase-related modulations in perception are mediated by a reduction in internal noise and a moderate increase in tuning to relevant features of the stimuli in specific phases of the alpha cycle. Interestingly, the alpha phase did not affect the criterion. Criterion was related to modulations in alpha power, in agreement with previous research.

      Strengths:

      The experiment was carefully designed, and the analytical pipeline is original and suited to answer the research question. The authors frame the research question very well and propose several models that account for the possible mechanisms by which the alpha phase can modulate perception. This study can be very valuable for the ongoing discussion about the role of alpha activity in perception.

      Conclusion:

      This study addresses an important and timely question and proposes an original and well-thought-out analytical framework to investigate the role of alpha phase in visual perception. While the experimental design and theoretical motivation are strong, the very limited sample size substantially constrains the strength of the conclusions that can be drawn at the group level.

      Bibliography:

      Button, K., Ioannidis, J., Mokrysz, C. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14, 365–376 (2013). https://doi.org/10.1038/nrn3475

      Makin, T. R., Orban de Xivry, J.-J. Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 8, e48175 (2019). https://doi.org/10.7554/eLife.48175

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their paper entitled "Alpha-Band Phase Modulates Perceptual Sensitivity by Changing Internal Noise and Sensory Tuning," Pilipenko et al. investigate how pre-stimulus alpha phase influences near-threshold visual perception. The authors aim to clarify whether alpha phase primarily shifts the criterion, multiplicatively amplifies signals, or changes the effective variance and tuning of sensory evidence. Six observers completed many thousands of trials in a double-pass Gabor-in-noise detection task while an EEG was recorded. The authors combine signal detection theory, phase-resolved analyses, and reverse correlation to test mechanistic predictions. The experimental design and analysis pipeline provide a clear conceptual scaffold, with SDT-based schematic models that make the empirical results accessible even for readers who are not specialists in classification-image methods.

      Strengths:

      The study presents a coherent and well-executed investigation with several notable strengths. First, the main behavioral and EEG results in Figure 2 demonstrate robust pre-stimulus coupling between alpha phase and d′ across a substantial portion of the pre-stimulus interval, with little evidence that the criterion is modulated to a comparable extent. The inverse phasic relationship between hit and false-alarm rates maps clearly onto the variance-reduction account, and the response-consistency analysis offers an intuitive behavioral complement: when two identical stimuli are both presented at the participant's optimal phase, responses are more consistent than when one or both occur at suboptimal phases. The frontal-occipital phase-difference result suggests a coordinated rather than purely local phase mechanism, supporting the central claim that alpha phase is linked to changes in sensitivity that behave like changes in internal variability rather than simple gain or criterion shifts. Supplementary analyses showing that alpha power has only a limited relationship with d′ and confidence reassure readers that the main effects are genuinely phase-linked rather than a recasting of amplitude differences.
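      To make the link between counterphased hit and false-alarm rates and a stable criterion concrete, the following sketch computes d′ and c under the standard equal-variance Gaussian SDT model (d′ = z(HR) - z(FAR), c = -[z(HR) + z(FAR)]/2). The rates are hypothetical illustrations, not values from the study, and the manuscript's exact estimation procedure (for example, any correction for rates of 0 or 1) is not specified here.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def sdt_metrics(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Return (d', c) under the equal-variance Gaussian SDT model."""
    zh, zf = _z(hit_rate), _z(fa_rate)
    d_prime = zh - zf           # separation of signal and noise distributions
    criterion = -(zh + zf) / 2  # response bias relative to the unbiased point
    return d_prime, criterion


# Hypothetical rates for two phase bins (not study data): hits rise while
# false alarms fall, i.e. the two rates move in counterphase.
optimal = sdt_metrics(0.70, 0.20)
suboptimal = sdt_metrics(0.65, 0.25)
print(f"optimal:    d'={optimal[0]:.2f}, c={optimal[1]:.2f}")     # d'=1.37, c=0.16
print(f"suboptimal: d'={suboptimal[0]:.2f}, c={suboptimal[1]:.2f}")  # d'=1.06, c=0.14
```

      When hits rise and false alarms fall together, d′ changes while c barely moves, which is the pattern described above. Rates of exactly 0 or 1 would need a correction before applying the inverse normal.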

      Second, the reverse-correlation results in Figure 3 extend this story in a satisfying way. The classification images and their Gaussian fits show that at the optimal phase, the weighting of stimulus energy is more sharply concentrated around target-relevant spatial frequencies and orientations, and the bootstrapped parameter distributions indicate that the suboptimal phase is best described by broader tuning and a modest change in gain rather than a pure criterion account. The authors' interpretation that optimal-phase perception reflects both reduced effective internal noise and sharpened sensory tuning is reasonable and well-supported. Overall, the data and figures largely achieve the stated aims, and the work is likely to have an impact both by clarifying the interpretation of alpha-phase effects and by illustrating a useful analytic framework that other groups can adopt.

      Weaknesses:

      The weaknesses are limited and relate primarily to framing and presentation rather than to the substance of the work. First, because contrast was titrated to maintain moderate performance (d′ between 1.2 and 1.8), the phase-linked changes in sensitivity appear modest in absolute terms, which could benefit from explicit contextualization. Second, a coding error resulted in unequal numbers of double-pass stimulus pairs across participants, which affects the interpretability of the response-consistency results. Third, several methodological details could be stated more explicitly to enhance transparency, including stimulus timing specifications, electrode selection criteria, and the purpose of phase alignment in group averaging. Finally, some mechanistic interpretations in the Discussion could be phrased more conservatively to clearly distinguish between measurement and inference, particularly regarding the relationship between reduced internal noise and sharpened tuning, and the physiological implementation of the frontal-occipital phase relationship.

      We appreciate the reviewer’s thoughtful and constructive feedback, particularly regarding clarity and framing. In response, we have made several revisions to improve transparency and contextualization throughout the manuscript.

      First, we now explicitly contextualize the relatively modest change in sensitivity by adding discussion of the contrast-titration procedure and its implications for effect size interpretation. Second, we address the coding error that led to unequal numbers of double-pass stimulus pairs across participants earlier in the manuscript by reporting the average number of pairs per participant in the Results (as well as the Methods), allowing readers to interpret the results more appropriately. Third, we have provided additional detail, including precise stimulus timing parameters, electrode selection criteria, and a clearer explanation of the rationale for phase alignment in the Results (in addition to the Methods) section. Finally, we have revised portions of the Discussion to adopt more conservative language when interpreting our results, which more clearly distinguishes between empirical observations and mechanistic inferences, and we now offer additional interpretations for the frontal-occipital phase relationship.

      We believe these revisions substantially improve the clarity, transparency, and interpretability of the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study of Pilipenko et al. evaluated the role of alpha phase in a visual perception paradigm using the framework of signal detection theory and reverse correlation. Their findings suggest that phase-related modulations in perception are mediated by a reduction in internal noise and a moderate increase in tuning to relevant features of the stimuli in specific phases of the alpha cycle. Interestingly, the alpha phase did not affect the criterion. Criterion was related to modulations in alpha power, in agreement with previous research.

      Strengths:

      The experiment was carefully designed, and the analytical pipeline is original and suited to answer the research question. The authors frame the research question very well and propose several models that account for the possible mechanisms by which the alpha phase can modulate perception. This study can be very valuable for the ongoing discussion about the role of alpha activity in perception.

      Weaknesses:

      The sample size collected (N = 6) is, in my opinion, too small for the statistical approach adopted (group level). It is well known that small sample sizes result in an increased likelihood of false positives; even in the case of true positives, effect sizes are inflated (Button et al., 2013; Makin and Orban de Xivry, 2019), negatively affecting the replicability of the effect.

      Although the experimental design allows for an accurate characterization of the effects at the single-subject level, conclusions are drawn from group-level aggregated measures. With only six subjects, the estimation of between-subject variability is not reliable. The authors need to acknowledge that the sample size is too small; therefore, results should be interpreted with caution.

      Conclusion:

      This study addresses an important and timely question and proposes an original and well-thought-out analytical framework to investigate the role of alpha phase in visual perception. While the experimental design and theoretical motivation are strong, the very limited sample size substantially constrains the strength of the conclusions that can be drawn at the group level.

      Bibliography:

      Button, K., Ioannidis, J., Mokrysz, C. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14, 365-376 (2013). https://doi.org/10.1038/nrn3475

      Makin, T. R., Orban de Xivry, J.-J. Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 8, e48175 (2019). https://doi.org/10.7554/eLife.48175

      We thank the reviewer for their supportive remarks on our design and analysis, and for raising this important statistical concern about our sample size (n=6). Our choice of a small sample size was driven by methodological considerations. Specifically, our reverse correlation analysis requires a large number of trials per participant, as it estimates perceptual tuning by regressing behavioral responses against fluctuations in the energy of stimulus features (orientation and spatial frequency). This approach, as well as the computation of signal detection theory (SDT) metrics such as d′ and criterion, depends on high trial counts to obtain reliable estimates, particularly given that our analysis further subdivides trials across eight phase bins. For this reason, we prioritized collecting a large number of trials per participant (∼5,000), which is consistent with established practices in psychophysical research.
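      As a rough illustration of estimating perceptual weights by regressing responses on stimulus-feature energy, the following sketch simulates a hypothetical observer with a Gaussian template over 11 feature bins and recovers that template with an ordinary least-squares (linear-probability) regression. This is a simplified stand-in, not the authors' pipeline: separate orientation and spatial-frequency dimensions, phase-bin splits, Gaussian fits, and bootstrapping are omitted, and all parameters are assumptions.

```python
import numpy as np


def classification_image(energy: np.ndarray, responses: np.ndarray) -> np.ndarray:
    """Estimate perceptual weights by regressing yes/no responses on feature energy.

    energy:    (n_trials, n_features) stimulus energy per feature bin
    responses: (n_trials,) 1 for "yes", 0 for "no"
    Returns one weight per feature bin (linear-probability approximation).
    """
    # Standardize each feature so weights are comparable across bins.
    z = (energy - energy.mean(axis=0)) / energy.std(axis=0)
    design = np.column_stack([np.ones(len(z)), z])  # intercept + features
    coef, *_ = np.linalg.lstsq(design, responses.astype(float), rcond=None)
    return coef[1:]  # drop the intercept


# Simulated observer (hypothetical): Gaussian template centred on bin 5,
# random feature energy on every trial, plus Gaussian internal noise.
rng = np.random.default_rng(0)
n_trials, n_bins = 5000, 11
template = np.exp(-0.5 * ((np.arange(n_bins) - 5) / 1.5) ** 2)
energy = rng.normal(size=(n_trials, n_bins))
decision_variable = energy @ template + rng.normal(scale=1.0, size=n_trials)
responses = (decision_variable > 0).astype(int)

weights = classification_image(energy, responses)
print("peak bin of recovered weights:", int(np.argmax(weights)))
```

      With thousands of trials the recovered weights track the template closely; with far fewer trials per participant the per-bin estimates become noisy, which is the trade-off the authors cite for collecting many trials from fewer observers.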

      Importantly, our design yields reliable estimates at the individual level, which motivated us to include a new binomial probability test in our revised paper. This binomial test helps address concerns about the generalizability of our results. Binomial testing considers each participant as an independent replication of the effect and then computes the p-value associated with the probability of having observed the given number of statistically significant participants by chance, with a false positive rate of 0.05. In our data, 3 out of 6 participants showed significant effects, which corresponds to a probability of 0.002 of having observed these effects by chance alone. We believe this converging evidence supports the replicability and generalizability of our results. To improve the transparency of the single-subject data, we have included single-participant results in the Supplemental Materials to allow readers to directly assess the consistency of effects across individuals and to better contextualize between-subject variability.

      Thank you again for your suggestions; we believe that these additions have greatly improved our manuscript by demonstrating the robustness of our findings and increasing the transparency of our single-subject results.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The issue of generalizability arose during the review process, as your results are based on a small sample of participants who undertook a very large number of trials. In the revised version, it would be useful to discuss why this approach is valid, especially in the context of linking EEG with modeling (i.e., why it is more powerful than having many participants with fewer trials), and the extent to which your results can generalize to the population.

      We sincerely appreciate all of the helpful comments provided by the reviewers and hope that we have addressed the concerns about our experimental approach. In the Introduction, we have emphasized the rationale for our current design (a small sample with many trials per participant), which allows us to reliably compute our signal detection theory metrics across 8 phase bins in addition to performing the reverse correlation analysis. In the Methods section, we have added a description of the binomial probability statistical framework, which addresses the generalizability of our results. In this framework, each participant is viewed as an independent replication and the p-value reflects the probability of having observed the number of individually significant subjects from the total sample size by chance. In this regard, the probability of observing a significant effect in 3 out of 6 participants (as in our study) by chance alone is 0.002, which we believe makes a chance outcome unlikely and instead reflects a true effect present in the general population.

      Below we have copied our changes in the Introduction and Methods sections.

      “... in a large number of trials (6,020 per observer, n = 6) across multiple EEG sessions. This approach ensures a sufficient number of trials in order to reliably compute signal detection theory (SDT) metrics across multiple alpha phase bins while also affording enough statistical power for reverse correlation analysis (Xue et al., 2024), making it preferred over having a larger sample size with fewer trials.”

      “Additionally, we used a binomial probability testing framework that is designed for small sample sizes and treats each participant as an independent replication. As such, it computes the probability of having observed the number of statistically significant outcomes by chance given our sample size (Schwarzkopf & Huang, 2024).”

      Reviewer #1 (Recommendations for the authors):

      My suggestions are intended to be light-touch and focused on strengthening the clarity and durability of the Reviewed Preprint rather than on additional experimentation or major new analyses.

      (1) Limitation statement for the double-pass coding error:

      Add a short statement in the Methods or Results acknowledging that the coding error led to markedly fewer repeated stimulus pairs for the first three participants than for the last three. For the response-consistency result in Figure 2E, a simple acknowledgement that the available evidence is stronger for some participants than others will help readers calibrate their confidence without detracting from the main story.

      Thank you for this suggestion; we have now added a statement to this effect in the Results section, in addition to the description already given in the Methods section.

      “To examine this, we implemented a double-pass stimulus presentation (~600 stimulus pairs for participants 1-3 and ~2,500 pairs for participants 4-6) and analyzed participants’ response consistency (Xue et al., 2024) to two identical stimuli.”

      (2) Contextualizing the titrated performance level:

      In the Discussion, explicitly note that contrast was titrated to keep d′ between approximately 1.2 and 1.8, which intentionally maintains moderate performance. This contextualization will help readers understand that while the phase-linked changes appear modest in absolute terms, they are mechanistically informative within this design.

      Thank you; we have added a sentence to the Discussion speaking to this point.

      “We also note that the observed modulation of d’ between optimal and suboptimal phases was relatively modest in absolute terms (0.21) in our study and could therefore require many trials per subject to detect. Two reasons for this modest effect size could be related to specific features of our task design. First, we titrated stimulus contrast to maintain consistent task performance. This titration could have reduced the magnitude of the phase effect on d’ that would otherwise be apparent if the stimulus intensity were kept constant. Additionally, the use of (relatively) high-contrast random noise likely means that trial-to-trial variability in perception is largely driven by random fluctuations in the noise properties and, to a lesser extent, internal brain state. Although both of these choices were necessary to perform SDT and reverse correlation analysis, they differ from many previous studies investigating alpha phase using only near-threshold detection in the absence of external noise and may contribute to an underestimation of the true effect size.”

      (3) Methods clarifications:

      (a) Replace placeholder text such as "{plus minus}" and "{degree sign}" with the appropriate symbols, and ensure that any equations implied in the reverse-correlation section are fully present.

      Thank you for bringing this to our attention; these placeholder texts are an artifact of the conversion process, and we will correct them.

      (b) State explicitly that the 8 ms stimulus duration corresponds to a single frame on your 120 Hz display, which will clarify the timing in Figure 1A and the pre-stimulus windows in the phase analyses.

      Thank you; we have added language to both the Methods and Results sections explicitly indicating that the 8 ms stimulus duration corresponds to a single screen refresh. Additionally, we changed the text in Figure 1A to include inter-trial interval timing (as opposed to merely saying “Start Trial”):

      “(A) Task design. Each trial contained a brief, filtered-noise stimulus (8 ms; one screen refresh) presented to the right or left of fixation with equal probability.”

      “Each participant (n = 6) completed 5-6 EEG sessions of a Yes/No detection paradigm whereby participants reported the presence or absence of a brief (8 ms; one screen refresh) vertical Gabor target (2 cycles per degree) with concurrent confidence judgments (see Figure 1A), along with an additional imagination judgement (reported in the supplemental materials).”

      (c) In the description of the post-stimulus taper, consider phrasing the rationale in terms of minimizing contamination from evoked responses rather than asserting that the taper ends before the earliest evoked response, which keeps the argument correct without committing to a precise latency boundary.

      Thank you for this suggestion. We have changed our rationale for the taper to “minimizing”, rather than avoiding, the evoked response.

      “This resulted in the post-stimulus data being flat after 70 ms, which is intended to minimize the evoked response in our data.”

      (4) Analysis transparency:

      (a) In the description of posterior electrode selection, explicitly note that channels were chosen solely on the basis of alpha power, independent of behavioural performance, and that the same electrodes were used for each participant across sessions.

      We have gladly made this clarification to the methods.

      “This was individually determined by rank-ordering 17 of the posterior channels (Pz, P3, P7, O1, Oz, O2, P4, P8, P1, P5, PO7, PO3, POz, PO4, PO8, P6, and P2) and algorithmically choosing the three with the highest power. This ensured that electrode selection was made independent of performance and instead was based upon maximizing alpha signal strength.”
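      A minimal sketch of the channel-selection rule quoted above, assuming alpha power has already been computed per channel for a participant (how power was pooled across sessions is an assumption here). The function name and the power values in the example are hypothetical; the point is that only alpha power enters the ranking.

```python
# The 17 candidate posterior channels listed in the quoted Methods text.
POSTERIOR_CHANNELS = ("Pz", "P3", "P7", "O1", "Oz", "O2", "P4", "P8", "P1",
                      "P5", "PO7", "PO3", "POz", "PO4", "PO8", "P6", "P2")


def select_alpha_channels(alpha_power: dict[str, float], k: int = 3) -> list[str]:
    """Return the k candidate channels with the highest alpha power.

    alpha_power maps channel label -> alpha-band power for one participant.
    Behavioural data never enter the ranking, so the choice cannot be tuned
    to the d'-phase effect; the same k channels are then reused in every session.
    """
    candidates = [ch for ch in POSTERIOR_CHANNELS if ch in alpha_power]
    return sorted(candidates, key=alpha_power.__getitem__, reverse=True)[:k]


# Hypothetical power values (arbitrary units); Fz is ignored because it is
# not one of the posterior candidates.
example = {ch: 1.0 for ch in POSTERIOR_CHANNELS}
example.update({"O1": 4.2, "Oz": 3.9, "O2": 3.1, "Fz": 9.9})
print(select_alpha_channels(example))  # ['O1', 'Oz', 'O2']
```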

      (b) Describe the phase-alignment step used to center each participant's optimal bin before group averaging as a device for visualization and summary, and clarify that inferential statistics are based on the underlying, non-aligned data as appropriate. This will reassure readers who are cautious about circularity.

      We agree that this should be made more explicit throughout the manuscript and have added statements clarifying this aspect in the Figure 2B caption, the Results, and Method sections.

      “The data have been aligned across participants so that each individual's highest d’ was assigned to bin 8 (omitted from the plot), with the remaining data circularly shifted, and is averaged across -450 ms to stimulus onset. This graph is for visualization purposes only. Error bars represent ± 1 SEM. The pattern shows a clear phasic modulation of d’ across bins.”

      “... requiring us to phase-align the performance data across participants in order to visualize the underlying phasic effects. To this end, we aligned all metrics (d’, c, HR, and FAR) by circularly shifting the data so that the bin with the highest d’ was assigned to bin 8, which was then omitted from further visualizations.”

      “Bin 8 was then omitted from further visualizations. The shifted data were then averaged across all time points from -450 ms to 0 ms, based on significant effects at the group level, and averaged across participants. No statistical tests were conducted on these shifted variables; they are for visualization purposes only.”
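      The alignment step described in these excerpts can be sketched as a circular shift of per-bin metrics. This assumes 8 bins indexed 0 to 7 (so the manuscript's bin 8 is index 7) and per-bin values already averaged over -450 to 0 ms; the function name and input values are hypothetical. The output is for plotting only, because the alignment bin is chosen from the same data.

```python
import numpy as np


def align_to_best_bin(d_prime, *others):
    """Circularly shift per-bin metrics so the highest-d' bin becomes the last bin.

    d_prime: (n_bins,) d' per phase bin for one participant (time-averaged).
    others:  further per-bin metrics (e.g. c, HR, FAR) shifted by the same amount.
    Returns the shifted arrays with the alignment bin dropped (visualization only).
    """
    n_bins = d_prime.size
    shift = (n_bins - 1) - int(np.argmax(d_prime))  # argmax -> last index ("bin 8")
    return [np.roll(m, shift)[:-1] for m in (d_prime, *others)]


# Hypothetical per-bin values for two participants with different optimal bins.
d1, c1 = np.array([1.0, 1.1, 1.5, 1.3, 1.2, 1.0, 0.9, 0.95]), np.full(8, 0.1)
d2, c2 = np.array([1.4, 1.2, 1.1, 1.0, 0.9, 1.0, 1.3, 1.6]), np.full(8, 0.2)
d1_aligned, c1_aligned = align_to_best_bin(d1, c1)
d2_aligned, c2_aligned = align_to_best_bin(d2, c2)
group_d = np.mean([d1_aligned, d2_aligned], axis=0)  # group average for the plot
print(np.round(group_d, 3))
```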

      (c) Add a short note on the number of permutations and the cluster-forming threshold in the phase-coupling analyses, if not already stated in the Results or captions, to complete the description of your non-parametric testing procedure.

      Thank you, we agree that reiterating this information in the Results section is helpful for the reader to clarify the analysis procedure.

      “After smoothing the resultant vector length over time with a 50 ms moving average, we compared the observed vector lengths to a permuted threshold (95th percentile of 1,000 permutations) at each time point from –700 to 0 ms and performed cluster correction (95th percentile of the permuted cluster size) to account for multiple comparisons.”
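      The smoothing, point-wise permutation threshold, and cluster correction in this excerpt can be sketched as follows. Several details are assumptions: a 1 kHz sampling rate (so 50 samples = 50 ms), cluster size measured as the number of contiguous supra-threshold time points, and a null formed from the largest cluster in each permutation. Computing the resultant vector lengths is not shown; in the demonstration the permuted statistics are synthetic smoothed noise, not permuted EEG data.

```python
import numpy as np


def moving_average(x, width):
    """Boxcar smoothing along the last axis; `width` is in samples."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(np.convolve, -1, x, kernel, mode="same")


def runs(mask):
    """(start, stop) index pairs of contiguous True runs in a 1-D boolean mask."""
    padded = np.concatenate(([0], mask.astype(int), [0]))
    edges = np.flatnonzero(np.diff(padded))
    return list(zip(edges[::2], edges[1::2]))


def cluster_corrected_mask(observed, null, alpha=0.05):
    """Time points in supra-threshold clusters that survive cluster-size correction.

    observed: (n_times,) smoothed resultant vector lengths for the real data
    null:     (n_perm, n_times) the same statistic for each permutation
    """
    # Point-wise threshold: 95th percentile of the permutations at each time point.
    threshold = np.percentile(null, 100 * (1 - alpha), axis=0)
    # Null distribution of the largest cluster (extent in samples) per permutation.
    max_null = [max((b - a for a, b in runs(p > threshold)), default=0) for p in null]
    size_threshold = np.percentile(max_null, 100 * (1 - alpha))
    significant = np.zeros(observed.shape, dtype=bool)
    for a, b in runs(observed > threshold):
        if b - a > size_threshold:
            significant[a:b] = True
    return significant


# Synthetic demonstration (not study data): -700..0 ms at an assumed 1 kHz.
rng = np.random.default_rng(0)
n_perm, n_times, width = 1000, 701, 50
null = moving_average(rng.random((n_perm, n_times)), width)
observed = moving_average(np.full(n_times, 0.5), width)  # matches the null mean
observed[-300:] += 0.5                                    # sustained effect, last 300 ms
sig = cluster_corrected_mask(observed, null)
print(np.flatnonzero(sig).min(), np.flatnonzero(sig).max())
```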

      (5) Discussion framing:

      Make one or two small adjustments to your mechanistic phrasing so that the distinctions between measurement and interpretation are fully explicit:

      (a) State that the combination of phase-d′ coupling, counterphased hit and false-alarm rates, response consistency, and phase-dependent classification images is "consistent with" a reduction in effective internal noise and sharper estimated tuning at optimal alpha phase, within the assumptions of your SDT and reverse-correlation framework.

      Thank you for this suggestion. We have changed the language in the Discussion to reflect this framing and interpretation of the results.

      “Moreover, our data are consistent with a model in which the variability of internal responses changes systematically across the alpha cycle, as reflected in the inverse relationship between hit rate and false alarm rate.”

      (b) Emphasize that reduced effective internal noise and sharpened sensory tuning are two complementary descriptions of a better match between sensory evidence and decision template rather than fully separable mechanisms.

      Thank you; we have added this language to clarify our interpretation.

      “Together with decreases in the variance of sensory tuning during the optimal phase, our results suggest that alpha phase impacts sensitivity by shaping trial-to-trial variation in internal noise during perceptual decision making, leading to better matches between sensory evidence and decision templates as opposed to a change in the gain of internal sensory responses.”

      (c) Note that the frontal-occipital phase relationship is consistent with a coordinated, possibly top-down component to the alpha-phase effect, while remaining agnostic about the precise physiological implementation.

      Thank you for raising this additional interpretation. We have added this as a plausible alternative to the single-source account in the Discussion section.

      “Moreover, our results suggest that prior literature reporting phasic effects in the alpha-band range from both frontal and occipital regions may plausibly be reporting the same effect from a single projected dipole source; however, these results are also consistent with two synchronized alpha sources which are anti-phase.”

      Reviewer #2 (Recommendations for the authors):

      Major issues:

      Given that collecting more data may not be feasible, the authors should take some actions to test the reliability of their results. For instance, simulations could be run to test the robustness of the results with such a small sample size (Zoefel et al., 2019). It would also be of interest to include in the report statistics and plots at the individual level, not only the aggregates. It is also important to report which electrodes were used in the analysis for each subject, as the Methods section clearly states that these electrodes differed between subjects.

      Thank you for these suggestions. To assess the reliability of our results at the single-subject level, we have included a new binomial probability test, a framework suitable for small-sample experiments with large trial numbers (Schwarzkopf & Huang, 2024). Binomial testing views each individual as an independent replication, considers the number of significant participants given the total number of tested participants, and outputs the probability of having observed the results by chance. We believe this framework adequately addresses the reviewer’s concern about generalizability in addition to being well-suited to the design of our study.

      To assess individual significance, we averaged the resultant vector length and permutations over the analysis window from -450 to 0 ms. If the time-averaged resultant vector length exceeded the 95th percentile of that participant’s time-averaged permutations, the participant was considered significant. In total, 3 out of 6 participants (participants 1, 4, and 5) showed significant d’ coupling. The binomial probability (equivalent to a p-value) of having observed this outcome as a result of three false positives at the individual-subject level is very small (p = 0.002), well below the conventional 0.05 threshold.

      Below is the text which we have added to the Results and Methods sections.

      “To interrogate the robustness of our findings at the single-subject level, we adopted a test of binomial probability, which is a statistical framework that treats each individual as an independent replication and is ideal for small sample size studies that utilize a large number of trials per observer (Schwarzkopf & Huang, 2024). For our data, we assessed individual significance by averaging the actual and permuted resultant vector lengths across time (-450 to 0 ms) and comparing the real vector length to the 95th percentile of the permuted datasets. With this approach, 3 out of 6 participants showed significant d’-phase coupling, which corresponds to a binomial probability of p = 0.002, indicating a very low probability that we observed these results by chance alone.”

      “Additionally, we used a binomial probability testing framework that is designed for small sample sizes and treats each participant as an independent replication. As such, it computes the probability of having observed the number of statistically significant outcomes by chance given our sample size (Schwarzkopf & Huang, 2024). To assess significance at the participant level, we averaged the participant’s resultant vector length and permutations from -450 to 0 ms and obtained the 95th percentile of the time-averaged permutations. We then compared the averaged resultant vector lengths to the permutation thresholds for each subject, which revealed 3 out of 6 significant subjects. We then used the MATLAB function myBinomTest.m (Nelson, 2026) to compute the p-value associated with the probability of having observed 3 out of 6 significant subjects by chance (with a false-positive rate of 0.05).”
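      The binomial probability described here can be reproduced in a few lines. This sketch assumes a one-sided test of observing at least k significant participants out of n when each participant is a false positive with probability 0.05; it is an independent Python computation, not the myBinomTest.m code itself, and the function name is hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.05) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Probability of k or more individually significant participants out of n
    if every participant's result were a false positive occurring at rate p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(f"{binomial_tail(3, 6):.4f}")  # 0.0022 (3 of 6 participants significant)
```

      Under these assumptions the result, about 0.0022, matches the reported p = 0.002.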

      To address the reviewer's second request, we now include a supplemental figure showing each individual’s results for the main analysis (see Supplementary figure 3). These graphs, together with the Methods, now provide the reader with each participant’s set of analysis electrodes.

      “Each participant had a different combination of electrodes which were used in the analyses; however, the same three channels were used across sessions within a participant (participant 1: POz, PO3, O1; participant 2: P7, PO7, PO4; participant 3: P2, P1, Pz; participant 4: O1, Oz, O2; participant 5: O2, PO8, PO4; participant 6: Oz, O2, O1).”

      As an alternative approach, linear mixed models (LMMs) could be used for statistics, as they are more suitable for small sample sizes (Wiley & Rapp, 2019). LMMs improve generalization by modelling subject-specific random effects. Although raw circular data are not suitable for LMMs, the sine and cosine of the phases could have been used as predictors, for instance. Given that data were collected over 6 different sessions, session could be included as a factor in the model to improve statistical power.
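      The sine/cosine encoding mentioned here turns a circular predictor into two linear ones: a phase effect of the form A·cos(φ - θ) equals A·cos θ·cos φ + A·sin θ·sin φ, so the two regression weights recover the modulation amplitude A and preferred phase θ. The following sketch fits only fixed effects by ordinary least squares on a synthetic continuous single-trial outcome; a full LMM would add participant (and session) random effects with a mixed-model library, which is omitted here. All names and parameters are hypothetical.

```python
import numpy as np


def phase_regression(phase, y):
    """Least-squares fit of y ~ b0 + b_sin*sin(phase) + b_cos*cos(phase).

    Returns (modulation amplitude, preferred phase in radians).
    """
    X = np.column_stack([np.ones_like(phase), np.sin(phase), np.cos(phase)])
    (b0, b_sin, b_cos), *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.hypot(b_sin, b_cos), np.arctan2(b_sin, b_cos)


# Synthetic single-trial data (hypothetical): true amplitude 0.3, preferred phase 1.0 rad.
rng = np.random.default_rng(0)
phase = rng.uniform(-np.pi, np.pi, 5000)
y = 0.3 * np.cos(phase - 1.0) + rng.normal(size=phase.size)
amplitude, preferred = phase_regression(phase, y)
print(f"amplitude={amplitude:.2f}, preferred phase={preferred:.2f} rad")
```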

      We appreciate the suggestion but feel that LMMs would be a challenge in this case not only because the main predictor variables are circular, but because the main outcome variables are not defined on the single-trial level and require many trials to be computed (e.g., classification images, SDT measures, response consistency). As such, computing these measures within a session may also lead to noisier estimates than we had designed our experiment for. We therefore prefer the more straightforward approach we have taken in the paper, which has now been supplemented by a binomial test of individual-subject level significance.

      Given that the number of subjects is quite small, I believe that individual data should be presented (either in the main text or supplementary materials) also for figures: 2A, B, C and D.

      Thank you; we have included all of these results in the individual graphs in the Supplemental Materials (see Supplementary figure 3).

      In plot 2B (HR and FAR), a p-value of 0.015 appears. However, in the text you write:

      "Indeed, this showed that the difference between the HR and FAR vector angle was significantly clustered around a mean of 180° (v = 3.78, p = 0.01), indicating that the phase angle associated with the greatest hits was counterphase to the phase angle associated with the greatest false alarms."

      Which one is correct? Or do they refer to different tests?

      We appreciate you catching this confusing discrepancy. The two values refer to the same test which has a p-value of 0.0145. In the figure, this value was rounded to the thousandths decimal place (i.e., 0.015), whereas in the text it was rounded to the hundredths value (0.01). We now consistently report p-values out to three decimal places throughout the manuscript.
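      The two reported values are consistent with a single V-test. Under the normal approximation used, for example, by the CircStat toolbox's circ_vtest (whether the authors used that toolbox is not stated), V = Σ cos(θᵢ - μ₀), u = V·√(2/n), and p = 1 - Φ(u). Assuming one HR-FAR angle difference per participant (n = 6, an assumption), the reported V = 3.78 gives p ≈ 0.0145, which rounds to 0.015 at three decimals and 0.01 at two. The function names below are hypothetical.

```python
import math
from statistics import NormalDist


def v_test_p(v: float, n: int) -> float:
    """One-sided p-value for the V statistic via the normal approximation u = V*sqrt(2/n)."""
    u = v * math.sqrt(2 / n)
    return 1 - NormalDist().cdf(u)


def v_test(angles, mu0):
    """Rayleigh V-test: are `angles` (radians) clustered around the direction mu0?"""
    v = sum(math.cos(a - mu0) for a in angles)  # equals n * r * cos(mean - mu0)
    return v, v_test_p(v, len(angles))


# Reported statistic, assuming n = 6 angle differences (one per participant).
p = v_test_p(3.78, 6)
print(f"{p:.4f} {p:.3f} {p:.2f}")  # 0.0145 0.015 0.01
```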

      Did you perform any statistical test for phasic modulation of d′ and criterion? I ask because in Figure 2B, you state that the data show a "clear phasic modulation of d' across bins", but no statistic is mentioned. On the other hand, in Figure 2D, you state, "We did not observe any significant phase-dependent relationship between phase and criterion." Is this sentence referring to both 2C and 2D panels or only to 2C?

      Figures 2B and 2D show the phase-behavior relationship across bins after aligning the phase bins to each participant's “best” d’ bin. This bin is omitted from the plots because it is used for alignment, making the analysis circular. Accordingly, these panels were intended purely for visualization and were not used for statistical inference. Additional language has been added to the figure caption highlighting this aspect.

      “The data have been aligned across participants so that each individual's highest d’ was assigned to bin 8 (omitted from the plot), with the remaining data circularly shifted, and is averaged across -450 ms to stimulus onset. This graph is for visualization purposes only.”

      The primary statistical test for phase-behavior coupling was performed using permutation testing of the resultant vector length, which quantifies the magnitude of phase-dependent modulation. These results are shown in Figures 2A (for d′) and 2C (for criterion). In the original manuscript, we reported only the time points that survived cluster-based correction, but did not explicitly report the cluster p-values. We have now added these cluster p-values to the manuscript for completeness.

      “The data revealed significant cluster-corrected coupling between alpha phase and d’ in the prestimulus window from -220 ms until stimulus onset (cluster p = 0.046),...”

      Additionally, we have split the caption of Figure 2 into separate entries for panels C and D.

      “(C) No evidence for the coupling of criterion to pre-stimulus alpha-band phase. Graph C reveals the time course of the resultant vector lengths for alpha phase-criterion coupling, which shows no significant phase-dependent relationship between phase and criterion.

      (D) The underlying shifted c across phase bins (shifted to participants’ optimal phase, as in graph B) did not visually demonstrate a phasic modulation pattern.”

      Minor issues:

      In general, the paper is very clear. I found a statement confusing in the Response consistency section:

      "To quantify response consistency, we computed the proportion of trials in which participants provided the same response across the two identical trials. This procedure was done for each channel at each time point (from -450 to 0 ms) and then averaged."

      This does not make sense as written, because response consistency is independent of channel and time point. I believe you are referring to phase here; perhaps simply changing the order (starting with response consistency and then proceeding to phase) would make the paragraph clearer.

      We appreciate you catching this mistake. We have clarified the Methods section in the following way:

      “To quantify response consistency, we computed the proportion of trials in which participants provided the same response across the two identical trials. Since the optimal phase changes over time, each pair of trials was classified, for each time point (from -450 to 0 ms) and channel, according to whether both trials occurred during the optimal phase or not. The proportion of consistent responses was then averaged across channels and time.”
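      A sketch of the consistency computation in this revised Methods text, assuming yes/no responses coded 0/1 and a precomputed boolean array marking, for each channel and time point, which double-pass pairs had both presentations in the optimal phase. The function name, array layout, and toy values are assumptions; a group with no pairs at some channel and time point would produce NaN.

```python
import numpy as np


def consistency_by_phase(first, second, both_optimal):
    """Response consistency for double-pass pairs, split by phase classification.

    first, second: (n_pairs,) responses (1 = yes, 0 = no) to the two identical stimuli
    both_optimal:  (n_channels, n_times, n_pairs) True where both presentations of a
                   pair fell in the optimal phase at that channel and time point
    Returns (optimal, otherwise): proportions of consistent responses, each
    averaged over channels and time points.
    """
    same = (np.asarray(first) == np.asarray(second)).astype(float)  # per-pair agreement
    mask = np.asarray(both_optimal, dtype=bool)
    optimal = (same * mask).sum(-1) / mask.sum(-1)  # (n_channels, n_times)
    otherwise = (same * ~mask).sum(-1) / (~mask).sum(-1)
    return optimal.mean(), otherwise.mean()


# Toy example: 4 pairs, 2 channels, 1 time point (hypothetical values).
first = [1, 1, 0, 0]
second = [1, 0, 0, 1]
both_optimal = np.array([[[1, 0, 1, 0]], [[1, 1, 0, 0]]], dtype=bool)
print(consistency_by_phase(first, second, both_optimal))
```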

      Could you include a plot of the power spectrum used for IAF estimation of all the subjects?

      Thank you for the suggestion. In Supplemental Figure 3 we have included the power spectrum that was used to estimate IAF in addition to a topoplot of alpha power (IAF +/- 2 Hz) that has the analysis electrodes labelled.
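      The following is a minimal sketch of how an IAF and the IAF ± 2 Hz alpha band might be obtained from a single channel's power spectrum. The Hann-windowed periodogram and the 7-13 Hz peak-search range are assumptions for illustration. The text does not state how the spectrum was computed or which search range was used, and averaging over trials and channels is omitted here.

```python
import numpy as np

def estimate_iaf(signal, fs, search_band=(7.0, 13.0)):
    """Return (iaf, alpha_band), where alpha_band = (iaf - 2, iaf + 2) Hz."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()                     # remove DC offset
    window = np.hanning(signal.size)                    # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(signal * window)) ** 2   # periodogram
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    mask = (freqs >= search_band[0]) & (freqs <= search_band[1])
    iaf = freqs[mask][np.argmax(power[mask])]           # peak frequency in the search band
    return iaf, (iaf - 2.0, iaf + 2.0)
```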

      Bibliography:

      Wiley RW, Rapp B. Statistical analysis in Small-N Designs: using linear mixed-effects modeling for evaluating intervention effectiveness. Aphasiology. 2019;33(1):1-30. doi: 10.1080/02687038.2018.1454884.

      Zoefel B, Davis MH, Valente G, Riecke L. How to test for phasic modulation of neural and behavioural responses. NeuroImage. 2019;202:116175. https://doi.org/10.1016/j.neuroimage.2019.116175.

    1. eLife Assessment

      This is an important study that addresses the role of fever as a conserved response to viral infection. It demonstrates that the heat-shock factor, HSF1, is activated by increased temperature during fever to enhance the anti-viral immune response. The data provide compelling evidence for the conclusions and the work will be of interest to virologists, immunologists, and cell biologists.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods," Xiao and colleagues examine the role of the shrimp Litopenaeus vannamei HSF1 ortholog (LvHSF1) in the response to viral infection. The authors provide compelling support for their conclusions that the activation of LvHSF1 limits viral load at high temperatures. Specifically, the authors convincingly show that (i) LvHSF1 mRNA and protein are induced in response to viral infection at high temperatures, (ii) increased LvHSF1 levels can directly induce the expression of nSWD (and directly or indirectly other antimicrobial peptides, AMPs), (iii) nSWD's antimicrobial activities can limit viral load, and (iv) LvHSF1 protects survival at high temperatures following virus infection. These data thus provide a model by which an increase in HSF1 levels limits viral load through the transcription of antimicrobial peptides, and provide a rationale for the febrile response as a conserved response to viral infection.

      Strengths:

      The large body of careful time-series experiments, tissue profiling, and validation of RNA-seq data is convincing. Several experimental methodologies are used to support the authors' conclusions that nSWD is an LvHSF1 target and that increased LvHSF1 alone can explain increased levels of nSWD. Similar carefully conducted experiments also conclusively implicate nSWD protein in limiting WSSV viral loads.

      Weaknesses:

      As with any complex biological phenomenon, several aspects remain incompletely explained. Nevertheless, in their revision, the authors provide additional analyses supporting their model that losing LvHSF1 is not detrimental to survival, by more directly altering viral loads. In addition, their revised manuscript clarifies the complex interactions between infection, the role of HSF1, and hormesis. These revisions increase the impact of their findings.

      Comments on revisions:

      The authors have addressed all comments, and the manuscript is very much improved.

    3. Reviewer #3 (Public review):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved.

      Comments on revisions:

      Some aspects of the initial study design, regarding the selection of representative candidate genes and the logical flow, raised concerns. However, these issues have been addressed in the revised manuscript through additional validations and clarifications. Most of my comments and concerns were sufficiently addressed in the revised manuscript. The results support the authors' conclusion that HSF1-dependent regulation of AMP expression contributes to antiviral defense under febrile conditions.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      Despite this compelling data regarding the protective role of HSF1 in the febrile response, what remains unexplained and complicates the authors' model is the observation that losing LvHSF1 at 'normal' temperatures of 25 ℃ is not detrimental to survival, even though viral loads increase and nSWD is likely still subject to LvHSF1 regulation. These observations suggest that WSSV infection may have other detrimental effects on the cell not reflected by viral load and that LvHSF1 may play additional roles in protecting the organism from these effects of WSSV infection, such as, perhaps, perturbations to protein homeostasis. This is worth discussing, especially in light of the rather complicated roles of hormesis in protection from infection, the role of HSF1 in hormesis responses, and the findings from other groups that the authors discuss.

      We are grateful for the reviewer's unbiased advice. We have added a description of the role of HSF1 in hormesis responses to the Discussion (Lines 422-425 of the revised manuscript). Thank you.

      Reviewer #2 (Public review):

      Temperature is a critical factor affecting the progression of viral diseases in vertebrates and invertebrates. In the current study, the authors investigate mechanisms by which high temperatures promote anti-viral resistance in shrimp. They show that high temperatures induce HSF1 expression, which in turn upregulates AMPs. The AMPs target viral envelope proteins and inhibit viral infection/replication. The authors confirm this process in drosophila and suggest that there may be a conserved mechanism of high-temperature mediated anti-viral response in arthropods. These findings will enhance our understanding of how high temperature improves resistance to viral infection in animals.

      The conclusions of this paper are mostly well supported by data, but some aspects of data analysis need to be clarified and extended. Further investigation on how WSSV infection is affected by AMP would have strengthened the study.

      We are grateful for the reviewer's unbiased advice. We have provided additional experimental evidence and supplementary explanations in the revised manuscript. Thank you.

      Reviewer #3 (Public review):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved.

      We are grateful for the reviewer's positive comments and unbiased advice. We have improved the logical flow of the paper and added corresponding explanations in the revised manuscript. Thank you.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: The analysis compares Group TW to Group W (not the other way around).

      Thank you very much. To uncover the molecular mechanisms by which high temperature restricts WSSV infection, two shrimp groups, Group TW and Group W, were cultured at 25 °C. Group W comprised shrimp injected with WSSV and maintained at 25 °C continuously. In contrast, Group TW was subjected to a temperature increase to 32 °C at 24 hours post-injection (hpi). Gill samples were collected for analysis 12 hours post-temperature rise (hptr) and subjected to Illumina sequencing (Figure 1A). RNA-seq was used to identify genes responsive to high temperature, particularly those encoding potential transcriptional regulators. Thank you.

      (2) The RNA-seq data in Figure 1 focus only on the TFs. The manuscript would benefit from showing all the RNA-seq data and the differentially expressed genes. In particular, are the AMPs upregulated at the same time point? This should not be the case if LvHSF1 were responsible for the transcription of the AMPs, given the time lag between transcription and translation.

      Thank you for your suggestion. As shown in Author response image 1, our previous RNA-seq study revealed that classical heat shock proteins (such as HSP21, HSP70, HSP60, HSP83, HSP90, HSP27, HSP10, and Bip) were induced in Group TW relative to Group W. This suggests that heat shock proteins play a crucial role in enhancing shrimp resistance to WSSV at elevated temperature (32 ℃), and it underscores the reliability of our transcriptomic findings (Xiao et al., 2024).

      Additionally, we analyzed AMP expression in Group TW relative to Group W. Some antimicrobial peptides, such as lysozyme and C-type lectin, were upregulated. Notably, we did not detect upregulation of SWD between Group TW and Group W. We agree with the reviewer that there is a time lag between transcription and translation. Additional experimental evidence (Author response image 2) shows that LvHSF1 expression is first strongly induced by WSSV stimulation, after which SWD expression begins to increase. We have added a description in Lines 136-138 of the revised manuscript.

      Author response image 1.

      The Figure of the heat shock proteins in Group TW and Group W

      Author response image 2.

      Transcriptional expression levels of HSF1 and SWD after WSSV stimulation

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (3) The data showing the tissue distribution of LvHSF1 and nSWD is a rigorous approach and adds to the manuscript. A similar approach to understanding the time course of expression of AMPs in relationship to LvHSF1 expression levels would strengthen the authors' conclusions that LvHSF1 induction in response to high temperatures and viral infection, in turn, upregulates SWD and other antibacterial genes.

      Thank you for your suggestion. We measured the transcriptional levels of LvHSF1 and SWD at 0, 2, 4, 6, 8, 12, 16, 20, and 24 hours after WSSV stimulation. The expression level of SWD at 0 h was set to 1.00. In the early stage of WSSV infection (0-12 h, except at 6 h), LvHSF1 expression was strongly induced, after which SWD expression began to increase. These results indicate that LvHSF1 is induced in response to viral infection and, in turn, upregulates SWD and other antibacterial genes. Thank you.
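      The time-course comparison above relies on expression values normalised so that 0 h equals 1.00. The response does not state the quantification method. The sketch below therefore assumes the widely used 2^-ΔΔCt approach with a single reference gene. The induction threshold used to find the onset time is a hypothetical choice and does not come from the study. The function names are illustrative only.

```python
import numpy as np

def relative_expression(ct_target, ct_reference, baseline_index=0):
    """2^-ddCt fold change per time point, normalised so that the baseline equals 1.00."""
    ct_target = np.asarray(ct_target, dtype=float)
    ct_reference = np.asarray(ct_reference, dtype=float)
    delta_ct = ct_target - ct_reference                   # dCt at each time point
    delta_delta_ct = delta_ct - delta_ct[baseline_index]  # relative to the 0 h sample
    return 2.0 ** (-delta_delta_ct)

def first_induction_time(times, fold_change, threshold=2.0):
    """First time point at which the fold change reaches the (hypothetical) threshold, else None."""
    for t, fc in zip(times, fold_change):
        if fc >= threshold:
            return t
    return None
```

      Applying both functions to the LvHSF1 and SWD series from the same samples would give a simple check of whether LvHSF1 crosses the threshold earlier than SWD. That pattern would be consistent with the proposed lag between the two genes.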

      (4) The finding (Figures 3 and 4) that LvHSF1 is necessary to survive WSSV infection at high temperatures but does not affect survival at lower temperatures, even though LvHSF1 limits VP28 levels and viral load at both temperatures, is confusing. Does this suggest that LvHSF1 is not primarily important for protection against the virus but instead for protection from the heat-induced damage caused by high temperatures, which would not be surprising? The manuscript would benefit if the authors could address this point. How do the authors envision the protection conferred by LvHSF1 only at high temperatures?

      Thank you for your comment. Although no significant difference in shrimp survival rates was observed between LvHSF1-silenced shrimp and GFP-silenced shrimp at low temperature (25 °C), shrimp with silenced LvHSF1 exhibited increased viral loads in hemocytes and gills, suggesting that upregulation of HSF1 expression can protect shrimp from WSSV infection.

      Notably, the tolerance temperature for L. vannamei growth ranges from 7.5 to 42 °C. When infected with WSSV, shrimp use behavioral fever to elevate their body temperature (~32 °C), thereby inhibiting WSSV infection (Rakhshaninejad et al., 2023; Xiao et al., 2024). This temperature (~32 °C) does not cause heat-induced damage in shrimp. Our results demonstrate that febrile temperatures induce HSF1, which in turn upregulates antimicrobial peptides (AMPs) that target viral envelope proteins and inhibit viral replication.

      Only at high temperature did knockdown of LvHSF1 reduce shrimp survival (Figure 4A). Thank you again for your valuable feedback.

      Reference:

      Rakhshaninejad, M., Zheng, L., Nauwynck, H., 2023. Shrimp (Penaeus vannamei) survive white spot syndrome virus infection by behavioral fever. Sci Rep 13, 18034.

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (5) Related to the previous comment, the authors do not clearly distinguish between basal effects of LvHSF1 or nSWD induction and heat-induced effects and the differences related to the requirement of LvHSF1 for protection. Simply increasing LvHSF1 levels can result in increased nSWD. SWD levels increase upon WSSV infection even at 25 ℃, and the knockdown experiments suggest that this could also occur through LvHSF1. It would be useful to explicitly differentiate between basal functions of HSF1 and induced functions.

      Thank you for your suggestion. In previous responses, we have distinguished between basal effects of LvHSF1 or nSWD induction and heat-induced effects.

      Following your suggestion, we injected GST or rHSF1 protein into shrimp, and recombinant HSF1 significantly induced SWD expression (Supplementary Fig. 5C). Further, after SWD knockdown, shrimp were injected with rLvHSF1 mixed with WSSV. The viral load was significantly lower than that of the control group at 48 hours post WSSV infection (Supplementary Fig. 5D). We have added these results to Supplementary Figures 5C and 5D and described them in Lines 253-255 and Lines 290-293 of the revised manuscript. Thank you for your constructive comments.

      Reviewer #2 (Recommendations for the authors):

      (1) Two temperatures are used in the shrimp experiments. It seems that HSF1 is also upregulated by WSSV infection at 25 ℃. However, this upregulation does not seem to protect the animals. The authors compare infection at 25 and 32 ℃ but do not discuss the findings.

      Thank you for your comment. Although no significant difference in shrimp survival rates was observed between LvHSF1-silenced shrimp and GFP-silenced shrimp at low temperature (25 °C), shrimp with silenced LvHSF1 exhibited increased viral loads in hemocytes and gills, suggesting that upregulation of HSF1 expression can protect shrimp from WSSV infection. We have added a discussion of this finding in Lines 461-464 in the revised manuscript. Thank you.

      (2) In the abstract the authors say that "These insights provide new avenues for managing viral infections in aquaculture and other settings by leveraging environmental temperature control." However, this point has not been discussed in the main text.

      We appreciate your comments. We have added a discussion of environmental temperature control in Lines 512-514 of the revised manuscript. Thank you.

      (3) Line 142: "These results suggest that LvHSF1 may play a key role in enhancing shrimp resistance to WSSV at elevated temperatures." Although this type of conclusion has been made in many studies, I think it is impossible to infer a "KEY role" based mainly on changes in expression.

      Thank you for your suggestion. We have revised this conclusion in the revised manuscript. Thank you.

      (4) Section 2.1 Induction of Heat Shock Factor 1 in Response to WSSV at High Temperature

      Figure 1. Identification of HSF1 as a key factor induced by high temperature.

      The two titles are confusing. Is the upregulation of HSF1 a response to high temperature or to WSSV infection? I think it is more likely a response to high temperature. Did the authors see a difference in HSF1 expression between shrimp with and without WSSV infection at high temperatures?

      Thank you for your comment. We have modified the title of Section 2.1 in the revised manuscript. Following your suggestion, we measured LvHSF1 expression after WSSV challenge at high temperature (32 ℃) (revised Figures 2F-2H; Line 122 of the revised manuscript). The results demonstrate that LvHSF1 expression is strongly induced by WSSV stimulation at 32 ℃. Thank you.

      (5) Figure 2. Upregulation of LvHSF1 in shrimp challenged by WSSV at both low and high temperatures. Results for WSSV challenge at high temperatures are not included in this figure.

      Thank you for your suggestion. We measured LvHSF1 expression after poly(I:C) and WSSV challenge at high temperature (32 ℃) (revised Figures 2C-2H). LvHSF1 expression was strongly induced by poly(I:C) and WSSV stimulation at 32 ℃. We have added a description in Lines 168-179 of the revised manuscript. Thank you.

      (6) Section 2.2 Expression Profiles of LvHSF1 in Shrimp Under Varied Temperature Conditions and WSSV Challenge. Did the authors try poly(I:C) and WSSV challenge at 32 ℃ and compare with the unchallenged group? Why was only the low temperature analyzed?

      Thank you for your suggestion. We measured LvHSF1 expression after poly(I:C) and WSSV challenge at high temperature (32 ℃) (revised Figures 2C-2H) and added a corresponding description in Lines 168-179 of the revised manuscript. Thank you.

      (7) Figure 2: Please indicate the temperature used in C-E and F-H in the figure legend. Statistical significance: compared with which group? Please provide information in the legend or show it in the bar chart.

      Thank you for your suggestion. We have indicated the temperature used in revised Figures 2C-2E. Changes in HSF1 expression were compared with those of the PBS control group at the corresponding time point, and we have revised how statistical significance is indicated in Figures 2C-2E. Thank you.

      (8) Figure 3H: Two groups (dsGFP+PBS; dsHSF1+PBS) are shown with the same symbol (dotted line).

      Thank you for your comment. The revised Figure 3H uses different symbols to distinguish the two groups. Thank you.

      (9) Line 205: qPCR

      Thank you for your careful checks. We have corrected this error in the revised manuscript. Thank you.

      (10) Figure 5d and f: Please indicate the sample in each row.

      Thank you for your suggestion. We have marked the samples in each row in the revised Figures 5d&5f.

      (11) Figure 3 and Figure 4: Why were different tissues analyzed in the two experiments? Low temperature: gill and hemocytes. High temperature: gill and muscle? It is better to use the same tissues so that they can be compared. Please indicate the tissue analyzed in D and d.

      Thank you for your suggestion. We repeated the experiment to measure the WSSV copy number in hemocytes at high temperature (32 °C) after LvHSF1 knockdown. LvHSF1 knockdown increased viral loads in shrimp hemocytes (Figure 4C). We have added the tissue information to Figures 4D and 4d. Thank you.

      (12) Figure 2A: What was the duration of the temperature treatment, hours or days?

      Thank you for your comment. Figure 2A shows the transcriptional expression of LvHSF1 in different tissues of healthy shrimp subjected to low (25 °C) or high (32 °C) temperature for 12 hours. We have added this information to the legend of Figure 2A (Lines 840-841 of the revised manuscript). Thank you.

      (13) Line 249: purified by SDS-PAGE gel?

      Thank you for your comment. We have modified this description in Lines 272-274 of the current manuscript. Thank you.

      (14) Line 258 "Next, to verify whether the anti-WSSV function of nSWD was mediated by LvHSF1 at high temperature". I think it is confusing to use "mediated" here. It seems that HSF1 is downstream of nSWD. Actually, HSF1 controls the expression of nSWD and thus regulates the anti-WSSV effect of shrimp at high temperatures.

      We appreciate your comments. We have modified this description in Lines 282-283 of the current manuscript. Thank you.

      (15) Line 458: "The most probable anti-WSSV mechanism of nSWD is its direct interaction with WSSV envelope proteins VP24 and VP26, potentially inhibiting viral entry into target cells." I suggest the authors analyze WSSV entry to see whether nSWD blocks this process.

      Thank you for your comment. In general, the antimicrobial action of AMPs is thought to involve direct membrane disruption, especially for enveloped viruses such as WSSV (Wilson et al., 2013).

      Thanks to the reviewers for their valuable comments. Our manuscript mainly focuses on the febrile temperature-inducible HSF in host antiviral immunity and on the role of HSF1 in regulating antimicrobial effectors (such as SWD). Owing to the limited length of the manuscript, we will investigate the mechanism underlying the anti-WSSV activity of SWD in future studies. Thank you.

      Reference:

      Wilson, S.S., Wiens, M.E., Smith, J.G., 2013. Antiviral Mechanisms of Human Defensins. Journal of Molecular Biology 425, 4965-4980.

      (16) Lines 435-456: The authors discuss the difference between two shrimp species. Did the two studies measure the same immune parameters? I wonder whether the different observations are due to true differences or to the different methods used to evaluate the response. If no immune response was promoted in the previous study, what is the possible antiviral mechanism?

      We appreciate your comments. First, the shrimp species in the two studies differ in temperature adaptability: the optimal water temperature for M. japonicus growth ranges from 25 to 32 °C, whereas L. vannamei tolerates temperatures from 7.5 to 42 °C. Second, the environmental factors differ between the two studies. Ammonia is a key stress factor in aquatic environments that usually increases the risk of pathogenic diseases in aquatic animals, whereas high temperature (32 °C) has been shown to inhibit WSSV replication and reduce mortality in WSSV-infected shrimp. Third, the two studies measured different immune indicators. In the previous study, ammonia-induced Hsf1 suppressed the production and function of MjVago-L, an arthropod interferon analog. In this study, our findings reveal the molecular mechanism through which the HSF-AMP axis mediates febrile temperature-induced host resistance to viruses. Taken together, HSF1 can benefit either the host or the pathogen, depending on the nature and context of the host-virus-environment interaction.

      (17) Line 472 "directly bind to WSSV envelope proteins and inhibit WSSV proliferation"

      I think it is confusing to use "proliferation" here. It seems that the binding of HSF affects the replication process. However, based on the authors' discussion, HSF may likely block viral entry.

      Thank you for your suggestion. We have modified this description in Lines 505-507 in the current manuscript. Thank you.

      Reviewer #3 (Recommendations for the authors):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved. Following are my specific concerns.

      Major comments

      (1) The study design is pretty good, but the logical flow is not. The following should be improved.

      (a) In Figure 1, the reason for selecting HSF1 as the focus of the study is not clearly explained.

      Thank you for your comment. In a previous study, we revealed that heat shock proteins play a significant role in enhancing shrimp resistance to WSSV at elevated temperature (32 ℃) (Xiao et al., 2024). GO functional enrichment analysis of DEGs between Group TW and Group W indicated that most DEGs were involved in biological processes such as protein refolding, chaperone-mediated protein folding, and response to heat. We therefore focused on heat shock factor 1 (HSF1), the master regulator of the heat shock response. We have added this description in Lines 136-138 of the revised manuscript. Thank you.

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (b) As the authors' model in Figure 9 depicts, the established activation mechanism of HSF1 is trimerization upon release from HSP90, which binds to misfolded proteins under stress conditions such as heat shock. Therefore, the increase in the HSF1 mRNA level in Figure 1 is strange. The authors need to clarify this issue by explaining this established activation mechanism of HSF1. They must also cite papers in the Introduction that provide a basis for upregulation of HSF1 through increased mRNA levels.

      We appreciate your comments. Under non-stress conditions, HSF monomers are retained in the cytoplasm in a complex with HSP90. During the stress response, such as at high temperature, HSF dissociates from the complex, trimerizes, and converts into a DNA-binding conformation. In this conformation it binds regulatory upstream promoter elements known as heat shock elements (HSEs) (Andrasi et al., 2021). Previous studies have demonstrated that HSF1 expression is markedly induced by stresses such as high temperature (Ren et al., 2025), virus infection (Merkling et al., 2015), and ammonia stress (Wang et al., 2024). Our results also showed that LvHSF1 expression was significantly induced by WSSV infection and high temperature (Figure 2). Therefore, the increase in the HSF1 mRNA level in Figure 1 is not surprising.

      In response, we have revised the proposed model to better reflect our experimental findings and the accompanying description. This revision ensures that the schematic is consistent with our data and accurately represents the proposed mechanism. We appreciate your careful review and constructive feedback.

      Reference:

      Andrasi, N., Pettko-Szandtner, A., Szabados, L., 2021. Diversity of plant heat shock factors: regulation, interactions, and functions. J Exp Bot 72, 1558-1575.

      Ren, Q., Li, L., Liu, L., Li, J., Shi, C., Sun, Y., Yao, X., Hou, Z., Xiang, S., 2025. The molecular mechanism of temperature-dependent phase separation of heat shock factor 1. Nature Chemical Biology.

      Merkling, S.H., Overheul, G.J., van Mierlo, J.T., Arends, D., Gilissen, C., van Rij, R.P., 2015. The heat shock response restricts virus infection in Drosophila. Sci Rep 5, 12758.

      Wang, X.X., Zhang, H., Gao, J., Wang, X.W., 2024. Ammonia stress-induced heat shock factor 1 enhances white spot syndrome virus infection by targeting the interferon-like system in shrimp. mBio 15, e0313623.

      (c) For the RNA-seq analyses in Figures 1 and 5, the authors need to provide changes in conventional HSF1 target chaperones (many HSPs) to validate their RNA-seq data.

      Thank you for your suggestion. As shown in Author response image 1, our previous RNA-seq study revealed that classical heat shock proteins (such as HSP21, HSP70, HSP60, HSP83, HSP90, HSP27, HSP10, and Bip) were induced in Group TW relative to Group W. This suggests that heat shock proteins play a crucial role in enhancing shrimp resistance to WSSV at elevated temperature (32 ℃), and it underscores the reliability of our transcriptomic findings (Xiao et al., 2024). We have added this description in Lines 136-138 of the revised manuscript.

      For Figure 5, we have added the heat shock protein genes found among the downregulated DEGs to Supplementary Table 2. These DEGs come from transcriptome sequencing of dsGFP + WSSV (32 ℃) vs. dsLvHSF1 + WSSV (32 ℃). Classical heat shock proteins were downregulated in this comparison, underscoring the reliability of our transcriptomic findings. We have added this description in Lines 213-216 of the revised manuscript. Thank you.

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (d) In Figure 5, the experiments focus on changes caused by HSF1 knockdown at 32 ℃. However, the logical flow should focus on genes whose expression was increased at 32 ℃ compared with 25 ℃ (in Figure 1), and among these the authors need to characterize HSF1 target genes. As mentioned above, classical HSP genes must be included in addition to the AMP genes.

      Thank you for your suggestion. We have added the heat shock protein genes found among the downregulated DEGs to Supplementary Table 2. These DEGs come from transcriptome sequencing of dsGFP + WSSV (32 ℃) vs. dsLvHSF1 + WSSV (32 ℃). Classical heat shock proteins were downregulated in this comparison, underscoring the reliability of our transcriptomic findings. We have added this description in Lines 213-216 of the revised manuscript. Thank you.

      (e) What is the logical basis of just picking nSWD? It is another example of cherry-picking similar to picking HSF1 in Figure 1.

      We appreciate your comments. To determine how temperature-induced LvHSF1 restricts WSSV infection, RNA-seq was performed to identify target genes regulated by HSF1. By analyzing the differentially expressed genes (DEGs), we screened eight candidate immune effector molecules, including SWD, CrustinⅠ, C-type lectin, anti-lipopolysaccharide factor (ALF), and Vago. CrustinⅠ has been shown to play an important role in antiviral immunity (Li et al., 2020). C-type lectin (CTL1) can bind to VP28, VP26, VP24, VP19, and VP14, thereby inhibiting WSSV infection (Zhao et al., 2009). Anti-lipopolysaccharide factor (ALF3) exerts its anti-WSSV activity by binding to the envelope protein WSSV189 (Methatham et al., 2017). Vago can inhibit WSSV infection by activating the Jak/Stat pathway in shrimp (Gao et al., 2021). Because the mechanism by which SWD acts against WSSV remained unclear, we focused on SWD. We have added this description in Lines 215-220 of the revised manuscript. Thank you for your valuable comments, which have improved the logic of the manuscript.

      Reference:

      Li, S., Lv, X., Yu, Y., Zhang, X., Li, F., 2020. Molecular and Functional Diversity of Crustin-Like Genes in the Shrimp Litopenaeus vannamei. Marine Drugs 18, 361.

      Zhao, Z.Y., Yin, Z.X., Xu, X.P., Weng, S.P., Rao, X.Y., Dai, Z.X., Luo, Y.W., Yang, G., Li, Z.S., Guan, H.J., Li, S.D., Chan, S.M., Yu, X.Q., He, J.G., 2009. A novel C-type lectin from the shrimp Litopenaeus vannamei possesses anti-white spot syndrome virus activity. Journal of Virology 83, 347-356.

      Methatham, T., Boonchuen, P., Jaree, P., Tassanakajon, A., Somboonwiwat, K., 2017. Antiviral action of the antimicrobial peptide ALFPm3 from Penaeus monodon against white spot syndrome virus. Dev Comp Immunol 69, 23-32.

      Gao, J., Zhao, B.R., Zhang, H., You, Y.L., Li, F., Wang, X.W., 2021. Interferon functional analog activates antiviral Jak/Stat signaling through integrin in an arthropod. Cell Rep 36, 109761.

      (f) Likewise, choosing Atta in S2 cells needs logic.

      We appreciate your comments. Our manuscript revealed that febrile temperature-inducible HSF1 confers virus resistance by regulating the expression of antimicrobial peptides (AMPs) in L. vannamei. We further asked whether HSF1 regulation of AMPs is a conserved defense mechanism induced by elevated temperature in arthropods, and performed experiments in an invertebrate model system (Drosophila S2 cells). A previous study showed that DmAMPs (such as Attacin A, Cecropin A, Defensin, Metchnikowin, and Drosomycin) play a significant role in antiviral immunity in Drosophila (Zhu et al., 2013). Our results showed that the expression of Attacin A, Cecropin A, and Defensin was markedly induced by DmHSF1, with Attacin A the most strongly induced. Therefore, DmAtta was chosen as a representative to further demonstrate that DmHSF1 exerts its anti-DCV function by regulating DmAMPs. We have added this description in Lines 328-330 and Lines 361-364 of the revised manuscript. Thank you for your valuable comments, which have improved the logic of the manuscript.

      Reference:

      Zhu, F., Ding, H., Zhu, B., 2013. Transcriptional profiling of Drosophila S2 cells in early response to Drosophila C virus. Virol J 10, 210.

      (2) In Figures 6I to 6K, the authors aimed to verify whether the anti-WSSV function of nSWD was mediated by LvHSF1 at high temperatures. However, these data only show that nSWD acts against WSSV downstream of HSF1. The authors should show additional data for dsControl + rnSWD.

      Thank you for your suggestion. Following it, after SWD knockdown, shrimp were injected with rLvHSF1 mixed with WSSV. The viral load was significantly lower than in the control group at 48 hours post WSSV infection (Supplementary Fig. 5D). We have added these results to Supplementary Figures 5C and 5D and a description in Lines 290-293 of the revised manuscript. Thank you for your constructive comments.

      (3) For the physical interaction between nSWD and WSSV, it would be helpful if the authors performed an AlphaFold3 prediction analysis (Abramson et al., PMID: 38718835).

      Thank you for your suggestion. As you suggested, we performed an AlphaFold3 prediction analysis of SWD with the WSSV proteins VP24 and VP26. The predicted template modeling (pTM) score measures the accuracy of the entire predicted structure; a pTM score above 0.5 indicates that the overall predicted fold of the complex may be similar to the true structure. The AlphaFold3 predictions suggest a possible interaction between SWD and WSSV. Notably, our manuscript demonstrated that rSWD interacts with VP24 and VP26 in pulldown assays and confocal analysis.

      Author response image 3.

      AlphaFold3 prediction of the SWD and VP24 complex (pTM = 0.64).

      Author response image 4.

      AlphaFold3 prediction of the SWD and VP26 complex (pTM = 0.53).

      Minor comments

      (1) In the Abstract and many other places, the authors need to specifically write "Drosophila S2 cells" instead of "Drosophila", because by convention "Drosophila" refers to the fruit fly as an organism. We do not refer to cultured human cells as "human" or "Homo sapiens" in papers.

      Thank you for your suggestion. We have modified the description of Drosophila in the revised manuscript. Thank you.

      (2) The number of figures can be reduced for better readability. I would combine Figures 1 and 2, and Figures 3 and 4. If the combined figures are too crowded, some panels can go into supplementary figures.

      Thank you for your suggestion. We have moved the poly(I:C) data to Supplementary Figure 2 in the revised manuscript. However, because we have also added new experimental data to Figures 1, 2, 3, and 4, we have not combined Figures 1 and 2 or Figures 3 and 4. Thank you.

      (3) Apart from the heat shock response, one of the best-understood physiological roles of HSF1 is in longevity, particularly in C. elegans. The authors should mention this in the Discussion and cite the following recent review (Lee, PMID: 36380728).

      Thank you for your suggestion. We have added a description of the role of HSF1 in regulating longevity and aging and cited the above reference in the revised manuscript (Lee and Lee, 2022). Thank you.

      Reference:

      Lee, H., Lee, S.V., 2022. Recent Progress in Regulation of Aging by Insulin/IGF-1 Signaling in Caenorhabditis elegans. Mol Cells 45, 763-770.

      (4) Please make your own label for small letter panels or transfer small letter panels to supplementary figures.

      Thank you for your suggestion. We have adjusted the relevant letter labels in the revised manuscript: uppercase letters now denote the main panels of each figure, and lowercase letters denote the corresponding supplementary panels. Thank you.

      (5) In the Introduction, I recommend replacing the references for HSFs and HSR with more recent ones.

      Thank you for your suggestion. We have added the latest references for HSFs and HSR in the Introduction part of the revised manuscript. Thank you.

      (6) In Figure 1, the group names W and TW are not intuitive.

      We appreciate your comments. We have added a description of Group W and Group TW to the revised Figure 1. Group W comprised shrimp injected with WSSV and maintained continuously at 25 °C. In contrast, Group TW was subjected to a temperature increase to 32 °C at 24 hours post-injection (hpi). Gill samples were collected 12 hours post-temperature rise (hptr) and subjected to Illumina sequencing. Thank you.

      (7) Please add a sequence comparison of SWD and nSWD so that readers can assess their homology.

      We appreciate your comments. We have added a multiple sequence alignment of SWD proteins from shrimp species to the revised Supplementary Figure 3. Highly conserved amino acid residues and cysteine residues are highlighted in red, indicating that LvSWD is a conserved antimicrobial peptide of the Crustin family. Thank you.

      (8) Naming nSWD as "newly identified" is strange, as it will no longer be new as time goes by. Please change the name.

      Thank you for your suggestion. We have modified the name of nSWD to SWD in the revised manuscript. Thank you.

      (9) Please write the full name for Lv (Litopenaeus vannamei), Dm (Drosophila melanogaster), ds (double-stranded) before using LvHSF1, DmHSF1, and dsLvHSF1.

      Thank you for your comments. We have spelled out Lv (Litopenaeus vannamei), Dm (Drosophila melanogaster), and ds (double-stranded) at the first use of LvHSF1, DmHSF1, and dsLvHSF1 in the revised manuscript. Thank you.

      (10) In Figure 2, it would be better to move the poly(I:C) data to supplementary figures.

      Thank you for your comments. We have moved the poly(I:C) data to Supplementary Figure 2 in the revised manuscript. Thank you.

      (11) The label pGL3-nSWD-M12 is confusing; M1 and M2 are fine. Please replace M12 with M1/2 or another label.

      Thank you for your suggestion. We have changed pGL3-nSWD-M12 to pGL3-nSWD-M1/2 in the revised manuscript. Thank you.

    1. eLife Assessment

      This article presents valuable findings on how the timing of cooling affects autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework for the various ways in which warming can affect bud set timing. The statistical analysis is very well considered and also identifies some factors that may temper the authors' claims. The factorial experiments offer solid support.