  1. Oct 2024
    1. eLife Assessment

      This valuable study describes how a single effector of the Type Six Secretion System (T6SS) has two distinct enzymatic functions that together may contribute to bacterial survival and dynamics in a community and provide potential for developing new antimicrobial compounds. The authors have deployed a range of methods in biochemistry, microbiology, and microscopy, generating solid data that support the main assertions. While the manuscript could benefit from additional clarifying experiments and a more detailed discussion of the methods, it will appeal to those studying T6SS, particularly those interested in effectors and bacterial enzymes.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript performs a comprehensive biochemical, structural, and bioinformatic analysis of TseP, a type 6 secretion system effector from Aeromonas dhakensis that includes the identification of a domain required for secretion and residues conferring target organism specificity. Through targeted mutations, they have expanded the target range of a T6SS effector to include a gram-positive species, which is not typically susceptible to T6SS attack.

      Strengths:

      All of the experiments presented in the study are well-motivated and the conclusions are generally sound.

      Weaknesses:

      There are some issues with the clarity of figures. For example, the microscopy figures could have been more clearly presented as cell counts/quantification rather than representative images. Similarly, loading controls for the secreted proteins for the westerns probably should be shown.

      Also, some of the minor/secondary conclusions reached regarding the "independence" of the N- and C-terminal domains of TseP are a bit overreaching.

    3. Reviewer #2 (Public review):

      Summary:

      Wang et al. investigate the role of TseP, a Type VI secretion system (T6SS) effector molecule, revealing its dual enzymatic activities as both an amidase and a lysozyme. This discovery significantly enhances the understanding of T6SS effectors, which are known for their roles in interbacterial competition and survival in polymicrobial environments. TseP's dual function is proposed to play a crucial role in bacterial survival strategies, particularly in hostile environments where competition between bacterial species is prevalent.

      Strengths:

      (1) The dual enzymatic function of TseP is a significant contribution, expanding the understanding of T6SS effectors.

      (2) The study provides important insights into bacterial survival strategies, particularly in interbacterial competition.

      (3) The findings have implications for antimicrobial research and understanding bacterial interactions in complex environments.

      Weaknesses:

      (1) The manuscript assumes familiarity with previous work, making it difficult to follow. Mutants and strains need clearer definitions and references.

      (2) Figures lack proper controls, quantification, and clarity in some areas, notably in Figures 1A and 1C.

      (3) The Materials and Methods section is poorly organized, hindering reproducibility. Biophysical validation of Zn²⁺ interaction and structural integrity of proteins need to be addressed.

      (4) Discrepancies in protein degradation patterns and activities across different figures raise concerns about data reliability.

    4. Reviewer #3 (Public review):

      Summary:

      Type VI secretion systems (T6SS) are employed by bacteria to inject competitor cells with numerous effector proteins. These effectors can kill injected cells via an array of enzymatic activities. A common class of T6SS effectors is peptidoglycan (PG) lysing enzymes. In this manuscript, the authors characterize a PG-lysing effector, TseP, from the pathogen Aeromonas dhakensis. While the C-terminal domain of TseP was known to have lysozyme activity, the N-terminal domain was uncharacterized. Here, the authors functionally characterize TsePN as a zinc-dependent amidase. This discovery is somewhat novel because it is rare for PG-lysing effectors to have both amidase and lysozyme activity.

      In the second half of the manuscript, the authors utilize a crystal structure of the lysozyme TsePC domain to inform the engineering of this domain to lyse gram-positive peptidoglycan.

      Strengths:

      The two halves of the manuscript considered together provide a nice characterization of a unique T6SS effector and reveal potentially general principles for lysozyme engineering.

      Weaknesses:

      The advantage of fusing amidase and lysozyme domains in a single effector is not discussed but would appear to be a pertinent question. Labeling of the figures could be improved to help readers understand the data.

    1. eLife Assessment

      This work provides a potentially valuable framework for understanding the primary causes of disease. However, the evidence supporting the utility of the approach is incomplete given the reliance on strong assumptions about the underlying causal mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript seeks to estimate the causal effect of genes on disease. To do so, the authors introduce a novel algorithm, termed the Root Causal Strength using Perturbations (RCSP) algorithm. RCSP uses perturb-seq to first estimate the gene regulatory network structure among genes, and then uses bulk RNA-seq with phenotype data on the samples to estimate causal effects of genes on the phenotype conditional on the learned network structure. The authors assess the performance of RCSP in comparison to other methods via simulation. Next, they apply RCSP to two real human datasets: 513 individuals with age-related macular degeneration and 137 individuals with multiple sclerosis.

      Strengths:

      The authors tackle an important and ambitious problem - the identification of causal contributors to disease in the context of a causal inference framework. As the authors point out, observational RNA-seq data is insufficient for this kind of causal discovery, since it is very challenging to recover the true underlying graph from observational data; interventional data are needed. However, little perturb-seq data has been generated with annotated phenotype data, and much bulk RNA-seq data has already been generated, so it is useful to propose an algorithm to integrate the two as the authors have done.

      The authors also offer substantial theoretical exposition for their work, bringing to bear both the literature on causal discovery as well as literature on the genetic architecture of complex traits.

      Weaknesses:

      The notion of a "root" causal gene - which the authors define based on a graph theoretic notion of topologically sorting graphs - requires a graph that is directed and acyclic. It is the latter that constitutes an important weakness here - it simply is a large simplification of human biology to draw out a DAG including hundreds of genes and a phenotype Y and to claim that the true graph contains no cycles. This is briefly touched upon in the discussion, but given the fundamental nature of this choice, the manuscript should devote at least some of the main results to exploring the consequence of mischaracterizing true cyclic graphs as DAGs in this framework. For example, consider the authors' analysis of T cell infiltration in multiple sclerosis (MS). CD4+ effector T cells have the interesting property that they are stimulated by IL2 as a growth factor; yet IL2 also stimulates the activation of (suppressive) regulatory T cells. What does it mean to analyze CD4+ regulation in disease with a graph that does not consider IL2 (or other cytokine) mediated feedback loops/cycles?
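
      To make the acyclicity concern concrete, the sketch below runs Kahn's topological sort on two toy graphs. The node names and edges are illustrative assumptions only (they are not taken from the manuscript or from the RCSP implementation), and the feedback edge from effector T cells back to IL2 is added purely to create a cycle. The point is that a root-first ordering exists for the DAG but is undefined once a single feedback edge is present.

```python
from collections import deque

def topological_order(edges, nodes):
    """Return a root-first ordering (Kahn's algorithm), or None if the graph has a cycle."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # Start from nodes with no parents: the candidate "roots".
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # If some nodes were never released, they sit on (or behind) a cycle.
    return order if len(order) == len(nodes) else None

# Hypothetical toy graph; Y stands for the phenotype.
nodes = ["IL2", "Teff", "Treg", "Y"]
dag_edges = [("IL2", "Teff"), ("IL2", "Treg"), ("Treg", "Teff"), ("Teff", "Y")]
# Assumed feedback edge, added only to illustrate a cycle.
cyclic_edges = dag_edges + [("Teff", "IL2")]

dag_order = topological_order(dag_edges, nodes)
cyclic_order = topological_order(cyclic_edges, nodes)
print(dag_order)     # a root-first ordering exists
print(cyclic_order)  # None: no node can be called a root
```

      In the cyclic case every node on the loop has a parent, so there is no well-defined starting point for the ordering; this is the situation a robustness simulation would need to probe.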

      I also encourage the authors to consider more carefully when graph structure learned from perturb-seq can be ported over to bulk RNA-seq. Consider again the MS CD4+ example - the authors first start with a large perturb-seq experiment (Replogle et al., 2022) performed in K562 cells. To what extent are K562 cells, which are derived from a leukemia cell line, suitable for learning the regulatory structure of CD4+ cells from individuals with an MS diagnosis? Presumably this structure is not exactly correct - to what extent is the RCSP algorithm sensitive to false edges in this graph? This leap - from cell line to primary human cells - is also not modeled in the simulation. Although challenging - it would be ideal for the RCSP to model or reflect the challenges in correctly identifying the regulatory structure.

      It should also be noted that in most perturb-seq experiments, the entire genome is not perturbed, and frequently important TFs (that presumably are very far "upstream" and thus candidate "root" causal genes) are not expressed highly enough to be detected with scRNA-seq. In that context, slightly modifying the language regarding RCSP's capabilities might be helpful for the manuscript: it may be better to describe it as an algorithm for causal discovery among a set of genes that were perturbed and measured, rather than a truly complete search for causal factors. More broadly, it would also benefit the manuscript to devote slightly more text to describing the kinds of scenarios where RCSP (and similar ideas) would be most appropriately applied, for example a well-powered, phenotype-annotated perturb-seq dataset performed in a disease-relevant primary cell.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents a very interesting use of a causal graph framework to identify the "root genes" of a disease phenotype. Root genes are the genes that cause a cascade of events that ultimately leads to the disease phenotype, assuming the disease progression is linear.

      Strengths:

      - The methodology has a solid theoretical background.
      - This is a novel use of the causal graph framework to infer root causes in a graph.

      Weaknesses:

      (1) General Comments

      First, I have some general comments. I would argue that the main premise of the study might be inaccurate or incomplete. There are three major attributes of real biological systems, which are not considered in this work.

      One is that the process from health-to-disease is not linear most of the time with many checks along the way that aim to prevent the disease phenotype. This leads to a non-deterministic nature of the path from health-to-disease. In other words, with the same root gene perturbations, and depending on other factors outside of gene expression, someone may develop a phenotype in a year, another in 10 years and someone else never. Claiming that this information is included in the error terms might not be sufficient to address this issue. The authors should discuss this limitation.

      Two, the paper assumes that the network connectivity will remain the same after perturbation. This is not always true due to backup mechanisms in the cells. For example, suppose that a cell wants to create product P and it can do it through two alternative paths:

      Path #1: A -> B -> P
      Path #2: A -> C -> P

      Now suppose that path #1 is more efficient, so when B can be produced, path #2 is inactive. Once the perturbation blocks element B from being produced, the graph connectivity changes by activation of path #2. I did not see the authors taking this into consideration, which seems to be a major limitation in using perturb-seq results to infer connectivities.
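
      The two-path example can be written down as a toy model. The function below is a hypothetical illustration (its name and the all-or-nothing switching rule are assumptions, not part of the manuscript): it returns the edges that are active given a set of blocked elements, preferring path #1 and falling back to path #2.

```python
def active_edges(blocked=frozenset()):
    """Toy model: A yields P via B when B is available, otherwise via the backup C."""
    preferred = ["A", "B", "P"]  # path #1, assumed more efficient
    backup = ["A", "C", "P"]     # path #2, only used when path #1 is unavailable
    for path in (preferred, backup):
        if not blocked.intersection(path):
            # Turn the node sequence into directed edges.
            return list(zip(path, path[1:]))
    return []  # no route to P remains

unperturbed_edges = active_edges()
perturbed_edges = active_edges(blocked=frozenset({"B"}))
print(unperturbed_edges)
print(perturbed_edges)
```

      The edge sets with and without the perturbation share no edges, which is the sense in which connectivity inferred under perturbation need not match the unperturbed network.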

      Three, there is substantial system heterogeneity that may cause the same phenotype. This goes beyond the authors' claim that although the initial gene causes of a disease may differ from person to person, at some point they will all converge to changes in the same set of "root genes". This is not true for many diseases, which are defined based on symptoms and lab tests at the patient level. You may have two completely different molecular pathologies that lead to the development of the same symptoms and test results. Breast cancer with its subtypes is a prime example of that. In theory, this issue could be addressed if the sample size were infinite. However, this assumption is largely violated in all existing biological datasets.

      All of the above limit the usefulness of this method for most chronic diseases, although it might still lead to interesting discoveries in cancer (in which the association between gene dysregulation and the development of cancer is more direct and occurs in less time).

      With these in mind, the theoretical and algorithmic advances this paper offers are interesting. And the theoretical proofs are solid.

      (2) Specific comments.

      I am curious how the simulated data were generated and processed. Specifically, were the values of the synthetic variables Z-scored? If not, then I would expect that the variance of every variable will increase from the roots of the graph to the leaves. That will give an advantage in any algorithm aiming to identify causal relations based on error terms. For fairness and completeness, the authors should Z-score the values in the synthetic data and compare the results.
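
      The variance argument can be checked with a minimal simulation. The sketch below assumes a linear chain X1 -> X2 -> X3 -> X4 with edge weight 1 and independent unit-variance Gaussian error terms; these choices are illustrative assumptions and need not match the simulation design in the manuscript. Under them, Var(X_k) = k, so variance alone reveals the position in the graph until the columns are Z-scored.

```python
import random
import statistics

def simulate_chain(n_samples, n_genes, weight=1.0, seed=0):
    """Simulate a linear chain X1 -> X2 -> ... with unit-variance Gaussian error terms."""
    rng = random.Random(seed)
    columns = [[] for _ in range(n_genes)]
    for _ in range(n_samples):
        previous = 0.0
        for k in range(n_genes):
            # Each variable is its parent times the weight, plus a fresh error term.
            value = weight * previous + rng.gauss(0.0, 1.0)
            columns[k].append(value)
            previous = value
    return columns

def z_score(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

columns = simulate_chain(n_samples=5000, n_genes=4)
raw_variances = [statistics.pvariance(c) for c in columns]
scaled_variances = [statistics.pvariance(z_score(c)) for c in columns]
print(raw_variances)     # grows from root to leaf
print(scaled_variances)  # all equal to 1 up to rounding
```

      After Z-scoring, the marginal variances carry no information about graph depth, so any remaining advantage of a method would have to come from the dependence structure rather than from scale.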

      The algorithm seems to require both RNA-seq and Perturb-seq data (Algorithm 1, page 14). Can it function with RNA-seq data only? What will be different in this case?

      (3) Additional comments:

      Although the manuscript is generally written clearly, some parts are unclear and others are missing details, which makes the narrative difficult to follow. Some specific examples:

      - Synthetic data generation: how many different graphs (SEMs) did they start from? (30?) How many samples per graph? Did they test different sample sizes?
      - The presentation of comparative results (Suppl fig 4 and 7) is not clear. No details are given on how these results were generated. (What does "The first column denotes the standard deviation of the outputs for each algorithm" mean?) Why do all other methods have higher SD differences than RCSP? Is it a matter of scaling? Shouldn't they have at least some values near zero, since the authors "added the minimum value so that all histograms begin at zero"? Also, why do the RCSP results look more like a negative binomial distribution while the others look roughly normal?
      - What is the significance of genes changing expression "from left to right" in a UMAP plot? (e.g., Fig. 3h and 3g)

      The authors somewhat overstate the novelty of their algorithm. Representation of GRNs as causal graphs dates back to 2000 with the work of Nir Friedman in yeast. Other methods have been developed more recently that look at regulatory network changes at the single-sample level, of which the authors do not seem to be aware (e.g., Ellington et al, NeurIPS 2023 workshop GenBio and Bushur et al, 2019, Bioinformatics are two such examples). The methods they mention are for single-cell data and are not designed to connect single-sample-level changes to a person's phenotype. The RCS method needs to be put in the right background context in order to bring out what is really novel about it.

    4. Reviewer #3 (Public review):

      Summary:

      The authors provide an interesting and novel approach, RCSP, to determining what they call the "root causal genes" for a disease, i.e. the most upstream, initial causes of disease. RCSP leverages perturbation (e.g. Perturb-seq) and observational RNA-seq data, the latter from patients. They show using both theory and simulations that if their assumptions hold then the method performs remarkably well, compared to both simple and available state-of-the-art baselines. Whether the required assumptions hold for real diseases is questionable. They show superficially reasonable results on AMD and MS.

      Strengths:

      The idea of integrating perturbation and observational RNA-seq datasets to better understand the causal basis of disease is powerful and timely. We are just beginning to see genome-wide perturbation assays, albeit in limited cell types currently. For many diseases, research cohorts have at least bulk observational RNA-seq from a/the disease-relevant tissue(s). Given this, RCSP's strategy of learning the required causal structure from perturbations and applying this knowledge in the observational context is pragmatic and will likely become widely applicable as Perturb-seq data in more cell types/contexts become available.

      The causal inference reasoning is another strength. A more obvious approach would be to attempt to learn the causal network structure from the perturbation data and leverage this in the observational data. However, structure learning in high-dimensions is notoriously difficult, despite recent innovations such as differentiable approaches. The authors notice that to estimate the root causal effect for a gene X, one only needs access to a (superset of) the causal ancestors of X: much easier relationships to detect than the full network.

      The applications are also reasonably well chosen, being some of the few cases where genome-scale perturb-seq is available in a roughly appropriate (see below) cell-type, and observational RNA-seq is available at a reasonable sample size.

      Weaknesses:

      Several assumptions of the method are problematic. The most concerning is that the observational expression changes are all causally upstream of disease. There is work using Mendelian randomization (MR) showing that the _opposite_ is more likely to be true: most differential expression in disease cohorts is a consequence rather than a cause of disease (https://www.nature.com/articles/s41467-021-25805-y). Indeed, the oxidative stress of AMD has known cellular responses including the upregulation of p53. The authors need to think carefully about how this impacts their framework. Can the theory say anything in this light? Simulations could also be designed to address robustness.

      A closely related issue is the DAG assumption of no cycles. This assumption is brought to bear because it is required for much classical causal machinery, but it is unrealistic in biology, where feedback is pervasive. How robust is RCSP to (mild) violations of this assumption? Simulations would be a straightforward way to address this.

      The authors spend considerable effort arguing that technical sampling noise in X can effectively be ignored (at least in bulk). While the mathematical arguments here are reasonable, they miss the bigger picture point that the measured gene expression X can only ever be a noisy/biased proxy for the expression changes that caused disease: 1) Those events happened before the disease manifested, possibly early in development for some conditions like neurodevelopmental disorders. 2) bulk RNA-seq gives only an average across cell-types, whereas specific cell-types are likely "causal". 3) only a small sample, at a single time point, is typically available. Expression in other parts of the tissue and at different times will be variable.

      My remaining concerns are more minor.

      While there are connections to the omnigenic model, the latter is somewhat misrepresented. 1) The authors refer to the "core genes" of the omnigenic model as being at the end (longitudinally) of pathogenesis. The omnigenic model makes no statements about temporal ordering: in causal inference terminology the core genes are simply the direct cause of disease. 2) "Complex diseases often have an overwhelming number of causes, but the root causal genes may only represent a small subset implicating a more omnigenic than polygenic model" A key observation underlying the omnigenic model is that genetic heritability is spread throughout the genome (and somewhat concentrated near genes expressed in disease relevant cell types). This implies that (almost) all expressed genes, or their associated (e)SNPs, are "root causes".

      The claim that root causal genes would be good therapeutic targets feels unfounded. If these are highly variable across individuals then the choice of treatment becomes challenging. By contrast the causal effects may converge on core genes before impacting disease, so that intervening on the core genes might be preferable. The jury is still out on these questions, so the claim should at least be made hypothetical.

      The closest thing to a gold standard I believe we have for "root causal genes" is integration of molecular QTLs and GWAS, specifically coloc/MR. Here the "E" of RCSP are explicitly represented as SNPs. I don't know if there is good data for AMD but there certainly is for MS. The authors should assess the overlap with their results. Another orthogonal avenue would be to check whether the root causal genes change early in disease progression.

      The available perturb-seq datasets have limitations beyond the control of the authors. 1) The set of genes that are perturbed. The authors address this by simply sub-setting their analysis to the intersection of genes represented in the perturbation and observational data. However, this may mean that a true ancestor of X is not modeled/perturbed, limiting the formal claims that can be made. Additionally, some proportion of genes that are nominally perturbed show little to no actual perturbation effect (for example, due to poor guide RNA choice), which will also lead to missing ancestors.

      The authors provide no mechanism for statistical inference/significance for their results at either the individual or aggregated level. While I am a proponent of using effect sizes more than p-values, there is still value in understanding how much signal is present relative to a reasonable null.
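
      One generic way to compare a score against a null is a label-permutation test, sketched below. This is not a procedure from the manuscript: the statistic (absolute difference in group means) is a hypothetical stand-in for whatever per-gene score is of interest, and the ten values are made-up toy numbers.

```python
import random

def mean_difference(values, labels):
    """Absolute difference in mean between cases (label 1) and controls (label 0)."""
    cases = [v for v, lab in zip(values, labels) if lab == 1]
    controls = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def permutation_p_value(values, labels, statistic, n_permutations=2000, seed=0):
    """Estimate how often shuffled labels give a statistic at least as large as observed."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between values and phenotype
        if statistic(values, shuffled) >= observed:
            exceed += 1
    # The +1 correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_permutations + 1)

# Made-up toy data: five controls followed by five cases.
values = [0.1, 0.2, 0.0, 0.3, 0.2, 2.1, 1.9, 2.3, 2.0, 2.2]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
p_value = permutation_p_value(values, labels, mean_difference)
print(p_value)
```

      Shuffling phenotype labels preserves the marginal distribution of the scores while removing any association, which gives one candidate for "a reasonable null"; other nulls (for example, permuting the learned ancestor sets) would test different things.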

      I agree with the authors that age coming out as a "root cause" is potentially encouraging. However, it is also quite different in nature to expression, including being "measured" exactly. Will RCSP be biased towards variables that have lower measurement error?

      Finally, it's a stretch to call K562 cells "lymphoblasts". They are more myeloid than lymphoid.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Many thanks to the editors for reviewing the revised manuscript.

      We are very grateful to the Reviewers for their time and for the appreciation of the revision.

      We thank Reviewer 3 for acknowledging the use of sulforhodamine B (SRB) fluorescence as a real-time readout of astrocyte volume dynamics. Experimental data in brain slices were provided to validate this approach.

      The incomplete matching of our observations with earlier data reported in cultured astrocytes (e.g., Solenov et al., AJP-Cell, 2004) might reflect some of their properties differing from those of their slice/in vivo counterparts, as discussed in the manuscript.

      The study (T.R. Murphy et al., Front Cell Neurosci., 2017) showed that AQP4 knockout increased the extent of astrocyte swelling in response to hypoosmotic solution in brain slices (Fig 9), and discussed '... AQP4 can provide an efficient efflux pathway for water to leave astrocytes.’ Correspondingly, our data suggest that AQP4 mediates astrocyte water efflux in basal conditions.

      We have discussed the study (Igarashi et al., NeuroReport 2013); our current data would help to understand the cellular mechanisms underlying the finding of Igarashi et al.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Pham and colleagues provide an illuminating investigation of aquaporin-4 water flux in the brain utilizing ex vivo and in vivo techniques. The authors first show in acute brain slices, and in vivo with fiber photometry, SRB-loaded astrocytes swell after inhibition of AQP4 with TGN-020, indicative of tonic water efflux from astrocytes in physiological conditions. Excitingly, they find that TGN-020 increases the ADC in DW-MRI in a region-specific manner, potentially due to AQP4 density. The resolution of the DW-MRI cannot distinguish between intracellular or extracellular compartments, but the data point to an overall accumulation of water in the brain with AQP4 inhibition. These results provide further clarity on water movement through AQP4 in health and disease.

      Overall, the data support the main conclusions of the article, with some room for more detailed treatment of the data to extend the findings.

      Strengths:

      The authors have a thorough investigation of AQP4 inhibition in acute brain slices. The demonstration of tonic water efflux through AQP4 at baseline is novel and important in and of itself. Their further testing of TGN-020 in hyper- and hypo-osmotic solutions shows the expected reduction of swelling/shrinking with AQP4 blockade.

      Their experiment with cortical spreading depression further highlights the importance of water efflux from astrocytes via AQP4 and transient water fluxes as a result of osmotic gradients. Inhibition of AQP4 increases the speed of tissue swelling, pointing to a role in the efflux of water from the brain.

      The use of DW-MRI provides a non-invasive measure of water flux after TGN-020 treatment.

      We thank the reviewer for the insightful comments.

      Weaknesses:

      The authors specifically use GCaMP6 and light sheet microscopy to image their brain sections in order to identify astrocytic microdomains. However, their presentation of the data neglects a more detailed treatment of the calcium signaling. It would be quite interesting to see whether these calcium events are differentially affected by AQP4 inhibition based on their cellular localization (ie. processes vs. soma vs. vascular end feet which all have different AQP4 expressions).

      Following the suggestion, we provide new data on the effect of AQP4 inhibition on spontaneous calcium signals in perivascular astrocyte end-feet. As now shown in Fig. S2, acute application of TGN-020 induced Ca2+ oscillations in astrocyte end-feet regions where the GCaMP6 labeling lines the profile of the blood vessel. On average, the strength of basal Ca2+ signals in the end-feet is higher than that observed across global astrocyte territories (4.65 ± 0.55 vs. 1.45 ± 0.79, p < 0.01), as is the effect of TGN-020 (8.4 ± 0.62 vs. 6.35 ± 0.97, p < 0.05; Fig S2 vs. Fig 2B). This likely reflects the enrichment of AQP4 in astrocyte end-feet. We describe the data in Fig. S2, and on page 8, line 20 – 23.

      We now use the transgenic line GLAST-GCaMP6 for cytosolic GCaMP6 expression in astrocytes. Spontaneous calcium signals, reflected by transient fluorescence rises, occur in discrete micro-domains whereas the basal GCaMP6 fluorescence in the soma is weak. In the present condition, it is difficult to unambiguously discriminate astrocyte soma from the highly intermingled processes. 

      The authors show the inhibition of AQP4 with TGN-020 shortens the onset time of the swelling associated with cortical spreading depression in brain slices. However, they do not show quantification for many of the other features of CSD swelling, (ie. the duration of swelling, speed of swelling, recovery from swelling).

      Regarding the features of the CSD swelling, we have performed a new analysis to quantify the duration of swelling, the speed of swelling and the recovery time from swelling in control conditions and in the presence of TGN-020. The new analysis is now summarized in Fig. S5. Blocking AQP4 with TGN-020 increases the swelling speed, prolongs the duration of swelling and slows down the recovery from swelling, confirming our observation that acute inhibition of AQP4 water efflux facilitates astrocyte swelling while restraining shrinking. We describe the result on page 11, line 19-21.

      Significance:

      AQP4 is a bidirectional water channel that is constitutively open, thus water flux through it is always regulated by local osmotic gradients. Still, characterizing this water flux has been challenging, as the AQP4 channel is incredibly water-selective. The authors here present important data showing that the application of TGN-020 alone causes astrocytic swelling, indicating that there is constant efflux of water from astrocytes via AQP4 in basal conditions. This has been suggested before, as the authors rightfully highlight in their discussion, but the evidence had previously come from electron microscopy data from genetic knockout mice.

      AQP4 expression has been linked with the glymphatic circulation of cerebrospinal fluid through perivascular spaces since its rediscovery in 2012 [1]. Further studies of aging[2], genetic models[3], and physiological circadian variation[4] have revealed it is not simply AQP4 expression but AQP4 polarization to astrocytic vascular endfeet that is imperative for facilitating glymphatic flow. Still, a lingering question in the field is how AQP4 facilitates fluid circulation. This study represents an important step in our understanding of AQP4's function, as the basal efflux of water via AQP4 might promote clearance of interstitial fluid to allow an influx of cerebrospinal fluid into the brain. Beyond glymphatic fluid circulation, clearly, AQP4-dependent volume changes will differentially alter astrocytic calcium signaling and, in turn, neuronal activity.

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.

      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.

      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.

      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature Communications, 2020. 11(1).

      We thank the reviewer for acknowledging the significance of our study and its functional implications for the brain glymphatic system. We have now highlighted the mentioned studies as well as the potential implications for glymphatic fluid circulation (page 4, line 9-10; page 5, line 1-3; and page 19, line 3-10).

      Reviewer #2 (Public Review):

      Summary:

      The paper investigates the role of the astrocyte-specific aquaporin-4 (AQP4) water channel in mediating water transport within the mouse brain and the impact of the channel on astrocyte and neuron signaling. Through various experiments including epifluorescence and light sheet microscopy in mouse brain slices, and fiber photometry or diffusion-weighted MRI in vivo, the researchers observe that acute inhibition of AQP4 leads to intracellular water accumulation and swelling in astrocytes. This swelling alters astrocyte calcium signaling and affects neighboring neuron populations. Furthermore, the study demonstrates that AQP4 regulates astrocyte volume, influencing mainly the dynamics of water efflux in response to osmotic challenges or associated with cortical spreading depolarization. The findings suggest that AQP4-mediated water efflux plays a crucial role in maintaining brain homeostasis, and indicate the main role of AQP4 in this mechanism. However, although the authors highlight that the report sheds light on the mechanisms by which astrocyte aquaporin contributes to the water environment in the brain parenchyma, the mechanism underlying these effects remains unclear and is not investigated. The manuscript requires revision.

      Strengths:

      The paper elucidates the role of the astrocytic aquaporin-4 (AQP4) channel in brain water transport, its impact on water homeostasis, and signaling in the brain parenchyma. In its design, the paper follows a set of complementary experiments combining various ex vivo and in vivo techniques from microscopy to magnetic resonance imaging. The research is valuable, confirms previous findings, and provides novel insights into the effect of acute blockage of the AQP4 channel using TGN-020.

      We thank the reviewer for the constructive comments.

      Weaknesses:

      Despite the interdisciplinary approach employed, the quality of the manuscript raises doubts regarding the significance of the findings and undermines the novelty claimed by the authors. The paper lacks a comprehensive exploration or mention of the underlying molecular mechanisms driving the observed effects of astrocytic aquaporin-4 (AQP4) channel inhibition on brain water transport and brain signaling dynamics. The scientific background is not very well prepared in the introduction and discussion sections. Important or recent reports from the field are missing, incompletely cited, or misinterpreted. Several citations to original works are missing that would clarify certain conclusions. This especially refers to the basis of the glymphatic system concept and recently published reports of similar content. The use of TGN-020 instead of, e.g., the available AQP4 blocker AER-270(271) is not explained. While employing various experimental techniques adds depth to the findings, some of the reasoning behind the employed techniques, especially regarding MRI, is unclear or seemingly inaccurate. In most cases the number of subjects examined is missing or mentioned only roughly in the figure captions, and statistical tests are missing or wrongly applied, which limits assessment and reproducibility of the results. In some cases, it seems that two different statistical tests were used for the same or linked types of data, so the results appear contradictory, although this seems unlikely based on the figures. Addressing these limitations could strengthen the paper's impact and utility within the field of neuroscience; however, it also seems that supplementary experiments are required to improve the report.

      The current data hint at a tonic water efflux through astrocyte AQP4 under physiological conditions, which helps to understand brain water homeostasis and the functional implications for the glymphatic system. The underlying molecular and cellular mechanisms appear multifaceted and functionally interconnected, as discussed (page 14, line 8 to page 15, line 3). We agree that a comprehensive exploration will further advance our understanding.

      The introduction and discussion are now strengthened by incorporating the important advances in glymphatic system research while highlighting the relevant studies.

      The use of TGN-020 was based on its validation by a wide range of ex vivo and in vivo studies, including the use of heterologous expression systems and AQP4 KO mice. The validation of AER-270(271, the water-soluble prodrug) using AQP4 KO mice was reported recently (Giannetto et al., 2024). AER-271 was noted to impact brain water ADC (apparent diffusion coefficient, evaluated by diffusion-weighted MRI) in AQP4 KO mice ~75 min after drug application (Giannetto et al., 2024). This likely reflects the fact that AER-270(271) is also an inhibitor of nuclear factor κB (NF-κB), whose inhibition could reduce CNS water content independently of AQP4 targeting (Salman et al., 2022). In addition, the inhibition efficiency of AER-270(271) seems lower than that of TGN-020 (Farr et al., 2019; Giannetto et al., 2024; Huber et al., 2009; Salman et al., 2022). We have now added this information to the manuscript (page 7, line 1-6 and page 15, line 7-17).

      The description of DW-MRI is now updated (page 4, line 10-14).

      We also performed new experiments and data analysis, as described point by point below in the section ‘Recommendations For The Authors’.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors propose that astrocytic water channel AQP4 represents the dominant pathway for tonic water efflux without which astrocytes undergo cell swelling. The authors measure changes in astrocytic sulforhodamine fluorescence as the proxy for cell volume dynamics. Using this approach, they perform a technically elegant series of ex vivo and in vivo experiments exploring changes in astrocytic volume in response to AQP4 inhibitor TGN-020 and/or neuronal stimulation. The key finding is that TGN-020 produces an apparent swelling of astrocytes and modifies astrocytic cell volume regulation after spreading depolarizations. Additionally, systemic application of TGN-020 produced changes in diffusion-weighted MRI signal, which the authors interpret as cellular swelling. This study is perceived as potentially significant. However, several technical caveats should be strongly considered and perhaps addressed through additional experiments.

      Strengths:

      (1) This is a technically elegant study, in which the authors employed a number of complementary ex vivo and in vivo techniques to explore functional outcomes of aquaporin inhibition. The presented data are potentially highly significant (but see below for caveats and questions related to data interpretation).

      (2) The authors go beyond measuring cell volume homeostasis and probe for the functional significance of AQP4 inhibition by monitoring Ca2+ signaling in neurons and astrocytes (GCaMP6 assay).

      (3) Spreading depolarizations represent a physiologically relevant model of cellular swelling. The authors use ChR2 optogenetics to trigger spreading depolarizations. This is a highly appropriate and much-appreciated approach.

      We thank the reviewer for the effort in evaluating our work.

      Weaknesses:

      (1) The main weakness of this study is that all major conclusions are based on the use of one pharmacological compound. In the opinion of this reviewer, the effects of TGN-020 are not consistent with the current knowledge on water permeability in astrocytes and the relative contribution of AQP4 to this process.

      Specifically: Genetic deletion of AQP4 in astrocytes reduces plasmalemmal water permeability by ~two-three-fold (when measured at 37 °C, Solenov et al., AJP-Cell, 2004). This is a significant difference, but it is thought to have limited/no impact on water distribution. Astrocytic volume and the degree of anisosmotic swelling/shrinkage are unchanged because the water permeability of the AQP4-null astrocytes remains high. This has been discussed at length in many publications (e.g., MacAulay et al., Neuroscience, 2004; MacAulay, Nat Rev Neurosci, 2021) and is acknowledged by Solenov and Verkman (2004).

      Keeping this limitation in mind, it is important to validate astrocytic cell volume changes using an independent method of cell volume reconstruction (diameter of sulforhodamine-labeled cell bodies? 3D reconstruction of EGFP-tagged cells? Else?)

      Solenov and colleagues used the calcein quenching assay and KO mice to demonstrate that AQP4 is a functional water channel in cultured astrocytes (Solenov et al., 2004). AQP4 deletion reduced both astrocyte water permeability and the absolute amplitude of swelling over a comparable time, and also slowed down cell shrinking, which overall parallels our results from acute AQP4 blocking. Yet in Solenov’s study, the time to swelling plateau was prolonged in AQP4 KO astrocytes, which differs from our data obtained with acute pharmacological blocking. This discrepancy may be due to compensatory mechanisms in chronic AQP4 KO, or may reflect differences between the volume responses of cultured astrocytes and those seen in brain slices or in vivo, as suggested previously (Risher et al., 2009).

      Soma diameter might be an indicator of cell volume change, yet it is challenging to measure with our current fluorescence imaging method, which is diffraction-limited and insufficient to clearly resolve the border of the soma in situ. In addition, the lateral diameter of cell bodies may not faithfully reflect volume changes that can occur in all three dimensions. Rapid 3D imaging of astrocyte volume dynamics with sufficiently high Z-axis resolution appears difficult with our present tools.

      We have now updated the discussion accordingly, citing the relevant literature (page 17, line 14 to page 18, line 3).

      (2) TGN-020 produces many effects on the brain, with some but not all of the observed phenomena sensitive to the genetic deletion of AQP4. In the context of this work, it is important to note that TGN-020 does not completely inhibit AQP4 (70% maximal inhibition in the original oocyte study by Huber et al., Bioorg Med Chem, 2009). Thus, besides not knowing TGN-020 levels inside the brain, even "maximal" AQP4 inhibition would not be expected to dramatically affect water permeability in astrocytes.

      This caveat may be addressed through experiments using local delivery of structurally unrelated AQP4 blockers, or, preferably, AQP4 KO mice.

      It is an important point that TGN-020 partially blocks AQP4, implying that the actual functional impact of AQP4 per se might be stronger than what we observed. TGN-020 provides a means to acutely probe AQP4 function in situ; still, we agree that its limitations need to be acknowledged. We mention this now on page 15, line 7-9 and 14-17.

      We agree that local delivery of an alternative blocker would provide additional information. However, local delivery requires the stereotaxic implantation of a cannula, which would cause inflammation in the surrounding astrocytes (and neurons). The recently introduced AQP4 blocker AER-270(271) has been noted to influence brain water dynamics (ADC in DW-MRI) in AQP4 KO mice (Giannetto et al., 2024); recall that AER-270(271) is also an inhibitor of nuclear factor κB (NF-κB). This pathway can potentially perturb CNS water content and influence brain fluid circulation in an AQP4-independent manner (Salman et al., 2022). The inhibition efficiency of AER-270 on mouse AQP4 (~20%, Farr et al., 2019; Salman et al., 2022) appears lower than that of TGN-020 (~70%, Huber et al., 2009).

      We chose to use the pharmacological compound to achieve acute blocking of AQP4, thereby avoiding the alterations in brain structure, function and water homeostasis caused by chronic genetic deletion. Multiple lines of evidence, including a recent study (Gomolka et al., 2023), have shown that AQP4 knockout alters brain water content, extracellular space and cellular structures in mice, which raises concerns about using the transgenic mouse to pinpoint the physiological functions of the AQP4 water channel.

      We have now mentioned the concerns regarding AQP4 pharmacology, citing additional literature from the field (page 15, line 8-18).

      (3) This reviewer thinks that the ADC signal changes in Figure 5 may be unrelated to cellular swelling. Instead, they may be a result of the previously reported TGN-020-induced hyperemia (e.g., H. Igarashi et al., NeuroReport, 2013) and/or changes in water fluxes across the pia mater, which is highly enriched in AQP4. To amplify this concern, AQP4 KO brains have increased water mobility due to enlarged interstitial spaces, rather than swollen astrocytes (RS Gomolka, eLife, 2023). Overall, the caveats of interpreting the DW-MRI signal deserve strong consideration.

      A previous observation shows that TGN-020 increases regional cerebral blood flow in wild-type mice but not in AQP4 KO mice (Igarashi et al., 2013). Our current data provide a possible mechanistic explanation: blocking of astrocyte AQP4 by TGN-020 causes calcium rises that may lead to vasodilation, as suggested previously (Cauli and Hamel, 2018). We have now updated the discussion on page 15, line 3-7.

      We agree with the reviewer regarding the structural deviations observed in the AQP4 KO mice (Gomolka et al., 2023), now mentioned on page 19, line 3-5. Following the Reviewer’s suggestion, we have also updated the interpretation of the DW-MRI signal and point out that, in addition to being related to astrocyte swelling, the ADC signal changes may also be caused by indirect mechanisms, such as the transient upregulation of other water-permeable pathways compensating for AQP4 blocking. We now describe this alternative interpretation and the caveats of the DW-MRI signals (page 20, line 1-8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Private recommendations

      My more broad experimental suggestions are in the "weaknesses" section. Some minor points that would improve the manuscript are included below:

      (1) A more detailed explanation for why SRB fluorescence reflects the astrocyte volume changes, whereas typical intracellular GFP does not.

      As an engineered fluorescent protein, GFP has been used to tag specific cell types. However, as a relatively large protein (MW, 26.9 kDa), EGFP is expected to diffuse much more slowly than SRB, a small chemical dye (MW, 558.7 Da). In addition, IP injection of SRB enables genetics-free labeling of brain astrocytes, which avoids the influence of protein overexpression on astrocyte volume and water transport responses. We have now stated this point in the manuscript (page 13, line 21 to page 14, line 4).

      (2) Figure 1 panel B should have clear labels on the figure and a description in the legend to delineate which part of the panel refers to hyper- or hypo-osmotic treatment.

      We have now updated the figure and the legend.  

      (3) For Figure 2, what is the rationale for analyzing the calcium signaling data between the cell types differently?

      We analyzed calcium micro-domains for astrocytes because their spontaneous signals occur mainly in discrete micro-domains (Shigetomi et al., 2013). For neurons, we performed a global analysis by calculating the mean fluorescence of the imaging field of view, because calcium signal changes were only observed at the global level rather than in micro-domains. This information is now included (page 24, line 18-20).
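
      The two analysis modes described in this response can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the pixel-list representation of a micro-domain, and the toy image stack are all assumptions made for illustration.

```python
from statistics import mean

def global_trace(stack):
    """Mean fluorescence of the whole field of view, one value per frame."""
    return [mean(px for row in frame for px in row) for frame in stack]

def roi_trace(stack, roi):
    """Mean fluorescence inside one region of interest, one value per frame.

    roi is a list of (row, col) pixel coordinates (hypothetical representation
    of a micro-domain).
    """
    return [mean(frame[r][c] for r, c in roi) for frame in stack]

# Toy stack: 2 frames of 2x2 pixels; one pixel brightens in the second frame.
stack = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[1.0, 5.0], [1.0, 1.0]],
]
microdomain = [(0, 1)]  # hypothetical single-pixel micro-domain

print(global_trace(stack))            # whole-field average per frame
print(roi_trace(stack, microdomain))  # micro-domain average per frame
```

      In this toy case the local event is large within the micro-domain but diluted in the whole-field average, which is why spatially restricted signals are usually analyzed per region.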

      (4) For Figure 3, the authors mention that TGN-020 likely caused swelling prior to the hypotonic solution administration. Do they have any measurements from these experiments prior to the TGN-020 application to use as a "true baseline" volume?

      The current method detects the relative changes in astrocyte volume (i.e., transmembrane water transport), which nevertheless is blind to the absolute volume value. We have no readout on baseline volumes.  

      (5) For Figures 3 and 4, did the authors see any evidence for regulatory volume decrease? And is this impaired by TGN-020? It is a well-characterized phenomenon that astrocytes will open mechanosensitive channels to extrude ions during hypo-osmotic induced swelling. This process is dependent on AQP4 and calcium signaling [5]

      Mola and colleagues provided important results demonstrating the role of AQP4 in astrocyte volume regulation (Mola et al., 2016). In the present study in acute brain slices, when we applied hypotonic solution to induce astrocyte swelling, our protocol did not reveal a rapid regulatory volume decrease (e.g., Fig. 3D). When we followed the volume changes of SRB-labeled astrocytes during optogenetically induced CSD, we observed a phase of volume decrease following the transient swelling (Fig. 4F), where the peak amplitude and the degree of recovery were both reduced by inhibiting AQP4 with TGN-020. These data imply that regulatory astrocyte volume decrease may occur under specific conditions, although it has been suggested to be absent in brain slices and in vivo (e.g., Risher et al., 2009). We have not specifically investigated this phenomenon, and now briefly discuss this point on page 18, line 6-14.

      (6) Figure 5 box plots do not show all data points, could the authors modify to make these plots show all the animals, or edit the legend to clarify what is plotted?

      We have now updated the plot and the legend. This plot is from all animals (n = 7 per condition).

      (7) pg. 9 line 6, there is a sentence that seems incomplete or otherwise unfinished. "We first followed the evoked water efflux and shrinking induced by hypertonic solution while."

      Fixed (now, page 9 line 17-18). 

      (8)  During the discussion on pg 13 line 11, it may be more clear to describe this as the cotransport of water into the cells with ions/metabolites as reviewed by Macaulay 2021 [6].

      We agree; the text is modified following this suggestion (now page14, line 12-13).  

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.

      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.

      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.

      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature Communications, 2020. 11(1).

      (5) Mola, M., et al., The speed of swelling kinetics modulates cell volume regulation and calcium signaling in astrocytes: A different point of view on the role of aquaporins. Glia, 2016. 64(1).

      (6) MacAulay, N., Molecular mechanisms of brain water transport. Nat Rev Neurosci, 2021. 22(6): p. 326-344.

      We thank the reviewer. These important references have now been added to the manuscript together with the corresponding revisions.

      Reviewer #2 (Recommendations For The Authors):

      In its concept, the paper is interesting and provides additional value - however, it requires revision.

      Below, I provide the following remarks for the following sections/ pages/lines:

      ABSTRACT/page 2 (remarks here refer to the rest of the manuscript, where these sentences are repeated):

      - It seems that the 'homeostasis' provides not only physical protection, but also determines the diffusion of chemical molecules...' Please correct the sentence as it is grammatically incorrect.

      It is now corrected (page 2, line 1).

      - The term 'tonic water' is not clear. I understand, after reading the paper, that it is about tonicity of the solutes injected into the mouse.

      We use the term ‘tonic’ to indicate that in basal conditions, a constant water efflux occurs through the AQP4 channel.

      - 'tonic aquaporin water efflux maintains volume equilibrium' - I believe it is about maintaining volume and osmotic equilibrium?

      This description is now refined (now page 2, line 10).

      - It is not clear whether the tonic water outflow refers to the cellular level or outflow from the brain parenchyma (i.e., glymphatic efflux)

      It refers to the cellular level. 

      INTRODUCTION/page 3:

      - 'clearance of waste molecules from the brain as described in the glymphatic system' - The original papers describing the phenomena are not cited: Iliff et al. 2012, 2013, Mestre et al. 2018, as well as reviews by Nedergaard et al.

      Indeed. We have now cited these key literatures (now page 4, line 10).

      - 'brain water diffusion is the basis for diffusion-weighted magnetic resonance imaging (DW-MRI)' - The statement is wrong. it is the mobility of the water protons that DWI is based on, but not the diffusion of molecules in the brain. This should be clarified and based on the DW-MRI principle and the original works by Le Bihan from 1986, 1988, or 2015.

      This sentence is now updated (page 4, line10-14).

      - Similarly, I suggest correcting or removing the citations and the sentence part regarding the clinical use of DWI, as it has no value here. Instead, it would be worth mentioning what actually ADC reflects as a computational score, and what were the results from previous studies assessing glymphatic systems using DWI. This is especially important when considering the mislocalization of the AQP4 channel.

      We now cite recent studies using DW-MRI to evaluate the glymphatic system (page 4, line 16-17).
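
      For readers unfamiliar with what the ADC reflects as a computed score, the following minimal sketch uses the standard mono-exponential model S_b = S_0 · exp(−b · ADC). It is a two-point illustration only, not the acquisition or fitting procedure used in the manuscript; the function name and all numerical values are hypothetical.

```python
import math

def adc(s0, sb, b):
    """Two-point ADC estimate (mm^2/s).

    s0: signal without diffusion weighting
    sb: signal with diffusion weighting
    b:  b-value in s/mm^2
    Assumes the mono-exponential model sb = s0 * exp(-b * ADC).
    """
    if s0 <= 0 or sb <= 0 or b <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / sb) / b

# Hypothetical values chosen only to illustrate the arithmetic.
s0 = 1000.0
b = 1000.0                       # s/mm^2
sb = s0 * math.exp(-b * 0.7e-3)  # signal simulated for an ADC of 0.7e-3 mm^2/s

print(adc(s0, sb, b))
```

      A larger ADC corresponds to stronger signal attenuation at a given b-value, i.e. higher apparent water mobility within the voxel.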

      - 'In the brain, AQP4 is predominantly expressed in astrocytes'-please review the citations. I suggest reading the work by Nielsen 1997, Nagelhus 2013, Wolburg 2011, and Li and Wang from 2017. To my best knowledge, in the brain AQP4 is exclusively expressed in astrocytes.

      We thank the reviewer. It has been described that, while enriched in astrocytes, AQP4 is also expressed in ependymal cells lining the ventricles (e.g., Mayo et al., 2023; Verkman et al., 2006). ‘predominantly’ is now removed (page 4, line 21).

      - The conclusion: ' Our finding suggests that aquaporin acts as a water export route in astrocytes in physiological conditions, so as to counterbalance the constitutive intracellular water accumulation caused by constant transmitter and ion uptake, as well as the cytoplasmic metabolism processes. This mechanism hence plays a necessary role in maintaining water equilibrium in astrocytes, thereby brain water homeostasis' seems to be slightly beyond the actual findings in the paper. I suggest clarifying according to the described phenomena.

      We have now refined the conclusion so that it adheres to the experimental observations (page 5, line 16-18).

      - The introduction lacks important information on existing AQP4 blockers and their effects, pros and cons on why to use TGN-020. Among others, I would refer to recent work by Giannetto et al 2024, as well as previous work of Mestre et al. 2018 and Gomolka et al. 2023.

      We initiated the study using TGN-020 as an AQP4 blocker because it has been validated by a wide range of ex vivo and in vivo studies, as documented in the text (page 7, line 1-6). We have also updated the discussion of the recent advances in validating the AQP4 blocker AER-270(271), citing the relevant studies (page 15, line 7-17).

      RESULTS:

      - Page 5, lines 19-20: '...transport, we performed fluorescence intensity translated (FIT) imaging.' - this term was never introduced in the methods, so it is difficult for the reader to understand it at first sight.

      - 'To this end,' - it is not clear which action 'this' refers to. Is it about previous works or the moment that the brain samples were ready for imaging? Please clarify, as it only starts to become clear after fully reading the methods.

      We have now refined the description to give the principle of our imaging method first and then explain the technical steps. To avoid ambiguity, the term ‘To this end’ is removed. The updated text is now on page 6, line 1-3.

      - From page 6 onwards - all references to Figures lack information to which part of the figure subpanel the information refers (top/middle bottom or left/middle/right).

      We apologize. The complementary indication is now added for figure citations when applicable.  

      - 'whereas water export and astrocyte shrinking upon hyperosmotic manipulation increased astrocyte fluorescence (Figure 1B). Hence, FIT imaging enables real-time recording of astrocyte transmembrane water transport and volume dynamics.' - this part seems to be undescribed or not clear in the methods.

      We have now refined this description (page 6, line 19-20).

      - Page 6, lines 17-22: TGN-020. In addition to the above, I suggest familiarizing also with the following works by Igarashi 2011. doi: 10.1007/s10072-010-0431-1, and by Sun 2022. doi: 10.3389/fimmu.2022.870029.

      These studies are now cited (page 7, line 3-4).

      - Page 7: ' AQP4 is a bidirectional channel facilitating... ' - AQP4 water channel is known as the path of least resistance for water transfer, please see Manley, Nature Medicine, 2000 and Papadopoulos, Faseb J, 2004.

      This sentence is now updated (page 7, line 12-13).

      - 'astrocyte AQP4 by TGN-020 caused a gradual decrease in SRB fluorescence intensity, indicating an intracellular water accumulation' - the tissue slice experiment is a very valuable method. However, although the interpretation seems right, the experiment does not address cell swelling that may occur simply due to, or as a superposition of, tissue deterioration and the effect of TGN-020. The AQP4 channel is blocked, and the influx of water into astrocytes should also be blocked. Thus, can swelling also be part of another mechanism, as it was also observed in the control group? I suggest this should be addressed thoroughly.

      We performed this experiment in acute brain slices to control the pharmacological environment well and to gain spatial-temporal information. After slicing, the brain slices recovered for > 1 h prior to recording, so that the slices were in a stable state before TGN-020 application, as evidenced by the stable baseline. The constant decrease in the control trace is due to photobleaching, whose trend did not change in response to vehicle. TGN-020, in contrast, caused a downward change, suggesting intracellular water accumulation and swelling.

      The experiment was performed in basal conditions without active water influx; a decrease in SRB fluorescence hints at intracellular water buildup in astrocytes. This result shows that in basal conditions, astrocyte aquaporin mediates a constant (i.e., tonic) water efflux; blocking it causes intracellular water accumulation and swelling.

      We have accordingly updated the description of this part (page 7, line 15-20).

      - From the Figure 1 legend: Only 4 mice were subjected to the experiment, and only 1 mouse as a control. I suggest expanding the experiment and performing statistics including two-way ANOVA for data in panels B, C, and D, as no results of statistical tests confirm the significance of the findings provided.

      Panel B confirms that cytosolic SRB fluorescence tends to increase upon water efflux and volume shrinking, and vice versa. As for panel C, the number of mice is now indicated. Also, the downward change in the SRB fluorescence is now calculated separately for the phases before and after TGN-020 (and vehicle) application, and this panel is updated accordingly. TGN-020 induced a decline in astrocyte SRB fluorescence, which is validated by a t-test performed in MATLAB. To clarify, we have now added connecting lines to indicate statistical significance between the corresponding groups (Fig. 1C, middle). As for panel D, we calculated the SRB fluorescence change (decrease) relative to the photobleaching trend illustrated by the dotted line. The significance was also validated by a t-test performed in MATLAB.
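
      The two quantities mentioned in this response (the slope of the fluorescence trace before and after drug application, and the change relative to the extrapolated photobleaching trend) can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB analysis: a linear photobleaching trend is assumed, the function names are hypothetical, the trace is synthetic, and the t-test step is not reproduced.

```python
def slope(values, dt=1.0):
    """Least-squares slope of evenly sampled values (signal units per unit time)."""
    n = len(values)
    t = [i * dt for i in range(n)]
    t_mean = sum(t) / n
    v_mean = sum(values) / n
    num = sum((ti - t_mean) * (vi - v_mean) for ti, vi in zip(t, values))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den

def deviation_from_trend(trace, n_pre):
    """Last observed value minus the value predicted by a line fitted to the
    first n_pre samples (assumed linear photobleaching trend)."""
    pre = trace[:n_pre]
    k = slope(pre)
    intercept = sum(pre) / n_pre - k * (n_pre - 1) / 2
    predicted = intercept + k * (len(trace) - 1)
    return trace[-1] - predicted

# Synthetic trace: bleaching at -1 per sample, then a steeper decline of -2.
pre_drug = [100.0, 99.0, 98.0, 97.0, 96.0]
post_drug = [94.0, 92.0, 90.0, 88.0, 86.0]
trace = pre_drug + post_drug

print(slope(pre_drug), slope(post_drug))
print(deviation_from_trend(trace, n_pre=len(pre_drug)))
```

      A steeper post-application slope, or a negative deviation from the extrapolated trend, corresponds to a fluorescence decrease beyond what photobleaching alone would predict.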

      - Figure 1: Please correct the figure - pictures in panel A are low quality and do not support the specificity of SRB for astrocytes. Panels B-D are easier to understand if plotted as normal X/Y charts with associated statistical findings. Some drawings are cut or not aligned.

      In GFAP-EGFP transgenic mice, astrocytes are labeled by EGFP. SRB labeling (red fluorescence) colocalizes with EGFP-positive astrocytes, although not all EGFP-positive astrocytes are labeled by SRB. The PDF conversion during submission may also have compromised image quality. We have updated and aligned the figure panels.

      - Page 12: 'TGN-020 increased basal water diffusion within multiple regions including the cortex, hippocampus and the striatum in a heterogeneous manner (Figure 5C).'

      This sentence is updated now (page 12, line 12 – page13, line 2). It reads ‘The representative images reveal the enough image quality to calculate the ADC, which allow us to examine the effect of TGN-020 on water diffusion rate in multiple regions (Fig. 5C).’

      - The expression of AQP4 within the brain parenchyma is known to be heterogenous. Please familiarize yourself with works by Hubbard 2015, Mestre 2018, and Gomolka 2023. A correlation between ADC score and AQP4 expression ROI-wise would be useful, but it is not substantial to conduct this experiment.

      We thank the reviewer. This point is stressed on page 19, line 12-14.

      DISCUSSION:

      - Most of the issues are commented on above, so I suggest following the changes applied earlier.

      - Page 16: 'We show by DW-MRI that water transport by astrocyte aquaporin is critical for brain water homeostasis.' This statement is not clear and does not refer to the actual impact of the findings. DWI only allowed verification of the changes in ADC after the application of TGN-020. I suggest commenting on the recent report by Giannetto 2024 here.

      This sentence is now refined (page 19, line 1-2), followed by the updates commenting on the recent studies employing DW-MRI to evaluate brain fluid transport, including the work of (Giannetto et al., 2024) (page 19, line 3-10). 

      METHODS:

      - Page 18: no total number of mice included in all experiments is provided, as well as no clearly stated number of mice used in each experiment. Please correct.

      We have now double checked the number of the mice for the data presented and updated the figure legends accordingly (e.g., updates in legends fig1, fig5, etc).

      -  Page 18, line 7: 'Axscience' is not a producer of Isoflurane, but a company offering help with scientific manuscript writing. If this company's help was used, it should be stated in the acknowledgments section. Reference to ISOVET should be moved from line 15 to line 7.

      We apologize. We did not use external writing help, and have now removed ‘Axcience’. The isoflurane was of the brand ‘ISOVET’ from ‘Piramal’. This information is now moved up (page 21, line 11).

      - Page 18, line 9: ' modified artificial cerebrospinal fluid (aCSF)'. Additional information on the reason for the modified aCSF would be useful for the reader.

      In this modified solution, the concentration of depolarizing ions (Na+, Ca2+) was reduced to lower the potential excitotoxicity during the tissue dissection (i.e., injury to the brain) for preparing the brain slices. Extra sucrose was added to balance the solution osmolarity. This solution has been used previously for the dissection and the slicing steps in adult mice (Jiang et al., 2016). We now add this justification in the text and quote the relevant reference (page 21, line14-16). 

      - Page 19, line 6: a reasoning for using Tamoxifen would be helpful for the reader.

      The Glast-CreERT2 is an inducible conditional mouse line that expresses Cre recombinase selectively in astrocytes upon tamoxifen injection. We now add this information in the text (page 22, line 10-11). 

      - Line 8 - 'Sigma'

      Fixed.

      - Line 7/8: It is not clear if ethanol is of 10% solution or if proportions of ethanol+tamoxifen to oil were of 1:9. The reasoning for each performed step is missing.

      We have now clarified the procedure (page 22, line 11-15).

      - Line 10: '/' means 'or'?

      Here, we mean the bigenic mice resulting from the crossing of the heterozygous Cre-dependent GCaMP6f and Glast-CreERT2 mouse lines. We have now modified it to ‘Glast-CreERT2::Ai95GCaMP6f//WT’, consistent with the presentation of other mouse lines in our manuscript (page 22, line 16).

      - Lines 22-23: being in line with legislation was already stated at the beginning of the Methods, so I suggest combining for clarity.

      Done. 

      - Page 21, line 4: it is good to mention which printer was used, but it would be worth mentioning the material the chamber was printed from - was it ABS?

      Yes. We add this info in the text now (page 24, line 5).

      - Line 9 -'PI' requires spelling out.

      It is ‘Physik Instrumente’, now added (page 24, line 10).

      - Line 11-12: What is the reason for background subtraction - clearer delineation of astrocytes/ increasing SNR in post-processing, or because SRB signal was also visible and changing in the background over time? Was the background removed in each frame independently (how many frames)? How long was the time-lapse and was the F0 frame considered as the first frame acquired? The background signal should be also measured and plotted alongside the astrocytic signal, as a reference (Figure 1). This should be clarified so that steps are to be followed easily.

      We sought to follow the temporal changes in the SRB fluorescence signal. The acquired fluorescence images contain not only the SRB signal but also background signals, for instance tissue autofluorescence, digital camera background noise, and stray light leaking in from the environment. The background value was estimated as the mean fluorescence of peripheral cell-free subregions (15 × 15 µm²) and subtracted from all frames of the time-lapse image stack. The traces shown in the figures cover the full length of the time-lapse recordings. F0 was defined as the mean of the 10 data points immediately preceding the detected fluorescence change. The text is now updated (page 24, line 21 to page 25, line 5).
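
      The background subtraction and F0 definition described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the background is assumed to be a single scalar per recording, and expressing the trace as (F - F0)/F0 is an assumption about the final normalization.

```python
from statistics import mean


def subtract_background(trace, background_samples):
    """Subtract the mean of cell-free background samples from every frame.

    Assumption: one scalar background value is used for the whole recording.
    """
    background = mean(background_samples)
    return [f - background for f in trace]


def baseline_f0(trace, onset_index, n_points=10):
    """F0 = mean of the n_points samples immediately preceding the onset."""
    if onset_index < n_points:
        raise ValueError("not enough pre-onset samples to compute F0")
    return mean(trace[onset_index - n_points:onset_index])


def delta_f_over_f0(trace, onset_index, n_points=10):
    """Express a background-corrected trace relative to its pre-onset baseline."""
    f0 = baseline_f0(trace, onset_index, n_points)
    return [(f - f0) / f0 for f in trace]


# Illustrative numbers only: a flat ROI trace followed by a rise at frame 10.
raw_trace = [110.0] * 10 + [120.0, 130.0]
corrected = subtract_background(raw_trace, background_samples=[10.0, 10.0, 10.0])
dff = delta_f_over_f0(corrected, onset_index=10)
```

      Here the onset index is supplied by hand; how the fluorescence change was detected is not specified in the response, so that step is left outside the sketch.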

      - Line 15: Was astrocyte image delineation performed manually or automatically? Where was the center of the region considered in the reference to the astrocyte image? It would be good to see the regions delineated for reference.

      Astrocytes labeled by SRB were delineated manually with the soma taken as the center of the region of interest. We now exemplify the delineated region in Fig 1A, bottom.

      - Page 22, line 2: 'x4 objective'.

      Added (now, page 25, line 16). 

      - Line 3: 'barrels' - reference to publication or the explanation missing.

      The relevant reference is now added on barrel cortex (Erzurumlu and Gaspar, 2020) (page 25, line 19-20). 

      - Line 19: were the coordinates referred to = bregma?

      Yes. This info is now added (page 26, line 12). 

      - Line 20: was the habituation performed directly at the acquisition date? It is rather difficult to say that it was a habituation, but rather acute imaging. I suggest correcting, that mice were allowed to familiarize themselves with the setup for 30 minutes prior to the imaging start.

      In this context, although it is a very nice idea and experiment, the influence of acute stress in animals familiar with the setup only from the day of acquisition is difficult to avoid. It is a major concern, especially when considering norepinephrine as a master driver of neuronal and vascular activity throughout the brain, and the strong activation of the hypothalamic-adrenal axis in response to acute stress. It is well known that the response of monoamines is reduced in animals subjected to chronic vs. acute stress, but is still larger than when the stressor is absent.

      Major remark: The animals should, preferably, be imaged at least after 3 days of habituation based on existing knowledge. I suggest exploring the topic of the importance of habituation. It is difficult though, to objectively review these findings without considering stress and associated changes in vascular dynamics.

      We thank the reviewer for helping us to clarify this information. The text has been updated accordingly to describe the experiment (now page 26, line 14).

      - Page 23, line 17: number of animals included in experiments missing.

      The number of animals is added in Methods (page 27, line 12) and indicated in the legend of Figure 5. 

      - Line 18/19: were the respiratory effects observed after injection of saline or TGN-020? Since DWI was performed, the exclusion of perfusive flow on ADC is impossible.

      I suggest an additional experiment in n=3 animals per group, verifying the HR (and if possible BP) response after injection of TGN-020 and saline in mice.

      The respiratory rate has been recorded. We added the averaged respiratory rate before and after injection of TGN-020 or saline (now, Fig. S6; page 13, line 5-6).

      - Line 22: Please, provide the model of the scanner, the model of the cryoprobe, as well as the model of the gradient coil used, otherwise it is difficult to assess or repeat these experiments.

      We have now added the details of the MRI system to the Methods section (page 27, lines 17-21).

      - Page 24: line 3/4: although the achieved spatial resolution of DWI was good and slightly lower than desired and achievable due to limitations of the method itself as well as cryoprobe, it is acceptable for EPI in mice.

      Still, there is no direct explanation provided on the reasoning for using a surface instead of a volumetric coil, as well as on assuming an anisotropic environment (6 diffusion directions) for DWI measurements. This is especially doubtful if such a long echo time was used alongside lower-than-possible spatial resolution. A longer echo time would lower the SNR of the depicted signal but would also favor the depiction of signal from slow-moving protons and larger water pools. On the other hand, only 3 b-values were used, which is the minimum for ADC measurements, while a good research protocol could encompass at least 5 to increase the accuracy of ADC estimation and avoid undersampling between the 250 and 1800 b-values. What was the reason for choosing this particular set of b-values and not 50, 600, and 2000? Besides, the gradient duration time was optimally chosen; however, I have concerns about the decision to use such long gradient separation times.

      If the protocol could have been better optimized, the assessment could have been also performed in respiratory-gated mode, allowing minimization of the effects of one of the glymphatic system driving forces.

      Thus, I suggest commenting on these issues.

      We chose the cryoprobe to increase the signal-to-noise ratio (SNR) in DW-MRI with a long echo time and high b-value. A volume coil gives a more homogeneous SNR across the whole brain than the cryoprobe, but its SNR would be lower. We confirmed that, even in the ventral part of the brain, the quality of the DW-MRI images acquired with the cryoprobe was sufficient to investigate the ADC (Fig. 5B-C). This is now mentioned in Methods (page 27, lines 17-21).

      We performed DW-MRI scanning for 5 min at each time point, using anisotropic resolution and three b-values, to follow the time course of the ADC change after injection of TGN-020. Because the effect of TGN-020 appears about a dozen minutes after injection (Igarashi et al., 2011), fast DW-MRI scanning is required. Isotropic DW-MRI with a shorter echo time and more directions would require a longer scan at each time point, possibly more than 1 h. We agree that three b-values is the minimum needed to calculate the ADC and that more b-values would increase accuracy. However, to achieve a temporal resolution sufficient to capture the change in water diffusion, we decided to use the minimum number of b-values. A previous study also supports the adequate accuracy of DW-MRI with three b-values (Ashoor et al., 2019). Furthermore, a previous study that used a long diffusion time (> 20 ms) and a long echo time (40 ms) obtained good mean diffusivity estimates (Aggarwal et al., 2020), supporting the adequacy of our protocol for investigating the ADC. We have now updated the description (page 28, lines 5-9). We chose b = 250 and 1800 s/mm² because 2000 s/mm² appeared too high to yield good image quality. In a previous study, we established that the ADC is measurable with b = 0, 250, and 1800 s/mm² (Debacker et al., 2020).

      - Page 24, line 7: What was the post-processing applied for images acquired over 70 minutes? Did it consider motion-correction, co-registration, or drift-correction crucial to avoid pitfalls and mismatches in concluding data?

      The motion correction and co-registration were explained in Methods (page 28, line 12-14).

      Also, were these trace-weighted images or magnitude images acquired since DTI software was used for processing - while ADC fitting could be reliably done in Matlab, Python, or other software. Thus, was DSI software considering all 3 b-values or just used 0 and 1800 for the calculation of mean diffusivity for tractography (as ADC). The details should be explained.

      DSI Studio was used with all three b-values (b = 0, 250, and 1800 s/mm²) to calculate the ADC. We have added this description to Methods (page 28, lines 16-18).
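
      For readers who want to see what an ADC estimate from three b-values involves, the sketch below fits the standard mono-exponential model S(b) = S0 · exp(-b · ADC) by least squares on ln S. It is a generic illustration with a hypothetical function name and synthetic signal values; it is not a description of how DSI Studio performs the calculation internally, which the response does not specify.

```python
import math


def fit_adc(b_values, signals):
    """Estimate the ADC (mm^2/s) from the mono-exponential model
    S(b) = S0 * exp(-b * ADC).

    The ADC is the negated least-squares slope of ln(S) against b (s/mm^2).
    """
    n = len(b_values)
    log_signals = [math.log(s) for s in signals]
    b_mean = sum(b_values) / n
    log_mean = sum(log_signals) / n
    sxx = sum((b - b_mean) ** 2 for b in b_values)
    sxy = sum((b - b_mean) * (l - log_mean)
              for b, l in zip(b_values, log_signals))
    return -sxy / sxx


# Synthetic, noise-free signals for the b-values named in the response.
b_values = [0.0, 250.0, 1800.0]           # s/mm^2
true_adc = 0.7e-3                          # mm^2/s, illustrative value only
signals = [1000.0 * math.exp(-b * true_adc) for b in b_values]
estimated_adc = fit_adc(b_values, signals)
```

      With only three b-values the fit has a single degree of freedom left over, which is why the reviewer notes that additional b-values would improve the accuracy of the estimate.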

      To make sure that the results are not affected by the MR hardware, I suggest performing 3 control measurements in a standard water phantom, and presenting the results alongside the main findings.

      Thanks for this suggestion. We have performed new experiments and added control measurements with three phantoms: water, undecane, and dodecane. These new data are summarized in Fig. S7 and show the stability of the ADC throughout the 70 min of scanning. We have updated the description in Methods (page 28, lines 9-11) and in Results (page 13, lines 6-8).

      - Line 13: were the ROI defined manually or just depicted from previously co-registered Allen Brain atlas?

      The ROIs of the cortex, the hippocampus, and the striatum were depicted with reference to Allen mouse brain atlas (https://scalablebrainatlas.incf.org/mouse/ABA12). This is explained in Methods (page 28, line 14-16).

      - Line 10: why the average from 1st and 2nd ADC was not considered, since it would reduce the influence of noise on the estimation of baseline ADC?

      We are sorry that it was a typo. The baseline was the average between 1st and 2nd ADC. We corrected the description (page 28, line 20).
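
      The corrected baseline definition (the average of the 1st and 2nd ADC measurements) can be illustrated as below. The function name is hypothetical, and reporting later time points as a percentage change from baseline is an assumption made for illustration rather than something stated in the response.

```python
def percent_change_from_baseline(adc_series, n_baseline=2):
    """Express each ADC value as a percentage change from the baseline,
    where the baseline is the mean of the first n_baseline measurements."""
    if len(adc_series) < n_baseline:
        raise ValueError("series shorter than the baseline window")
    baseline = sum(adc_series[:n_baseline]) / n_baseline
    return [100.0 * (adc - baseline) / baseline for adc in adc_series]


# Illustrative values in units of 1e-3 mm^2/s; not data from the study.
adc_series = [0.70, 0.72, 0.781]
changes = percent_change_from_baseline(adc_series)
```

      Averaging two pre-injection scans, as the reviewer suggested, reduces the influence of noise in any single scan on every subsequent normalized value.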

      STATISTIC:

      Which type of t-test (paired, unpaired, or two-sample) was used, and why? Mann-Whitney U-tests are used as a substitute for parametric t-tests when the data are non-parametric or when a normal distribution cannot be assumed. In which cases was the Bonferroni-Holm correction used? I couldn't find any mention of a multiple-group analysis followed by multiple comparisons. Each section of the manuscript should describe how the quantitative data were treated and for what purpose. I suggest carefully correcting all figures accordingly, following the remarks given for Figure 1.

      We used the unpaired t-test for data obtained from samples under different conditions. Indeed, the Mann-Whitney U-test is used when the data are non-parametric, i.e., deviate from a normal distribution. Bonferroni-Holm correction was used for multiple comparisons (e.g., Fig. 4D-E).
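
      As a reminder of what the Bonferroni-Holm step does to a family of p-values, here is a minimal sketch of the standard step-down procedure. The function name and the example p-values are hypothetical; this is not the authors' statistics code.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order.

    The i-th smallest p-value (0-based rank i) is multiplied by (m - i),
    capped at 1, and forced to be non-decreasing along the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Three hypothetical pairwise comparisons.
adjusted = holm_bonferroni([0.01, 0.04, 0.03])
```

      Compared with a plain Bonferroni correction, which multiplies every p-value by the number of tests, the step-down version is never less powerful while still controlling the family-wise error rate.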

      Reviewer #3 (Recommendations For The Authors):

      I think that the following statement is insufficient: "The authors commit to share data, documentation, and code used in analysis". My understanding is eLife expects that all key data to be provided in a supplement.

      We thank the reviewer; we follow the publication guidelines of eLife. 

      References

      Aggarwal, M., Smith, M.D., and Calabresi, P.A. (2020). Diffusion-time dependence of diffusional kurtosis in the mouse brain. Magn Reson Med 84, 1564-1578.

      Ashoor, M., Khorshidi, A., and Sarkhosh, L. (2019). Estimation of microvascular capillary physical parameters using MRI assuming a pseudo liquid drop as model of fluid exchange on the cellular level. Rep Pract Oncol Radiother 24, 3-11.

      Cauli, B., and Hamel, E. (2018). Brain Perfusion and Astrocytes. Trends in neurosciences 41, 409-413.

      Debacker, C., Djemai, B., Ciobanu, L., Tsurugizawa, T., and Le Bihan, D. (2020). Diffusion MRI reveals in vivo and non-invasively changes in astrocyte function induced by an aquaporin-4 inhibitor. PLoS One 15, e0229702.

      Erzurumlu, R.S., and Gaspar, P. (2020). How the Barrel Cortex Became a Working Model for Developmental Plasticity: A Historical Perspective. J Neurosci 40, 6460-6473.

      Farr, G.W., Hall, C.H., Farr, S.M., Wade, R., Detzel, J.M., Adams, A.G., Buch, J.M., Beahm, D.L., Flask, C.A., Xu, K., et al. (2019). Functionalized Phenylbenzamides Inhibit Aquaporin-4 Reducing Cerebral Edema and Improving Outcome in Two Models of CNS Injury. Neuroscience 404, 484-498.

      Giannetto, M.J., Gomolka, R.S., Gahn-Martinez, D., Newbold, E.J., Bork, P.A.R., Chang, E., Gresser, M., Thompson, T., Mori, Y., and Nedergaard, M. (2024). Glymphatic fluid transport is suppressed by the aquaporin-4 inhibitor AER-271. Glia.

      Gomolka, R.S., Hablitz, L.M., Mestre, H., Giannetto, M., Du, T., Hauglund, N.L., Xie, L., Peng, W., Martinez, P.M., Nedergaard, M., et al. (2023). Loss of aquaporin-4 results in glymphatic system dysfunction via brain-wide interstitial fluid stagnation. eLife 12.

      Huber, V.J., Tsujita, M., and Nakada, T. (2009). Identification of aquaporin 4 inhibitors using in vitro and in silico methods. Bioorg Med Chem 17, 411-417.

      Igarashi, H., Huber, V.J., Tsujita, M., and Nakada, T. (2011). Pretreatment with a novel aquaporin 4 inhibitor, TGN-020, significantly reduces ischemic cerebral edema. Neurol Sci 32, 113-116.

      Igarashi, H., Tsujita, M., Suzuki, Y., Kwee, I.L., and Nakada, T. (2013). Inhibition of aquaporin-4 significantly increases regional cerebral blood flow. Neuroreport 24, 324-328.

      Jiang, R., Diaz-Castro, B., Looger, L.L., and Khakh, B.S. (2016). Dysfunctional Calcium and Glutamate Signaling in Striatal Astrocytes from Huntington's Disease Model Mice. J Neurosci 36, 3453-3470.

      Mayo, F., Gonzalez-Vinceiro, L., Hiraldo-Gonzalez, L., Calle-Castillejo, C., Morales-Alvarez, S., Ramirez-Lorca, R., and Echevarria, M. (2023). Aquaporin-4 Expression Switches from White to Gray Matter Regions during Postnatal Development of the Central Nervous System. Int J Mol Sci 24.

      Mola, M.G., Sparaneo, A., Gargano, C.D., Spray, D.C., Svelto, M., Frigeri, A., Scemes, E., and Nicchia, G.P. (2016). The speed of swelling kinetics modulates cell volume regulation and calcium signaling in astrocytes: A different point of view on the role of aquaporins. Glia 64, 139-154.

      Risher, W.C., Andrew, R.D., and Kirov, S.A. (2009). Real-time passive volume responses of astrocytes to acute osmotic and ischemic stress in cortical slices and in vivo revealed by two-photon microscopy. Glia 57, 207-221.

      Salman, M.M., Kitchen, P., Yool, A.J., and Bill, R.M. (2022). Recent breakthroughs and future directions in drugging aquaporins. Trends Pharmacol Sci 43, 30-42.

      Shigetomi, E., Bushong, E.A., Haustein, M.D., Tong, X., Jackson-Weaver, O., Kracun, S., Xu, J., Sofroniew, M.V., Ellisman, M.H., and Khakh, B.S. (2013). Imaging calcium microdomains within entire astrocyte territories and endfeet with GCaMPs expressed using adeno-associated viruses. J Gen Physiol 141, 633-647.

      Solenov, E., Watanabe, H., Manley, G.T., and Verkman, A.S. (2004). Sevenfold-reduced osmotic water permeability in primary astrocyte cultures from AQP-4-deficient mice, measured by a fluorescence quenching method. Am J Physiol Cell Physiol 286, C426-432.

      Verkman, A.S., Binder, D.K., Bloch, O., Auguste, K., and Papadopoulos, M.C. (2006). Three distinct roles of aquaporin-4 in brain function revealed by knockout mice. Biochim Biophys Acta 1758, 1085-1093.

    2. eLife Assessment

      In this work, the authors propose that astrocytic aquaporin 4 (AQP4) is the main pathway for tonic water efflux, without which astrocytes undergo cell swelling. These findings are important, because they shed light on key molecular mechanisms implicated with the regulation of brain water homeostasis. The authors use a broad set of experimental tools (e.g., acute brain slices, in vivo recording, and diffusion-weighted MRI) but the evidence remains incomplete without ruling out non-specific effects of TGN-020, and without evidence that changes in sulforhodamine B fluorescence can be used as reliable readouts of cell volume dynamics.

    3. Reviewer #1 (Public review):

      Summary:

      Pham and colleagues provide an illuminating investigation of aquaporin-4 water flux in the brain utilizing ex vivo and in vivo techniques. The authors first show in acute brain slices, and in vivo with fiber photometry, SRB loaded astrocytes swell after inhibition of AQP4 with TGN-020, indicative of tonic water efflux from astrocytes in physiological conditions. Excitingly, they find that TGN-020 increased the ADC in DW-MRI in a region-specific manner, potentially due to AQP4 density. The resolution of the DW-MRI cannot distinguish between intracellular or extracellular compartments, but the data point to an overall accumulation of water in the brain with AQP4 inhibition. These results provide further clarity on water movement through AQP4 in health and disease.

      Overall, the data support the main conclusions of the article, with some room for more detailed treatment of the data to extend the findings.

      Strengths:

      The authors have a thorough investigation of AQP4 inhibition in acute brain slices. The demonstration of tonic water efflux through AQP4 at baseline is novel and important in and of itself. Their further testing of TGN-020 in hyper- and hypo-osmotic solutions shows the expected reduction of swelling/shrinking with AQP4 blockade.

      Their experiment with cortical spreading depression further highlights the importance of water efflux from astrocytes via AQP4 and transient water fluxes as a result of osmotic gradients. Inhibition of AQP4 increases the speed of tissue swelling, pointing to a role in efflux of water from the brain.

      The use of DW-MRI provides a non-invasive measure of water flux after TGN-020 treatment.

      Weaknesses:

      The authors specifically use GCaMP6 and light sheet microscopy to image their brain sections in order to identify astrocytic microdomains. However, their presentation of the data neglects a more detailed treatment of the calcium signaling. It would be quite interesting to see whether these calcium events are differentially affected by AQP4 inhibition based on their cellular localization (ie. processes vs. soma vs. vascular endfeet which all have different AQP4 expression).

      The authors show the inhibition of AQP4 with TGN-020 shortens the onset time of the swelling associated with cortical spreading depression in brain slices. However, they do not show quantification for much of the other features of the CSD swelling, (ie. the duration of swelling, speed of swelling, recovery from swelling)

      Comments on revised version:

      The authors have addressed these suggestions as additional supplementary figures. Notably they find increased calcium signaling and stronger inhibition of calcium signaling by TGN-020 in astrocytic endfeet, where AQP4 is enriched.

      Significance:

      AQP4 is a bidirectional water channel that is constitutively open, thus water flux through it is always regulated by local osmotic gradients. Still, characterizing this water flux has been challenging, as the AQP4 channel is incredibly water selective. The authors here present important data showing that application of TGN-020 alone causes astrocytic swelling, indicating that there is constant efflux of water from astrocytes via AQP4 in basal conditions. This has been suggested before, as the authors rightfully highlight in their discussion, but the evidence had previously come from electron microscopy data from genetic knockout mice.

      AQP4 expression has been linked with glymphatic circulation of cerebrospinal fluid through perivascular spaces since its rediscovery in 2012 [1]. Further studies of aging[2], genetic models[3], and physiological circadian variation[4], have revealed it is not simply AQP4 expression but AQP4 polarization to astrocytic vascular endfeet that is imperative for facilitating glymphatic flow. Still a lingering question in the field is how AQP4 facilitates fluid circulation. This study represents an important step in our understanding of AQP4's function, as basal efflux of water via AQP4 might promote clearance of interstitial fluid to allow influx of cerebrospinal fluid into the brain. Beyond glymphatic fluid circulation, clearly AQP4 dependent volume changes will differentially alter astrocytic calcium signaling and, in turn, neuronal activity.

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.
      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.
      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.
      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature communications, 2020. 11(1).

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the Authors propose that astrocytic water channel AQP4 represents the dominant pathway for tonic water efflux without which astrocytes undergo cell swelling. The authors measure changes in astrocytic sulforhodamine B fluorescence as the proxy for cell volume dynamics. Using this approach, they have performed a technically elegant series of ex vivo and in vivo experiments exploring changes in astrocytic volume "signal" in response to the AQP4 inhibitor TGN-020 and/or neuronal stimulation. The key findings are that TGN-020 produces an apparent swelling of astrocytes and modifies astrocytic cell volume dynamics after spreading depolarizations. This study is perceived as potentially highly significant. However, several technical caveats could be considered better and perhaps addressed through additional experiments.

      Strengths:

      (1) This is a technically sound study, in which the Authors employed a number of complementary ex vivo and in vivo techniques. The presented results are of interest to the field and potentially highly significant.

      (2) The innovative use of sulforhodamine B for in situ measurements of astrocyte cell volume dynamics is thoroughly validated in brain slices by quantifying changes in sulforhodamine fluorescence in response to hypoosmotic and hyperosmotic media.

      (3) The combination of cell volume measurements with registering functional outcomes in both astrocytes and neurons (cell-specific GCaMP6 signaling) is appropriate and adds to the significance of the work.

      (4) The use of ChR2 optogenetics for producing spreading depolarization avoids many complications of chemical manipulations and is much appreciated.

      Remaining limitations:

      (1) In the opinion of this reviewer, the effects of TGN-020 are not entirely consistent with the current knowledge on water permeability in astrocytes and the relative contribution of AQP4 to this process.

      Specifically, genetic deletion of AQP4 reduces plasmalemmal water permeability in astrocytes by ~two-three-fold (when measured at 37 °C, E. Solenov et al., AJP-Cell, 2004). This difference is significant but thought to have limited impact on steady-state water distribution. To the best of this reviewer's knowledge, cultured AQP4-null astrocytes do not show changes in degree of hypoosmotic swelling or hyperosmotic shrinkage. Thus, the findings of Solenov et al. are not (entirely) congruent with the conclusions of the current manuscript.

      Also, as noted by the Authors, the AQP4 knockout does not modify astrocytes swelling induced by hypoosmotic solution in brain slices (T.R. Murphy et al., Front Neurosci., 2017), further suggesting that AQP4 is not a significant rate-limiting factor for water movement across astrocyte membranes.

      The Authors do discuss the above-mentioned discrepancies and explain them by the context-dependent changes in water fluxes. Nevertheless, with these caveats in mind, it would be highly desirable to utilize an independent method measuring astrocytic volume and extracellular volume fraction.

      (2) As noted by this reviewer and now discussed by the Authors, changes in ADC signal (presented in Fig. 5) may be confounded by the previously reported TGN-020-induced hyperemia (e.g., H. Igarashi et al., NeuroReport, 2013) and/or changes in water fluxes across the pia mater, which is highly enriched in AQP4. If this is the case, the proposed brain water accumulation may be explained by factors other than astrocytic water homeostasis. This caveat certainly deserves further experimental exploration.

    1. eLife Assessment

      The authors utilized single-cell RNA-seq profiling of non-small cell lung cancer (NSCLC) patient tumor samples to generate useful insights into the determinants of immune checkpoint inhibitor (ICI) responsiveness in NSCLC patients. While some of the findings add weight to the current literature, the analysis is incomplete due to the small cohort size and heterogeneous population, which have limited their ability to draw statistically supported conclusions after adjusting for multiple hypothesis testing, as well as the lack of functional characterization of the findings. This study would benefit from external cohorts to both validate the findings and justify the statistical analysis undertaken.

    2. Reviewer #1 (Public review):

      Summary:

      The authors study the variability of patient response of NSCLC patients on immune checkpoint inhibitors using single-cell RNA sequencing in a cohort of 26 patients and 33 samples (primary and metastatic sites), mainly focusing on 11 patients and 14 samples for association analyses, to understand the variability of patient response based on immune cell fractions and tumor cell expression patterns. The authors find immune cell fraction and clonal expansion differences, as well as tumor expression differences between responders and non-responders, partly validating previous hypotheses, and partly suggesting new markers for ICI response. Integrating immune and tumor sources of signal the authors claim to improve prediction of response markedly, albeit in a small cohort and using in-sample metrics.

      Strengths:

      - The problem of studying the tumor microenvironment, as well as the interplay between tumor and immune features is important and interesting and needed to explain heterogeneity of patient response and be able to predict it.
      - Extensive analysis of the scRNAseq data with respect to immune and tumor features on different axes of hypothesis relating to immune response and tumor immune evasion using state of the art methods.
      - The authors provide an interesting scRNAseq data set with well-curated cell types linked to outcomes data, which is valuable
      - High-quality immune cell type annotation including annotations based on additional ADT data
      - Integration of TCRseq to confirm subtype of T-cell annotation and clonality analysis
      - Interesting analysis of cell programs/states of the (predicted) tumor cells and characterization thereof

      Weaknesses:

      - Generally a very heterogeneous and small cohort in which adjusting for confounding is hard. Additionally, there are many tests for association with outcome, where the necessary multiple testing adjustments negate the signal and confirmation bias is likely, so the biological take-aways have to be questioned.
      - The authors claim a very high "accuracy" performance; however, given the small cohort and possible overfitting due to the in-sample ROC, the generalization of this to other cohorts is questionable.
      - Due to the small cohort with a lot of variability, more external validation is needed for the results to be convincingly reproducible, especially when talking about the AUC/accuracy of a predictor.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have utilised deep profiling methods to generate deeper insights into the features of the TME that drive responsiveness to PD-1 therapy in NSCLC.

      Strengths:

      The main strengths of this work lie in the methodology of integrating single cell sequencing, genetic data and TCRseq data to generate hypotheses regarding determinants of IO responsiveness.

      Some of the findings in this study are not surprising and well precedented eg. association of Treg, STAT3 and NFkB with ICI resistance and CD8+ activation in ICI responders and thus act as an additional dataset to add weight to this prior body of evidence. Whilst the role of Th17 in PD-1 resistance has been previously reported (eg. Cancer Immunol Immunother 2023 Apr;72(4):1047-1058, Cancer Immunol Immunother 2024 Feb 13;73(3):47, Nat Commun. 2021; 12: 2606 ) these studies have used non-clinical models or peripheral blood readouts. Here the authors have supplemented current knowledge by characterization of the TME of the tumor itself.

      Weaknesses:

      Unfortunately, the study is hampered by the small sample size and heterogeneous population and whilst the authors have attempted to bring in an additional dataset to demonstrate robustness of their approach, the small sample size has limited their ability to draw statistically supported conclusions. There is also limited validation of signatures/methods in independent cohorts and no functional characterisation of the findings. Because of these factors, this work (as it stands) does have value to the field but will likely have a relatively low overall impact.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      The authors have presented data showing that there is a greater amount of spontaneous differentiation in human pluripotent cells cultured in suspension vs static and have used PKCβ and Wnt signaling pathway inhibitors to decrease the amount of differentiation in suspension culture.  

      Strengths:  

      This is a very comprehensive study that uses a number of different reactor designs and scales in addition to a number of unbiased outcomes to determine how suspension impacts the behaviour of the cells and in turn how the addition of inhibitors counteracts this effect. Furthermore, the authors were also able to derive new hiPSC lines in suspension with this adapted protocol.

      Weaknesses:  

      The main weakness of this study is the lack of optimization with each bioreactor change. It has been shown multiple times in the literature that the expansion and behaviour of pluripotent cells can be dramatically impacted by impeller shape, RPM, reactor design, and multiple other factors. It remains unclear to me how much of the results the authors observed (e.g. increased spontaneous differentiation) was due to not having an optimized bioreactor protocol in place (per bioreactor vessel type). For instance - was the starting seeding density, RPM, impeller shape, feeding schedule, and/or any other aspect optimized for any of the reactors used in the study, and if not, how were the values used in the study determined?  

      Thank you for your thoughtful comments. In response, we have performed several experiments to optimize the bioreactor conditions in the revised manuscript. We tested several cell seeding densities and several stirring speeds with or without WNT/PKCβ inhibitors (Figure 6—figure supplement 1). We found that seeding densities of 1-2 × 10⁵ cells/mL and stirring speeds of 50-150 rpm supported the proliferation of these cells. Also, PKCβ and Wnt inhibitors suppressed spontaneous differentiation in bioreactor conditions regardless of stirring speed. As for the impeller shape and reactor design, we used the commonly used ABLE bioreactor for the 30 mL scale and Eppendorf bioreactors for the 320 mL scale, which had been designed and used for human pluripotent stem cell culture in previous studies (Matsumoto et al., 2022 (doi: 10.3390/bioengineering9110613) and Kropp et al., 2016 (doi: 10.5966/sctm.2015-0253), respectively). We cited these previous studies in the Results and Materials and Methods sections. We believe that these additional data and explanations address your concerns about the optimization of the bioreactor experiments.

      Reviewer #2 (Public Review):  

      This study by Matsuo-Takasaki et al. reported the development of a novel suspension culture system for hiPSC maintenance using Wnt/PKC inhibitors. The authors showed elegantly that inhibition of the Wnt and PKC signaling pathways would repress spontaneous differentiation into neuroectoderm and mesendoderm in hiPSCs, thereby maintaining cell pluripotency in suspension culture. This is a solid study with substantial data to demonstrate the quality of the hiPSC maintained in the suspension culture system, including long-term maintenance in >10 passages, robust effect in multiple hiPSC lines, and a panel of conventional hiPSC QC assays. Notably, large-scale expansion of a clinical grade hiPSC using a bioreactor was also demonstrated, which highlighted the translational value of the findings here. In addition, the author demonstrated a wide range of applications for the IWR1+LY suspension culture system, including support for freezing/thawing and PBMC-iPSC generation in suspension culture format. The novel suspension culture system reported here is exciting, with significant implications in simplifying the current culture method of iPSC and upscaling iPSC manufacturing.  

      Another potential advantage that perhaps wasn't well discussed in the manuscript is the reported suspension culture system does not require additional ECM to provide biophysical support for iPSC, which differentiates from previous studies using hydrogel and this should further simplify the hiPSC culture protocol.  

      Interestingly, although several hiPSC suspension media are currently available commercially, the content of these suspension media remained proprietary, as such the signaling that represses differentiation/maintains pluripotency in hiPSC suspension culture remained unclear. This study provided clear evidence that inhibition of the Wnt/PKC pathways is critical to repress spontaneous differentiation in hiPSC suspension culture.  

      I have several concerns that the authors should address, in particular, it is important to benchmark the reported suspension system with the current conventional culture system (eg adherent feeder-free culture), which will be important to evaluate the usefulness of the reported suspension system.  

      Thank you for this insightful suggestion. In this revised manuscript, we have performed additional experiments using a conventional medium, mTeSR1 (Stem Cell Technologies, Vancouver, Canada), and compared suspension culture with the adherent feeder-free culture system in four different hiPSC lines simultaneously. Compared to the adherent conditions, the suspension conditions without chemical treatment decreased the expression of self-renewal marker genes/proteins and increased the expression levels of SOX17, T, and PAX6 (Figure 4 - figure supplement 2). Importantly, the treatment with LY333531 and IWR-1-endo in mTeSR1 medium reversed the decreased expression of these undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions, reaching levels comparable to those of the adherent culture conditions. These results indicate that these chemical treatments in suspension culture are beneficial even when using a conventional culture medium.

      Also, the manuscript lacks a clear description of a consistent robust effect in hiPSC maintenance across multiple cell lines.  

      Thank you for this insightful suggestion. We have performed additional experiments on hiPSC maintenance across 5 hiPSC lines in suspension culture using StemFit AK02N medium simultaneously (Figure 3C - E). Overall, the treatment of LY333531 and IWR-1-endo in the StemFit AK02N medium reversed the decreased expression of these undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions. Also as above, we have added results using conventional media, mTeSR1, in comparison to the adherent feeder-free culture system in four different hiPSC lines simultaneously. These results show that this chemical treatment consistently produced robust effects in hiPSC maintenance across multiple cell lines using multiple conventional media.

      There are also several minor comments that should be addressed to improve readability, including some modifications to the wording to better reflect the results and conclusions.  

      In the revised manuscript, we have added and corrected the descriptions to improve readability, including some modifications to the wording to better reflect the results and conclusions. 

      Reviewer #3 (Public Review):  

      In the current manuscript, Matsuo-Takasaki et al. have demonstrated that the addition of PKCβ and WNT signaling pathway inhibitors to the suspension cultures of iPSCs suppresses spontaneous differentiation. These conditions are suitable for large-scale expansion of iPSCs. The authors have shown that they can perform single-cell cloning, direct cryopreservation, and iPSC derivation from PBMCs in these conditions. Moreover, the authors have performed a thorough characterization of iPSCs cultured in these conditions, including an assessment of undifferentiated stem cell markers and genetic stability. The authors have elegantly shown that iPSCs cultured in these conditions can be differentiated into derivatives of three germ layers. By differentiating iPSCs into dopaminergic neural progenitors, cardiomyocytes, and hepatocytes they have shown that differentiation is comparable to adherent cultures.

      This new method of expanding iPSCs will benefit the clinical applications of iPSCs.  

      Recently, multiple protocols have been optimized for culturing human pluripotent stem cells in suspension conditions and their expansion. Additionally, a variety of commercially available media for suspension cultures are also accessible. However, the authors have not adequately justified why their conditions are superior to previously published protocols (indicated in Table 1) and commercially available media. They have not conducted direct comparisons.  

      Thank you for this careful suggestion. In this revised manuscript, we have added results using a conventional medium, mTeSR1 (Stem Cell Technologies), which has been used for suspension culture in several studies. Compared to the adherent conditions using mTeSR1 medium, the suspension conditions with the same medium decreased the ratio of TRA1-60/SSEA4-positive cells and OCT4-positive cells and the expression levels of OCT4 and NANOG, and increased the expression levels of SOX17, T, and PAX6 in 4 different hiPSC lines simultaneously (Figure 4 - figure supplement 2). Importantly, the treatment with LY333531 and IWR-1-endo in the mTeSR1 medium reversed the decreased expression of these undifferentiated markers. With these direct comparisons, we were able to justify why our conditions are superior to previously published protocols using commercially available media.

      Additionally, the authors have not adequately addressed the observed variability among iPSC lines. While they claim in the Materials and Methods section to have tested multiple pluripotent stem cell lines, they do not clarify in the Results section which line they used for specific experiments and the rationale behind their choices. There is a lack of comparison among the different cell lines. It would also be beneficial to include testing with human embryonic stem cell lines.  

      Thank you for this insightful suggestion. In this revised manuscript, we have added results on 5 different hiPSC lines tested at the same time (Figure 3 C-E). We apologize, but it is difficult for us to use human embryonic stem cell lines in this study because of ethical restrictions under Japanese governmental regulations. In general, the treatment with LY333531 and IWR-1-endo increased the expression of self-renewal marker genes/proteins and decreased the expression levels of SOX17, T, and PAX6 in these hiPSC lines. These results indicate that these chemical treatments in suspension culture are generally robust despite the observed variability among iPSC lines.

      Additionally, there is a lack of information regarding the specific role of the two small molecules in these conditions.  

      In this revised manuscript, we have added data and discussion regarding the specific role of the two small molecules in these conditions in the Results and Discussion sections. For the WNT signaling inhibitor, we hypothesized that adding Wnt signaling inhibitors may inhibit the spontaneous differentiation of hiPSCs into mesendoderm, because exogenous Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages (Nakanishi et al, 2009; Sumi et al, 2008; Tran et al, 2009; Vijayaragavan et al, 2009; Woll et al, 2008). Also, endogenous expression and activation of Wnt signaling in pluripotent stem cells are involved in the regulation of mesendoderm differentiation potentials (Dziedzicka et al, 2021). For the PKC inhibitors, we describe the rationale as follows: "To identify molecules with inhibitory activity on neuroectodermal differentiation, hiPSCs were treated with candidate molecules in suspension conditions. We selected these candidate molecules based on previous studies related to signaling pathways or epigenetic regulations in neuroectodermal development (reviewed in (Giacoman-Lozano et al, 2022; Imaizumi & Okano, 2021; Sasai et al, 2021; Stern, 2024) ) or in pluripotency safeguards (reviewed in (Hackett & Surani, 2014; Li & Belmonte, 2017; Takahashi & Yamanaka, 2016; Yagi et al, 2017))."

      We also found that the expression of the naïve pluripotency markers KLF2, KLF4, KLF5, and DPPA3 was up-regulated in the suspension conditions treated with LY333531 and IWR-1-endo, while the expression of OCT4 and NANOG remained at the same levels (Figure 5—figure supplement 2). Combined with RT-qPCR analysis data on 5 different hiPSC lines (Figure 3E), these results suggest that IWRLY conditions may drive hiPSCs in suspension conditions to shift toward naïve pluripotent states.

      The authors have not attempted to elucidate the underlying mechanism other than RNA expression analysis.  

      Regarding the underlying mechanisms, we have added results and discussion in the revised manuscript. For Wnt activation in human pluripotent stem cells, several studies have reported that some WNT agonists are expressed in undifferentiated human pluripotent stem cells (Dziedzicka et al., 2021; Jiang et al, 2013; Konze et al, 2014). In suspension culture, cell aggregation causes tight cell-cell interaction. The paracrine effect of WNT agonists within the cell aggregates may strongly affect neighboring cells and induce spontaneous differentiation into mesendodermal cells. Thus, we think that the inhibition of WNT signaling is effective in suppressing spontaneous differentiation into mesendodermal lineages in suspension culture.

      For PKC beta activation in human pluripotent stem cells, we have shown by western blotting that phosphorylated PKC beta protein expression is higher in suspension culture than in adherent culture (Figure 3 - figure supplement 1). The treatment with a PKCβ inhibitor is effective in suppressing spontaneous differentiation into neuroectodermal lineages. For future perspectives, it will be interesting to examine (1) how and why PKCβ is activated (or phosphorylated), especially in suspension culture conditions, and (2) how and why PKCβ inhibition can suppress neuroectodermal differentiation. Conversely, it will also be interesting to examine how and why PKCβ activation is related to neuroectodermal differentiation.

      For these reasons some aspects of the manuscript need to be extended:  

      (1) It is crucial for authors to specify the culture media used for suspension cultures. In the Materials and Methods section, the authors mentioned that cells in suspension were cultured in either StemFit AK02N medium, StemFit AK03N (Cat# AK03N, Ajinomoto, Co., Ltd., Tokyo, Japan), or StemScale PSC suspension medium (A4965001, Thermo Fisher Scientific, MA, USA). The authors should clarify in the text which medium was used for suspension cultures and whether they observed any differences among these media.  

      Sorry for this confusion. In this study, we basically used StemFit AK02N medium (Figure 1-5, 7-9). For bioreactor experiments (Figure 6), we used StemFit AK03N medium, which is free of human and animal-derived components and is GMP grade. To confirm the effect of the IWRLY chemical treatment, we used StemScale suspension medium (Figure 4 - figure supplement 1) and mTeSR1 medium (Figure 4 - figure supplement 2 and Figure 8 - figure supplement 1). In the revised manuscript, we clarified which medium was used for suspension cultures in the Results and Materials and Methods sections.

      Although we have not directly compared these media in suspension culture (which is largely outside the focus of this study), we have observed some differences among them in maintaining self-renewal characteristics, in preventing spontaneous differentiation (including tendencies to differentiate into specific lineages), and in stability or variation between experiments in suspension culture conditions. Despite this heterogeneity caused by different media, the IWRLY chemical treatment stably maintains hiPSC self-renewal in general. We have added this issue to the Discussion section.

      (2) In the Materials and Methods section, the authors mentioned that they used multiple cell lines for this study. However, it is not clear in the text which cell lines were used for various experiments. Since there is considerable variation among iPSC lines, I suggest that the authors simultaneously compare 2 to 3 pluripotent stem cell lines for expansion, differentiation, etc.  

      Thank you for this careful suggestion. We have added more results on the simultaneous comparison using StemFit AK02N medium in 5 different hiPSC lines (Figure 3 C-E) and using mTeSR1 medium in 4 different hiPSC lines (Figure 4 - figure supplement 2). From both results, we have shown that the treatment of LY333531 and IWR-1-endo was beneficial in maintaining the self-renewal of hiPSCs while suppressing spontaneous differentiation.

      (3) Single-cell sorting can be confusing. Can iPSCs grown in suspensions be single-cell sorted?

      Additionally, what was the cloning efficiency? The cloning efficiency should be compared with adherent cultures.  

      Sorry for this confusion. With our method, iPSCs grown in IWRLY suspension conditions can be single-cell sorted. We have improved the clarity of the schematics (Figure 7A). Also, we added data on the cloning efficiency, compared with adherent cultures (Figure 7B). The cloning efficiency of adherent cultures was around 30%. While the cloning efficiency of suspension cultures without any chemical treatment was less than 10%, the IWR-1-endo treatment in the suspension cultures increased the efficiency to more than 20%. However, the treatment with LY333531 decreased the efficiency. These results indicate that the IWR-1-endo treatment is beneficial for single-cell cloning in suspension culture.

      (4) The authors have not addressed the naïve pluripotent state in their suspension cultures, even though PKC inhibition has been shown to drive cells toward this state. I suggest the authors measure the expression of a few naïve pluripotent state markers and compare them with adherent cultures  

      Thank you for this insightful comment. In the revised manuscript, we have added RT-qPCR data from 5 different hiPSC lines and specific gene expression data from RNA-seq on naïve pluripotent state markers (Figure 3E and Figure 5 - figure supplement 2, respectively). Interestingly, the expression of KLF2, KLF4, KLF5, and DPPA3 is significantly up-regulated in IWRLY conditions. These results suggest that IWRLY suspension conditions drive hiPSCs toward a naïve pluripotent state.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      Overall, I feel that this study is very interesting and comprehensive, but has significant weaknesses in the bioprocessing aspects. More optimization data is required for the suspension culture to truly show that the differentiation they are observing is not an artifact of a non-optimized protocol.  

      Thank you for your thoughtful comments. Following your comments, we have performed several experiments to optimize the bioreactor conditions in the revised manuscript. We tested several cell seeding densities and stirring speeds with or without WNT/PKCβ inhibitors (Figure 6—figure supplement 1). From these optimization experiments, we found that seeding densities of 1 - 2 x 10^5 cells/mL and stirring speeds of 50 - 150 rpm supported the proliferation of these cells. Also, PKCβ and Wnt inhibitors suppressed spontaneous differentiation in bioreactor conditions regardless of stirring speed within the acceptable range. As for the impeller shape and reactor design, we used the commonly used ABLE bioreactor for the 30 mL scale and Eppendorf bioreactors for the 320 mL scale, which were designed and used for human pluripotent stem cell culture in previous studies, respectively (Matsumoto et al., 2022 (doi: 10.3390/bioengineering9110613); Kropp et al., 2016 (doi: 10.5966/sctm.2015-0253)). We cited these previous studies in the Results section. We believe that these additional data and explanations are sufficient to address your concerns about the optimization of the bioreactor experiments.

      Reviewer #2 (Recommendations For The Authors):  

      The following comments should be addressed by the authors to improve the manuscript:  

      (1) Abstract: '...a scalable culture system that can precisely control the cell status for hiPSCs is not developed yet.' There were previous reports for a scalable iPSC culture system so I would suggest toning down/rephrasing this point: eg that improvement in a scalable iPSC culture system is needed.  

      Thank you for this careful suggestion. Following this suggestion, we have changed the sentence to "the improvement in a scalable culture system that can precisely control the cell status for hiPSCs is needed."

      (2) Line 71: please specify what media was used as a 'conventional medium' for suspension culture, was it Stemscale?  

      As suggested, we specified the media as StemFit AK02N used for this experiment. 

      (3) Fig 1E: It's not easy to see gating in the FACS plots as the threshold line is very faint, please fix this issue.  

      As suggested, we used thicker lines for the gating in the FACS plots (Figure 1E).

      (4) Fig 1G-J, Fig 2D-H: The RNAseq figures appeared pixelated and the resolution of these figures should be improved. The x-axis label for Fig 1H is missing.  

      We have improved the resolution and clarity of these figures. Also, we have added the x-axis label "enrichment distribution" for gene set enrichment analysis (GSEA) in Figure 1H, Figure 5F, and Figure 5 - figure supplement 1B.

      (5) Line 103-107: 'Since Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages, and is endogenously involved in the regulation of mesendoderm differentiation of pluripotent stem cells.....'. The two points seem the same and should be clarified.  

      Sorry for this unclear description. We have changed this description to "Exogenous Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages (Nakanishi et al, 2009; Sumi et al, 2008; Tran et al, 2009; Vijayaragavan et al, 2009; Woll et al, 2008). Also, endogenous expression and activation of WNT signaling in pluripotent stem cells are involved in the regulation of mesendoderm differentiation potentials (Dziedzicka et al, 2021; Jiang et al, 2013)." We hope that this description makes the difference between the two points clear.

      (6) Line 113: 'In samples treated with inhibitors' should be 'In samples treated with Wnt inhibitors'.  

      Thank you for this careful suggestion. We have corrected this. 

      (7) Line 115: '....there was no reduction in PAX6 expression.' That's not entirely correct, there was a reduction in PAX6 in IWR-1 endo treatment compared to control suspension culture (is this significant?), but not consistently for IWP-2 treatment. Please rephrase to more accurately describe the results.  

      Sorry for this inaccurate description. We have corrected this phrase as "there was only a small reduction in PAX6 expression in the IWR-1-endo-treated condition and no reduction in the IWP2-treated condition" as recommended.

      (8) It's critical to show that the effect of the suspension culture system developed here can maintain an undifferentiated state for multiple hiPSC lines. I think the author did test this in multiple cell lines, but the results are scattered and not easy to extract. I would recommend adding info for the hiPSC line used for the results in the legend, eg WTC11 line was used for Figure 3, 201B7 line was used for Figure 2. I would suggest compiling a figure that confirms the developed suspension system (IWR-1 +LY) can support the maintenance of multiple hiPSC lines.  

      Thank you for this insightful suggestion. We have added data on hiPSC maintenance across 5 hiPSC lines in suspension culture using StemFit AK02N medium simultaneously (Figure 3C - E) and on hiPSC maintenance across 4 hiPSC lines in suspension culture using mTeSR1 medium simultaneously (Figure 4 - figure supplement 2). Together, the treatment with LY333531 and IWR-1-endo in these media reversed the decreased expression of these undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions. These results show that this chemical treatment produced a consistent, robust effect in hiPSC maintenance across multiple cell lines.

      (9) Line 166: Please use the correct gene nomenclature format for a human gene (italicised uppercase) throughout the manuscript. Also, list the full gene name rather than PAX2,3,5.  

      Sorry for the incorrectness of the gene names. We have corrected them.

      (10) Please improve the resolution for Figure 4D.  

      We have provided clearer images of Figure 4D.

      (11) In the first part of the study, the control condition was referred to as 'suspension culture' with spontaneous differentiation, but in the later parts sometimes the term 'suspension culture' was used to describe the IWR1+LY condition (ie lines 271-272). I would suggest the authors carefully go through the manuscript to avoid misinterpretation on this issue.  

      Thank you for this careful suggestion. To avoid this misinterpretation on this issue, we use 'suspension culture' for just the conventional culture medium and 'LYIWR suspension culture' for the culture medium supplemented with LY333531 and IWR1-endo in this manuscript.

      (12) Figure 5: It is impressive to demonstrate that the IWR1+LY suspension culture enables large-scale expansion of a clinical-grade hiPSC line using a bioreactor, yielding 300 vials/passage. Can the author add some information regarding cell yield using a conventional adherent culture system in this cell line? This will provide a comparison of the performance of the IWR1+LY suspension culture system to the conventional method.  

      Thank you for this valuable suggestion. We have provided information regarding cell yield using a conventional adherent culture system in this cell line in the Results as "Since the population doubling time (PDT) of this hiPSC line in adherent culture conditions is 21.8 - 32.9 hours at its production (https://www.cira-foundation.or.jp/e/assets/file/provision-of-ips-cells/QHJI14s04_en.pdf), this proliferation rate in this large scale suspension culture is comparable to adherent culture conditions."
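
      For readers who want to reproduce the comparison, the population doubling time can be estimated from two cell counts. The following is a minimal sketch, assuming simple exponential growth between the two counts; the function name and the example counts are hypothetical illustrations and are not data or code from the study.

```python
import math


def population_doubling_time(hours: float, n_start: float, n_end: float) -> float:
    """Estimate the population doubling time (hours), assuming exponential growth.

    PDT = elapsed time / number of doublings, where the number of
    doublings is log2(n_end / n_start).
    """
    if hours <= 0 or n_start <= 0 or n_end <= n_start:
        raise ValueError("requires hours > 0 and n_end > n_start > 0")
    doublings = math.log2(n_end / n_start)
    return hours / doublings


if __name__ == "__main__":
    # Hypothetical counts (not from the study): 1 x 10^5 -> 8 x 10^5 cells/mL in 72 h.
    pdt = population_doubling_time(72.0, 1e5, 8e5)
    # The adherent-culture range quoted in the response is 21.8 - 32.9 hours.
    print(f"PDT = {pdt:.1f} h, within adherent range: {21.8 <= pdt <= 32.9}")
```

      A value that falls inside the quoted 21.8 - 32.9 hour range would be consistent with the statement that proliferation in suspension is comparable to adherent culture.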

      (13) Line 273: For testing the feasibility of using IWR1+LY media to support the freeze and thaw process, the author described the cell number and TRA160+/OCT4+ cell %. How is this compared to conventional media (eg E8)? It would be nice to see a head-to-head comparison with conventional media, quantification of cell count or survival would be helpful to determine this.  

      For this issue, we attempted a direct freeze and thaw process using conventional media, StemFit AK02N in the 201B7 line (Figure 8) or mTeSR1 in 4 different hiPSC lines (Figure 8 - figure supplement 1), with or without IWR1+LY. However, since the hiPSCs cultured in suspension culture conditions without IWR1+LY quickly lost their self-renewal ability, these frozen cells could be neither recovered nor counted in these conditions. Our results indicate that the addition of IWR1+LY in the thawing process supports successful recovery in suspension conditions.

      (14) More details of the passaging method should be added in the method section. Do you do cell count following accutase dissociation and replate a defined density (eg 1x10^5/ml)?  

      Yes. We counted the cells in every passage in suspension culture conditions. We have added more explanation in the Materials and Methods as below.

      "The dissociated cells were counted with an automatic cell counter (Model R1, Olympus) with Trypan Blue staining to detect live/dead cells. The cell-containing medium was spun down at 200 rpm for 3 minutes, and the supernatant was aspirated. The cell pellet was re-suspended with a new culture medium at an appropriate cell concentration and used for the next suspension culture."
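
      The re-suspension step described above amounts to a simple dilution calculation. The sketch below is only an illustration under stated assumptions: the function name and the example numbers are hypothetical, and the target density is taken from the 1 - 2 x 10^5 cells/mL seeding range mentioned in the response rather than from the passaging protocol itself.

```python
def resuspension_volume_ml(live_cells: float, target_density_per_ml: float) -> float:
    """Volume of fresh medium (mL) that brings a cell pellet to the target density.

    live_cells: total number of live cells in the pellet (e.g., from a
        Trypan Blue-based count multiplied by the counted suspension volume).
    target_density_per_ml: desired seeding density in cells/mL.
    """
    if live_cells <= 0 or target_density_per_ml <= 0:
        raise ValueError("cell number and target density must be positive")
    return live_cells / target_density_per_ml


if __name__ == "__main__":
    # Hypothetical example: 3 x 10^6 live cells seeded at 1 x 10^5 cells/mL.
    volume = resuspension_volume_ml(3e6, 1e5)
    print(f"Re-suspend the pellet in {volume:.1f} mL of medium")
```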

      (15) The IWR1+LY suspension culture system requires passage every 3-5 days. Is there still spontaneous differentiation if the hiPSC aggregate grows too big?  

      Thank you for this insightful question.

      Yes. As previous studies have shown, the size of hiPSC aggregates is critical for maintaining self-renewal in our method. Stirring speed is key to producing hiPSC aggregates of the proper size in suspension culture. The culture period between passages is another key factor in keeping hiPSC aggregates from exceeding the proper size. Thus, we basically keep the stirring speed at 90 rpm (135 rpm for bioreactor conditions) and passage every 3 - 5 days in suspension culture conditions.

      (16) Several previous studies have described the development of hiPSC suspension culture system using hydrogel encapsulation to provide biophysical modulation (reviewed in PMID: 32117992). In comparison, it seems that the IWR1+LY suspension system described here does not require ECM addition which further simplifies the culture system for iPSC. It would be good to add more discussion on this topic in the manuscript, such as the potential role of the E-cadherin in mediating this effect - as RNAseq results indicated that CDH1 was upregulated in the IWR1+LY condition).  

      Thank you for this valuable suggestion. We have added more discussion on this topic in the Discussion section as below.

      "Thus, our findings show that suspension culture conditions with Wnt and PKCβ inhibitors (IWRLY suspension conditions) can precisely control cell conditions and are comparable to conventional adhesion cultures regarding cellular function and proliferation. Many previous 3D culture methods intended for mass expansion used hydrogel-based encapsulation or microcarrier-based methods to provide scaffolds and biophysical modulation (Chan et al, 2020). These methods are useful in that they enable mass culture while maintaining scaffold dependence. However, the need for special materials and equipment and the labor and cost involved are concerns toward industrial mass culture. On the other hand, our IWRLY suspension conditions do not require special materials such as hydrogels, microcarriers, or dialysis bags, and have the advantage that common bioreactors can be used. "

      "On the other hand, it is interesting to see whether and how the properties of hiPSCs cultured in IWRLY suspension culture conditions are altered from the adherent conditions. Our transcriptome results in comparison to adherent conditions show that gene expression associated with cell-to-cell attachment, including E-cadherin (CDH1), is more activated. This may be due to the status that these hiPSCs are more dependent on cell-to-cell adhesion where there is no exogenous cell-to-substrate attachment in the three-dimensional culture. Previous studies have shown that cell-to-cell adhesion by E-cadherin positively regulates the survival, proliferation, and self-renewal of human pluripotent stem cells (Aban et al, 2021; Li et al, 2012; Ohgushi et al, 2010). Furthermore, studies have shown that human pluripotent stem cells can be cultured using an artificial substrate consisting of recombinant E-cadherin protein alone without any ECM proteins (Nagaoka et al, 2010). Also, cell-to-cell adhesion through gap junctions regulates the survival and proliferation of human pluripotent stem cells (Wong et al, 2006; Wong et al, 2004). These findings raise the possibility that the cell-to-cell adhesion, such as E-cadherin and gap junctions, are compensatory activated and support hiPSC self-renewal in situations where there are no exogenous ECM components and its downstream integrin and focal adhesion signals are not forcedly activated in suspension culture conditions. It will be interesting to elucidate these molecular mechanisms related to E-cadherin in the hiPSC survival and self-renewal in IWRLY suspension conditions in the future."

      Reviewer #3 (Recommendations For The Authors):  

      (1) I am a bit confused about the passage of adherent cultures. The authors claim that they used EDTA for passaging and plated cells at a density of 2500 cells/cm2. My understanding is that EDTA is typically used for clump passaging rather than single-cell passaging.  

      Sorry about this confusion. We routinely use an automatic cell counter (model R1, Olympus) which can even count small clumpy cells accurately. Thus, we show the cell numbers in the passaging of adherent hiPSCs.  

      (2) Figure 2D- The authors have not directly compared IWR-1-endo with IWR-1-endo+Go6983 for the expression of T and SOX17, a simultaneous comparison would be an interesting data.  

      As recommended, we have added data that directly compare IWR-1-endo with IWR-1-endo+Go6983 for the expression of T and SOX17 in Figure 2D. The addition of IWR-1-endo alone decreased the expression of T and SOX17, but not PAX6, which was similar to the data in Figure 2C.

      (3) Oxygen levels play a crucial role in pluripotency maintenance. Could the authors please specify the oxygen levels used for culturing cells in suspension?  

      Sorry for not mentioning oxygen levels in this study. We basically use normal oxygen levels (i.e., 21% O2) in suspension culture conditions. We have explained this in the Materials and Methods section.

      (4) Figure supplement 1 (G and H): In the images, it is difficult to determine whether the green (PAX6 and SOX17) overlaps with tdT tomato. For better visualization, I suggest that the authors provide separate images for the green and red colors, as well as an overlay.  

      Sorry for these unclear images. We have provided separate images for the green and red colors, as well as an overlay, in Figure 1 - figure supplement 1G and H.

      (5) The authors have only compared quantitatively the expression of TRA-1-60 for most of the figures. I suggest that the authors quantitatively measure the expression of other markers of undifferentiated stem cells, such as NANOG, OCT4, SSEA4, TRA-1-81, etc.  

      We have added the quantitative data of the expression of markers of undifferentiated hiPSCs including NANOG, OCT4, SSEA4, and TRA-1-60 on 5 different hiPSC lines in Figure 3 C-E.

      (6) In Figure 2D, the authors have tested various small molecules but the rationale behind testing those molecules is missing in the text.  

      These molecules are chosen as putatively affecting neuroectodermal induction from the pluripotent state.

      We have added the rationale with appropriate references in the Results section as below.

      "We have chosen these candidate molecules based on previous studies related to signaling pathways or epigenetic regulations in neuroectodermal development (reviewed in (Giacoman-Lozano et al, 2022; Imaizumi & Okano, 2021; Sasai et al, 2021; Stern, 2024) ) or in pluripotency safeguards (reviewed in (Hackett & Surani, 2014; Li & Belmonte, 2017; Takahashi & Yamanaka, 2016; Yagi et al, 2017)) (Figure 2A; listed in Supplementary Table 1). "

      (7) In the beginning authors used Go6983 but later they switched to LY333531, the reasoning behind the switch is not explained well.  

      To explain clearly the reasons for switching from Go6983 to LY333531, we reorganized the order of results and figures. In short, we found that the suppression of PAX6 expression in hiPSCs cultured in suspension conditions was observed with many PKC inhibitors, all of which possessed PKCβ inhibition activity (Figure 2—figure supplement 2B-D). Also, elevated expression of PKCβ in suspension-cultured hiPSCs could affect spontaneous differentiation (Figure 3—figure supplement 1A-C). To further explore the possibility that the inhibition of PKCβ is critical for the maintenance of self-renewal of hiPSCs in suspension culture, we evaluated the effect of LY333531, a PKCβ-specific inhibitor. The maintenance of suspension-cultured hiPSCs is specifically facilitated by the combination of PKCβ and Wnt signaling inhibition (Figure 3A and B; Figure 2—figure supplement 1). Last, we performed long-term culture for 10 passages in suspension conditions and compared hiPSC growth in the presence of LY333531 or Go6983. LY333531 was superior in proliferation rate and in maintaining OCT4 protein expression in the long-term culture (Figure 4). Thus, we used IWR-1-endo and LY333531 for the rest of this study.

      (8) I suggest the authors measure cell death after the treatment with LY+IWR-1-endo.  

      Thank you for this valuable suggestion. We have measured cell death after treatment with LY+IWR-1-endo and found that the chemical combination had little or no effect on cell death. We have added the data in Figure 3—figure supplement 2 and the description in the Results section as below. "We also examined whether the combination of PKCβ and Wnt signaling inhibition affects cell survival in suspension conditions. In this experiment, we used another PKC inhibitor, Staurosporine (Omura et al, 1977), which has a strong cytotoxic effect, as a positive control of cell death in suspension conditions. The addition of IWR-1-endo and LY333531 for 10 days had no effect on apoptosis, while the addition of Staurosporine for 2 hours induced Annexin-V-positive apoptotic cells (Figure 3—figure supplement 2). These results indicate that the combination of PKCβ and Wnt signaling inhibition has little or no effect on cell survival in suspension conditions."

      (9) The authors have performed reprogramming using episomal vectors and using Sendai viruses. In both the protocols authors have added small molecules at different time points, for episomal vector protocol at day 3 and Sendai virus protocol at day 23. Why is this different?  

      Thank you for this insightful question. The difference in timing was intended to reflect the degree of expression from these reprogramming vectors. Expression of the reprogramming factors from these vectors should suppress spontaneous differentiation in reprogramming cells, and expression from Sendai viral vectors should last longer than expression from episomal plasmid vectors. Thus, we added the chemical inhibitors from the early phase of reprogramming in the episomal plasmid vector conditions and from the late phase of reprogramming in the Sendai viral vector conditions. In the future, we may need to further optimize the timing of adding these molecules.

      (10) The protocol for three germ layer differentiation using a specific differentiation medium requires further elaboration. For instance, the authors mentioned that suspension cultures were transferred to differentiation media but did not emphasize the cell number and culture conditions before moving the cultures to the differentiation media.  

      Sorry for this unclear description. We have added the explanation on the cell number and culture conditions before moving the cultures to the differentiation media in the Materials and Methods section as below.

      "As in the maintenance conditions, 4 × 10⁵ hiPSCs were seeded in one well of a low-attachment 6-well plate with 4 mL of StemFit AK02N medium supplemented with 10 µM Y-27632. This plate was placed onto the plate shaker in the CO2 incubator. The next day, the medium was changed to the germ layer-specific differentiation medium."

    2. eLife Assessment

      This comprehensive and compelling study presents a robust, cost-effective method for expanding pluripotent stem cells. The authors have identified a media condition that maintains iPSCs in suspension cultures by inhibiting the PKCβ and Wnt signaling pathways. The manuscript is important for the pluripotent stem cell field as it seeks robust and economical approaches to expand iPSCs at scale for high throughput screens and preclinical studies. While the authors have tested their media and protocol on a few lines, given the variability of iPSCs, further testing across more cell lines and in different laboratory settings will be crucial to evaluate its reproducibility.

    3. Reviewer #1 (Public review):

      Summary:

      The authors have presented data showing that there is a greater amount of spontaneous differentiation in human pluripotent cells cultured in suspension vs static and have used PKCβ and Wnt signaling pathway inhibitors to decrease the amount of differentiation in suspension culture.

      Strengths:

      This is a very comprehensive study that uses a number of different reactor designs and scales in addition to a number of unbiased outcomes to determine how suspension impacts the behaviour of the cells and in turn how the addition of inhibitors counteracts this effect. Furthermore, the authors were also able to derive new hiPSC lines in suspension with this adapted protocol.

      Weaknesses:

      The main weakness of this study is the lack of optimization with each bioreactor change. It has been shown multiple times in the literature that the expansion and behaviour of pluripotent cells can be dramatically impacted by impeller shape, RPM, reactor design and multiple other factors. It remains unclear to me how much of the results the authors observed (e.g. increased spontaneous differentiation) was due to not having an optimized bioreactor protocol in place (per bioreactor vessel type). For instance, was the starting seeding density, RPM, impeller shape, feeding schedule, and/or any other aspect optimized for any of the reactors used in the study and, if not, how were the values used in the study determined?

      Post-revision:

      The authors did a commendable job in responding and addressing my comments and concerns in addition to those of the other reviewers. I think this study will be of interest to the field and will add to our collective knowledge on how PSCs react to being cultured in suspension conditions.

    4. Reviewer #2 (Public review):

      This study by Matsuo-Takasaki et al. reported the development of a novel suspension culture system for hiPSC maintenance using Wnt/PKC inhibitors. The authors showed elegantly that inhibition of the Wnt and PKC signaling pathways would repress spontaneous differentiation into neuroectoderm and mesendoderm in hiPSCs, thereby maintaining cell pluripotency in suspension culture. This is a solid study with substantial data to demonstrate the quality of the hiPSCs maintained in the suspension culture system, including long-term maintenance for >10 passages, a robust effect in multiple hiPSC lines, and a panel of conventional hiPSC QC assays. Notably, large-scale expansion of a clinical-grade hiPSC line using a bioreactor was also demonstrated, which highlighted the translational value of the findings here. In addition, the authors demonstrated a wide range of applications for the IWR1+LY suspension culture system, including support for freezing/thawing and PBMC-iPSC generation in suspension culture format. The novel suspension culture system reported here is exciting, with significant implications for simplifying the current culture method of iPSCs and upscaling iPSC manufacturing.

      Review for second submission:

      In this revised manuscript, the authors provided new data to further support that suspension culture with Wnt/PKC inhibitors can be used for long-term hiPSC maintenance across multiple cell lines, as well as comparison with current benchmark culture system. New discussion sections were also added to put the findings into perspective of current development and the need for hiPSC maintenance culture system, and the figures were updated to improve readability. Overall, the authors have addressed all my concerns in this revised manuscript. Congratulations to the authors on this very interesting study.

    5. Reviewer #3 (Public review):

      In the current manuscript, Matsuo-Takasaki et al. demonstrate that the addition of PKCβ and WNT signaling pathway inhibitors to suspension cultures of iPSCs effectively suppresses spontaneous differentiation. These conditions are well-suited for the large-scale expansion of iPSCs. The authors have shown that, under these conditions, they can successfully perform single-cell cloning, direct cryopreservation, and iPSC derivation from PBMCs. Furthermore, they provide a comprehensive characterization of iPSCs grown in these conditions, including assessments of undifferentiated stem cell markers and genetic stability.

      They have elegantly demonstrated that iPSCs cultured in these conditions can differentiate into derivatives of all three germ layers. By differentiating iPSCs into dopaminergic neural progenitors, cardiomyocytes, and hepatocytes, the authors show that differentiation is comparable to that of adherent cultures. This new method of expanding iPSCs has significant potential for clinical applications. The authors also tested these conditions in multiple cell lines and observed consistent results.

      Although the authors have elaborated on the mechanism to some extent (suggesting that PKCβ and WNT signaling pathway inhibition suppresses differentiation and shifts cells toward a naïve pluripotency state in suspension cultures), further research is needed to fully understand this process. Nevertheless, their findings are promising and will be beneficial for producing scalable amounts of iPSCs in controlled conditions.

    1. eLife Assessment

      This valuable study reports machine learning models derived from large-scale data to predict the risk of post-stroke epilepsy. The evidence supporting the conclusions is convincing, although there are some validation issues (lack of cross-validation, possible bias in external validation results). The study may be of interest in the field of clinical neurology.

    2. Reviewer #1 (Public review):

      Summary:

      This is a large cohort of ischemic stroke patients from a single centre. The author successfully set up predictive models for PTS.

      Strengths:

      The design and implementation of the trial are acceptable, with the credibility of the results. It may provide evidence of seizure prevention in the field of stroke treatment.

      Weaknesses:

      My concerns are well responded to.

    3. Reviewer #2 (Public review):

      Summary

      The authors present multiple machine-learning methodologies to predict post-stroke epilepsy (PSE) from admission clinical data.

      Strengths

      The Statistical Approach section is very well written. The approaches used in this section are very sensible for the data in question.

      Typos have now been addressed and improved interpretability has been added to the paper, which is appreciated.

      Weaknesses

      The authors have clarified that the first features available for each patient have been used. However, they have not shown that these features did not occur before the time of post-stroke epilepsy. Explicit clarification of this should be performed.

      The likely impact of the work on the field

      If this model works as claimed, it will be useful for predicting PSE. This has some direct clinical utility.

      Analysis of features contributing to PSE may provide clinical researchers with ideas for further research on the underlying aetiology of PSE.

    4. Reviewer #3 (Public review):

      Summary:

      The authors report the performance of a series of machine learning models inferred from a large-scale dataset and externally validated with an independent cohort of patients, to predict the risk of post-stroke epilepsy. Some of the reported models have very good explicative performance, and seem to have very good predictive ability.

      Strengths:

      The models have been derived from real-world large-scale data.

      Performances of the best-performing models seem to be very good according to the external validation results.

      Early prediction of risk of post-stroke epilepsy would be of high interest to implement early therapeutic interventions that could improve prognosis.

      Code is publicly available. The authors also stated that the datasets used are available on request.

      Weaknesses:

      The writing of the article may be significantly improved.

      Although the external validation is appreciated, cross-validation to check robustness of the models would also be welcome.

      External validation results may be biased/overoptimistic, since the authors stated that "The external validation cohort focused more on collecting positive cases to examine the model's ability to identify positive samples", which may result in overoptimistic PPV and sensitivity estimates. The specificity for the external validation set has not been disclosed.

    1. Author response:

      Joint Public Reviews:

      Here, the authors compare how different operationalizations of adverse childhood experience exposure related to patterns of skin conductance response during a fear conditioning task. They use a large dataset to definitively understand a phenomenon that, to date, has been addressed using a range of different definitions and methods, typically with insufficient statistical power. Specifically, the authors compared the following operationalizations: dichotomization of the sample into "exposed" and "non-exposed" categories, cumulative adversity exposure, specificity of adversity exposure, and dimensional (threat versus deprivation) adversity exposure. The paper is thoughtfully framed and provides clear descriptions and rationale for procedures, as well as package version information and code. The authors' overall aim of translating theoretical models of adversity into statistical models, and comparing the explanatory power of each model, respectively, is an important and helpful addition to the literature. However, the analysis would be strengthened by employing more sophisticated modelling techniques that account for between-subjects covariates and the presentation of the data needs to be streamlined to make it clearer for the broad audience for which it is intended.

      Strengths

      Several outstanding strengths of this paper are the large sample size and its primary aim of statistically comparing leading theoretical models of adversity exposure in the context of skin conductance response. This paper also helpfully reports Cohen's d effect sizes, which aid in interpreting the magnitude of the findings. The methods and results are generally thorough.

      Weaknesses

      Weakness 1: The largest concern is that the paper primarily relies on ANOVAs and pairwise testing for its analyses and does not include between-subjects covariates. Employing mixed-effects models instead of ANOVAs would allow more sophisticated control over sources of random variance in the sample (especially important for samples from multi-site studies such as the present study), and further allow the inclusion of potentially relevant between-subjects covariates such as age (e.g. Eisenstein et al., 1990) and gender identity or sex assigned at birth (e.g. Kopacz II & Smith, 1971) (perhaps especially relevant due to possible gender- or sex-related differences in ACE exposure; e.g. Kendler et al., 2001). Also, proxies for socioeconomic status (e.g. income, education) can be linked with ACE exposure (e.g. Maholmes & King, 2012) and warrant consideration as covariates, especially if they differ across adversity-exposed and unexposed groups.

      We appreciate the reviewer's suggestion and recognize the value of using (more) sophisticated statistical methods. However, we think that the choice of methods should not be guided by perceived complexity alone, and that the chosen ANOVA-based approach provides reliable and valid results. In our revision, we address the reviewer's suggestion by demonstrating that employing mixed models leaves the reported results unchanged (a). We would also like to refer the reviewer to the robustness analyses provided in the initial supplementary material (b).

      a) Re-running analyses using mixed models

      Based on the reviewers' suggestion, we repeated our main analyses (association between exposure to childhood adversity and SCRs, arousal, valence, and contingency ratings during fear acquisition and generalization) using linear mixed models, including age, sex, educational attainment, and childhood adversity as fixed effects, and site as a random effect. These analyses produced results similar to those in our manuscript, demonstrating a significant effect of childhood adversity on SCRs, as assessed by CS discrimination during both acquisition training and the generalization phase, and on general reactivity, but not on linear deviation scores (LDS). For the different rating types, we did not observe any significant effects of childhood adversity.

      We would prefer to retain our main analyses as they are and report the linear mixed model results as additional results in the supplement. However, if the reviewer and editor have strong preferences otherwise, we are open to presenting the mixed models in the main manuscript and moving our previous analyses to the supplement.

      We added the following paragraph to the main manuscript (page 25-26):

      “At the request of a reviewer, we repeated our main analyses using linear mixed models including age, sex, school degree (i.e., to approximate socioeconomic status), and exposure to childhood adversity as fixed effects as well as site as a random effect. These analyses yielded comparable results demonstrating a significant effect of childhood adversity on CS discrimination during acquisition training and the generalization phase as well as on general reactivity, but not on the generalization gradients in SCRs (see Supplementary Table 2 A). Consistent with the results of the main analyses reported in our manuscript, we did not observe any significant effects of childhood adversity on the different types of ratings when using mixed models (see Supplementary Table 2 B-D). Some of the mixed model analyses showed significantly lower CS discrimination during acquisition training and generalization, and lower general reactivity in males compared to females (see Supplementary Table 2 for details).”

      b) Additional robustness tests for the main analyses (already provided in the initial submission as supplementary material)

      We would also like to refer the reviewer to the robustness analyses in the initial supplement to account for possible site effects. Adding site to the analyses affected the p-value in only one instance: entering site as a covariate in analyses of CS discrimination during acquisition training attenuated the p-value of the ACQ exposure effect from p = 0.020 to p = 0.089.

      Further robustness checks involved repeating our main analyses while excluding (a) physiological non-responders (participants with only SCRs = 0) and (b) extreme outliers (data points ± 3 SDs from the mean) to ensure generalizable results. These repetitions of the analyses did not lead to any changes in the results.
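The two exclusion rules described above can be sketched in a few lines. This is a minimal illustration and not the authors' analysis code: the data layout (a mapping from participant ID to SCR values or to a single summary score) and both function names are assumptions, and the response does not say whether the ±3 SD rule was applied per participant or per trial, so the sketch applies it to one score per participant.

```python
from statistics import mean, stdev


def drop_non_responders(scr_by_participant):
    """Keep only participants with at least one non-zero SCR.

    scr_by_participant is a hypothetical mapping: participant ID -> list of SCRs.
    """
    return {
        pid: scrs
        for pid, scrs in scr_by_participant.items()
        if any(value != 0 for value in scrs)
    }


def drop_extreme_outliers(score_by_participant, n_sd=3.0):
    """Keep only scores within n_sd sample SDs of the sample mean.

    score_by_participant is a hypothetical mapping: participant ID -> score.
    Needs at least two scores, because a sample SD is undefined otherwise.
    """
    scores = list(score_by_participant.values())
    centre, spread = mean(scores), stdev(scores)
    return {
        pid: score
        for pid, score in score_by_participant.items()
        if abs(score - centre) <= n_sd * spread
    }
```

Re-running the main model on the output of each filter and comparing the estimates with the unfiltered analysis is the robustness check in question.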

      We did not include age in our primary analyses due to the homogeneity of our sample and the lack of related hypotheses. Additionally, socio-economic status was assessed only crudely via the highest education level attained, rendering it of limited use.

      Weakness 2: On a related methodological note, the authors mention that scores representing threat and deprivation were not problematically collinear due to VIFs being <10; however, some sources indicate that VIFs should be <5 (e.g. Akinwande et al., 2015).

      We thank the reviewer for bringing different cut-offs to our attention. We have revised this section to highlight the arbitrary nature of their interpretation (page 33):

      “Within the dimensional model framework, the issue of multicollinearity among predictors (i.e., different childhood adversity types) is frequently discussed (McLaughlin et al., 2021; Smith & Pollak, 2021). If we apply the rule of thumb of a variance inflation factor (VIF) > 10, which is often used in the literature to indicate concerning multicollinearity (e.g., Hair, Anderson, Tatham, & Black, 1995; Mason, Gunst, & Hess, 1989; Neter, Wasserman, & Kutner, 1989), we can assume that multicollinearity was not a concern in our study (abuse: VIF = 8.64; neglect: VIF = 7.93). However, some authors state that VIFs should not exceed a value of 5 (e.g., Akinwande, Dikko, and Samson (2015)), while others suggest that these rules of thumb are rather arbitrary (O’Brien, 2007).”
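To make the two cut-offs concrete, recall the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all other predictors in the model. The sketch below only inverts that identity for the two VIFs reported above; the function name is illustrative, and the other predictors in the authors' model are not known from this response.

```python
def implied_r_squared(vif):
    """Invert VIF = 1 / (1 - R^2) to get the R^2 of a predictor on the others."""
    return 1.0 - 1.0 / vif


# VIFs as reported in the revised paragraph above.
reported_vifs = {"abuse": 8.64, "neglect": 7.93}

for name, vif in reported_vifs.items():
    r2 = implied_r_squared(vif)
    print(
        f"{name}: VIF = {vif}, implied R^2 = {r2:.3f}, "
        f"exceeds 5: {vif > 5}, exceeds 10: {vif > 10}"
    )
```

Both values fall between the two rules of thumb: each predictor shares roughly 87 to 88 percent of its variance with the remaining predictors, which passes the VIF < 10 rule but not the stricter VIF < 5 rule.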

      Weakness 3: Additionally, the paper reports that higher trait anxiety and depression symptoms were observed in individuals exposed to ACEs, but it would be helpful to report whether patterns of SCR were in turn associated with these symptom measures and whether the different operationalizations of ACE exposure displayed differential associations with symptoms.

      We thank the reviewer for highlighting these relevant points. We have included additional analyses in the supplementary material in response to this comment. Figures and the corresponding text are also copied below for your convenience.

      We added the following paragraphs to the main manuscript: Methods (page 21):

      “Analyses of trait anxiety and depression symptoms

      To further characterize our sample, we compared individuals who were unexposed with those who were exposed to childhood adversity on trait anxiety and depression scores using Welch tests due to unequal variances.

      At the request of a reviewer, we additionally investigated the association of childhood adversity as operationalized by the different models used in our exploratory analyses (i.e., cumulative risk, specificity, and dimensional model) and trait anxiety as well as depression scores (see Supplementary Figure 7). By using STAI-T and ADS-K scores as independent variable, we calculated a) a comparison of conditioned responding of the four severity groups (i.e., no, low, moderate, severe exposure to childhood adversity) using one-way ANOVAs and the association with the number of sub-scales exceeding an at least moderate cut-off in simple linear regression models for the implementation of the cumulative risk model, and b) the association with the CTQ abuse and neglect composite scores in separate linear regression models for the implementation of the specificity/dimensional models. At the request of the reviewer, we also calculated the Pearson correlation between trait anxiety (i.e., STAI-T scores), depression scores (i.e., ADS-K scores) and conditioned responding in SCRs (see Supplementary Table 8).”

      Results (page 38):

      “Analyses of trait anxiety and depression symptoms

      As expected, participants exposed to childhood adversity reported significantly higher trait anxiety and depression levels than unexposed participants (all p’s < 0.001; see Table 1 and Supplementary Figure 6). This pattern remained unchanged when childhood adversity was operationalized differently - following the cumulative risk approach, the specificity, and dimensional model (see methods). These additional analyses all indicated a significant positive relationship between exposure to childhood adversity and trait anxiety as well as depression scores irrespective of the specific operationalization of “exposure” (see Supplementary Figure 7).

      CS discrimination during acquisition training and the generalization phase, generalization gradients, and general reactivity in SCRs were unrelated to trait anxiety and depression scores in this sample with the exception of a significant association between depression scores and CS discrimination during fear acquisition training (see Supplementary Table 8). More precisely, a very small but significant negative correlation was observed indicating that high levels of depression were associated with reduced levels of CS discrimination (r = -0.057, p = 0.033). The correlation between trait anxiety levels and CS discrimination during fear acquisition training was not statistically significant but on a descriptive level, high anxiety scores were also linked to lower CS discrimination scores (r = -0.05, p = 0.06) although we highlight that this should not be overinterpreted in light of the large sample. However, both correlations (i.e., CS-discrimination during fear acquisition training and trait anxiety as well as depression, respectively) did not statistically differ from each other (z = 0.303, p = 0.762, Dunn & Clark, 1969). Interestingly, and consistent with our results showing that the relationship between childhood adversity and CS discrimination was mainly driven by significantly lower CS+ responses in exposed individuals, trait anxiety and depression scores were significantly associated with SCRs to the CS+, but not to the CS- during acquisition training (see Supplementary Table 8).”
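The caution about over-interpreting a very small correlation in a large sample can be illustrated with a short calculation. The sketch below uses the Fisher z approximation to the two-sided p-value of a Pearson correlation; this is a swapped-in approximation and not necessarily the test the authors ran, and the sample sizes in the loop are hypothetical because the study's n is not stated in this response.

```python
import math


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 using the Fisher z approximation.

    r is the observed Pearson correlation and n the sample size (n > 3).
    The statistic atanh(r) * sqrt(n - 3) is treated as standard normal.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical sample sizes, chosen only to show the dependence on n.
for n in (100, 500, 1400):
    print(f"r = -0.057, n = {n}: p = {approx_p_value(-0.057, n):.3f}")
```

The same correlation coefficient moves from clearly non-significant to below 0.05 as n grows, which is why the effect size (here, well under 1 percent of shared variance) carries more information than the p-value alone.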

      Weakness 4: Given the paper's framing of SCR as a potential mechanistic link between adversity and mental health problems, reporting these associations would be a helpful addition. These results could also have implications for the resilience interpretation in the discussion (lines 481-485), which is a particularly important and interesting interpretation.

      We have added a paragraph on this to the discussion (page 41):

      “Interestingly, in our study, trait anxiety and depression scores were mostly unrelated to SCRs, defined by CS discrimination and generalization gradients based on SCRs as well as general SCR reactivity, with the exception of a significant - albeit minute - relationship between CS discrimination during acquisition training and depression scores (see above). Although reported associations in the literature are heterogeneous (Lonsdorf et al., 2017), we may speculate that they may be mediated by childhood adversity. We conducted additional mediation analyses (data not shown) which, however, did not support this hypothesis. As the potential links between reduced CS discrimination in individuals exposed to childhood adversity and the developmental trajectories of psychopathological symptoms are still not fully understood, future work should investigate these further in - ideally - prospective studies.”

      Weakness 5: Given that the manuscript criticizes the different operationalizations of childhood adversity, there should be greater justification of the rationale for choosing the model for the main analyses. Why not the 'cumulative risk' or 'specificity' model? Related to this, there should also be a stronger justification for selecting the 'moderate' approach for the main analysis. Why choose to cut off at moderate? Why not severe, or low? Related to this, why did they choose to cut off at all? Surely one could address this with the continuous variable, as they criticize cut-offs in Table 2.

      We thank the reviewers and editors for bringing to our attention that our reasoning for choosing the main model was not clear. As outlined in the manuscript, we chose the approach for the main analyses from the literature as a recent review on this topic (Ruge et al., 2023) has shown the moderate CTQ cut-off to be the most abundantly employed in the field of research on associations between childhood adversity and threat learning. We have made this rationale more explicit in our revised manuscript (page 15/21):

      “Operationalization of "exposure"

      We implemented different approaches to operationalize exposure to childhood adversity in the main analyses and exploratory analyses (see Table 2). In the main analyses, we followed the approach most commonly employed in the field of research on childhood adversity and threat learning - using the moderate exposure cut-off of the CTQ (for a recent review see Ruge et al. (2024)). In addition, the heterogeneous operationalizations of classifying individuals into exposed and unexposed to childhood adversity in the literature (Koppold, Kastrinogiannis, Kuhn, & Lonsdorf, 2023; Ruge et al., 2024) hampers comparison across studies and hence cumulative knowledge generation. Therefore, we also provide exploratory analyses (see below) in which we employ different operationalizations of childhood adversity exposure.”

      “Exploratory analyses

      Additionally, the different ways of classifying individuals as exposed or unexposed to childhood adversity in the literature (Koppold et al., 2023; for discussion see Ruge et al., 2024) hinder comparison across studies and hence cumulative knowledge generation. Therefore, we also conducted exploratory analyses using different approaches to operationalize exposure to childhood adversity (see Table 2 for details).”

      Furthermore, as correctly noted, we fully agree that employing the moderate cut-off (or any cut-off in fact) is in principle an arbitrary decision - despite being guided by and derived from the literature in the field. However, we would like to draw the reviewers’ attention to Figure 5 in the initial submission (please see also below): Although the differences in SCR between severity groups were not significant, the overall pattern suggests at a descriptive level that the decline in CS discrimination, LDS and general reactivity in SCR occurs mainly when childhood adversity exceeds a moderate level. Thus, while we used the moderate cut-off as it was recently shown to be the most widely used approach in the literature (see Ruge et al., 2023), our exploratory analyses also seem to suggest on a descriptive level, that this cut-off may indeed “make sense”. We also refer to this in the results section (page 31-32) and discussion (page 43-44):

      Results:

      “However, on a descriptive level (see Figure 5), it seems that indeed exposure to at least a moderate cut-off level may induce behavioral and physiological changes (see main analysis, Bernstein & Fink, 1998). This might suggest that the cut-off for exposure commonly applied in the literature (see Ruge et al., 2024) may indeed represent a reasonable approach.”

      Discussion:

      “It is noteworthy, however, that this cut-off appears to map rather well onto psychophysiological response patterns observed here (see Figure 5). More precisely, our exploratory results of applying different exposure cut-offs (low, moderate, severe, no exposure) seem to indicate that indeed a moderate exposure level is “required” for the manifestation of physiological differences, suggesting that childhood adversity exposure may not have a linear or cumulative effect.”

      Weakness 6: In the Introduction, the authors predict less discrimination between signals of danger (CS+) and safety (CS-) in trauma-exposed individuals driven by reduced responses to the CS+. Given the potential impact of their findings for a larger audience, it is important to give greater theoretical context as to why CS discrimination is relevant here, and especially what a reduction in response specifically to danger cues would mean (e.g. in comparison to anxiety, where safety learning is impacted).

      We thank the reviewer for highlighting that this was not sufficiently clear. We revised the paragraph in the introduction as follows (page 7-8):

      “Fear acquisition as well as extinction are considered as experimental models of the development and exposure-based treatment of anxiety- and stress-related disorders. Fear generalization is in principle adaptive in ensuring survival (“better safe than sorry”), but broad overgeneralization can become burdensome for patients. Accordingly, maintaining the ability to distinguish between signals of danger (i.e., CS+) and safety (i.e., CS-) under aversive circumstances is crucial, as it is assumed to be beneficial for healthy functioning (Hölzel et al., 2016) and predicts resilience to life stress (Craske et al., 2012), while reduced discrimination between the CS+ and CS- has been linked to pathological anxiety (Duits et al., 2015; Lissek et al., 2005): Meta-analyses suggest that patients suffering from anxiety- and stress-related disorders show enhanced responding to the safe CS- during fear acquisition (Duits et al., 2015). During extinction, patients exhibit stronger defensive responses to the CS+ and a trend toward increased discrimination between the CS+ and CS- compared to controls, which may indicate delayed and/or reduced extinction (Duits et al., 2015). Furthermore, meta-analytic evidence also suggests stronger generalization to cues similar to the CS+ in patients and more linear generalization gradients (Cooper, van Dis, et al., 2022; Dymond, Dunsmoor, Vervliet, Roche, & Hermans, 2015; Fraunfelter, Gerdes, & Alpers, 2022). Hence, aberrant fear acquisition, extinction, and generalization processes may provide clear and potentially modifiable targets for intervention and prevention programs for stress-related psychopathology (McLaughlin & Sheridan, 2016).”

      Recommendations for the authors:

      Abstract:

      Comment 1:

      (a) It does not succinctly describe the background rationale well (i.e. it tries to say too much). It should be streamlined. There is a lot of 'jargon', which muddies the results, and too many concepts are introduced at each part and assume knowledge from the reader. 

      We thank the reviewer for providing constructive guidance for revisions. We have revised our abstract according to these suggestions.

      (b) Multiple terms for childhood trauma are used: ACEs, early adversity, childhood trauma, and childhood maltreatment. Choose one term and stick to it to enhance clarity. Why not just use childhood adversity, as in the title? Related to this, the use of ACEs sets up an expectation that ACE questionnaire was used, so readers are then surprised to find they used the childhood trauma questionnaire.

      We thank the reviewer for bringing this to our attention. As suggested by the reviewer, we use the term “childhood adversity” in our revised manuscript.

      Introduction:

      Comment 2:

      The phrasing seems to 'exaggerate' the trauma problem and is too broad in the first paragraph - e.g., "two-thirds of people experience one or more traumatic events..." It is important to clarify that not all of these people will go on to develop behavioral, somatic, and psychopathological conditions. Could break this down more into how many people have low, moderate, or severe for clarity, as 1 childhood adversity is different to 5+, and the type.

      We thank the reviewer for bringing this to our attention and have revised the first paragraph accordingly (page 6). Please note, however, that in the literature typically a specific cut-off (e.g. moderate) is used and the number of individuals that would meet different cut-offs (e.g., low and high) are not specifically reported.

      “Exposure to childhood adversity is rather common, with nearly two thirds of individuals experiencing one or more traumatic events prior to their 18th birthday (McLaughlin et al., 2013). While not all trauma-exposed individuals develop psychopathological conditions, there is some evidence of a dose-response relationship (Danese et al., 2009; Smith & Pollak, 2021; Young et al., 2019). As this potential relationship is not yet fully clear, understanding the mechanisms by which childhood adversity becomes biologically embedded and contributes to the pathogenesis of stress-related somatic and mental disorders is central to the development of targeted intervention and prevention programmes.”

      Comment 3:

      The published cut-offs for exposed/unexposed should be indicated here.

      We have included the published cut-offs as suggested (page 10):

      We operationalize childhood adversity exposure through different approaches: Our main analyses employ the approach adopted by most publications in the field (see Ruge et al., 2024 for a review) - dichotomization of the sample into exposed vs. unexposed based on published cut-offs for the Childhood Trauma Questionnaire [CTQ; Bernstein et al. (2003); Wingenfeld et al. (2010)]. Individuals were classified as exposed to childhood adversity if at least one CTQ subscale met the published cut-off (Bernstein & Fink, 1998; Häuser, Schmutzer, & Glaesmer, 2011) for at least moderate exposure (i.e., emotional abuse ≥ 13, physical abuse ≥ 10, sexual abuse ≥ 8, emotional neglect ≥ 15, physical neglect ≥ 10).
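
The dichotomization rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the subscale keys and the function name `is_exposed` are hypothetical, and it assumes that "meeting" a cut-off means a score greater than or equal to the listed value.

```python
# Moderate-exposure cut-offs per CTQ subscale, taken from the values listed in the text.
# The dictionary keys are hypothetical labels chosen for this sketch.
CTQ_MODERATE_CUTOFFS = {
    "emotional_abuse": 13,
    "physical_abuse": 10,
    "sexual_abuse": 8,
    "emotional_neglect": 15,
    "physical_neglect": 10,
}


def is_exposed(subscale_scores):
    """Return True if at least one subscale score reaches its cut-off.

    Assumption: a score equal to the cut-off counts as meeting it.
    """
    return any(
        subscale_scores[name] >= cutoff
        for name, cutoff in CTQ_MODERATE_CUTOFFS.items()
    )
```

Under this rule a single subscale at or above its cut-off is enough for the "exposed" label, which is why the dichotomy says nothing about the number or type of adversities experienced.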

      Comment 4:

      Please check for overly complex sentences, and reduce the complexity. For example: "In addition, we provide exploratory analyses that attempt to translate dominant (verbal) theoretical accounts (McLaughlin et al., 2021; Pollak & Smith, 2021) on the impact of exposure to ACEs into statistical tests while acknowledging that such a translation is not unambiguous and these exploratory analyses should be considered as showcasing a set of plausible solutions."

      We have revised this section and carefully proofread our manuscript by paying attention to this (page 10):

      “In addition, we provide exploratory analyses that attempt to translate dominant (verbal) theoretical accounts (McLaughlin et al., 2021; Pollak & Smith, 2021) on the impact of exposure to childhood adversity into statistical tests. At the same time, we acknowledge that such a translation is not unambiguous and these exploratory analyses should be considered as showcasing a set of plausible solutions”

      Here is another example of reducing the complexity of our sentences (page 6):

      “Learning is a core mechanism through which environmental inputs shape emotional and cognitive processes and ultimately behavior. Thus, learning mechanisms are key candidates potentially underlying the biological embedding of exposure to childhood adversity and their impact on development and risk for psychopathology (McLaughlin & Sheridan, 2016).”

      Methods:

      Comment 5:

      Is this study part of a larger project? These outcomes were probably not the primary outcomes of this multicenter project. The readers need to understand how this (cross-sectional?) analysis was nested in this larger trial.

      We thank the reviewers and editor for bringing to our attention that this was not sufficiently clear. Thus far, we included the information that we used the participants recruited for a large multicentric study in the main manuscript, but point to the inclusion of more information in the supplement (page 11):

      “In total, 1678 healthy participants (age_M_ = 25.26 years, age_SD_ = 5.58 years, female = 60.10%, male = 39.30%) were recruited in a multi-centric study at the Universities of Münster, Würzburg, and Hamburg, Germany (SFB TRR58). Data from parts of the Würzburg sample have been reported previously (Herzog et al., 2021; Imholze et al., 2023; Schiele, Reinhard, et al., 2016; Schiele, Ziegler, et al., 2016; Stegmann et al., 2019). These previous reports, also those focusing on experimental fear conditioning (Schiele, Reinhard, et al., 2016; Stegmann et al., 2019), addressed, however, research questions different from the ones investigated here (see also Supplementary Material for details).”

      Moreover, we have included additional information on the larger trial in our revised supplement (page 2):

      “Participants of this study were recruited in a multi-centric collaborative research center “Fear, anxiety, anxiety disorders” joining forces between the Universities of Hamburg, Würzburg, and Münster, Germany (SFB TRR58). During the second funding period (2013–2016), all three sites recruited a large sample (N ~500) in the context of the Z project. All participants underwent the cross-sectional experimental paradigm reported here and were additionally extensively characterized to allow specific subprojects to recruit target subpopulations serving different aims with a focus on molecular genetic, epigenetic, or other research questions (see Herzog et al. (2021); Imholze et al. (2023); Schiele, Reinhard, et al. (2016); Schiele, Ziegler, et al. (2016); Stegmann et al. (2019)). The question on the association of exposure to childhood adversity and recent adversity was part of the primary research question of one subproject led by the senior author of this work (B07, TBL) and was hence a research question of primary interest also for this multicentric project.”

      Comment 6:

      Table 1 does not include percentages (a reader must calculate them: for example, 15% exposed?). These numbers belong in the results (i.e., it is confusing to read about the exposed/non-exposed before we know how it has been calculated).

      We have added the percentages as suggested and have included information on how exposed and unexposed was calculated as a table caption. We have considered moving the table to the results section but find it more suitable here. 

      Comment 7:

      A procedure figure could be useful.

      We thank the reviewer for this advice and have included a procedure figure in the supplementary material.

      Comment 8:

      Physiological data recordings and processing paragraph: The reasoning as to why the authors chose log transformation over square root transformation, or an approach that does not require transformation is not clear.

      We thank the reviewer for notifying us that we did not make this point clear enough. We opted for a log-transformation and range-correction of the SCR data because we use these transformations consistently in our laboratory (e.g., Ehlers et al., 2020; Kuhn et al., 2016; Scharfenort & Lonsdorf, 2016; Sjouwerman et al., 2015; Sjouwerman et al. 2020). In addition, log-transformed and range-corrected data are assumed to be closer to a normal distribution, to have a lower error variance resulting in larger effect sizes (Lykken & Venables, 1971; Lykken, 1972; Sjouwerman et al., 2022), and appear to have - at least descriptively - higher reliability compared to raw data (Klingelhöfer-Jens et al., 2022). We added a sentence on this to the methods section (page 14):

      Note that previous work using this sample (Schiele, Reinhard, et al., 2016; Stegmann et al., 2019) had used square-root transformations but we decided to employ a log-transformation and range-correction (i.e., dividing each SCR by the maximum SCR per participant). We used log-transformation and range-correction for SCR data because these transformations are standard practice in our laboratory and we strive for methodological consistency across different projects (e.g., Ehlers, Nold, Kuhn, Klingelhöfer-Jens, & Lonsdorf, 2020; Kuhn, Mertens, & Lonsdorf, 2016; Scharfenort, Menz, & Lonsdorf, 2016; Sjouwerman & Lonsdorf, 2020; Sjouwerman, Niehaus, & Lonsdorf, 2015). Additionally, log-transformed and range-corrected data are generally assumed to approximate a normal distribution more closely and exhibit lower error variance, which leads to larger effect sizes (Lykken, 1972; Lykken & Venables, 1971; Sjouwerman, Illius, Kuhn, & Lonsdorf, 2022). Additionally, on a descriptive level, this combination of transformations appears to offer greater reliability compared to using raw data alone (Klingelhöfer-Jens, Ehlers, Kuhn, Keyaniyan, & Lonsdorf, 2022).
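
For readers unfamiliar with these transformations, the following Python sketch shows one way they could be applied to a single participant's responses. It is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `log_range_correct` is hypothetical, the log step is assumed to be `log(1 + SCR)` so that zero responses remain defined, and the log step is assumed to come before the division by the participant's maximum. The text specifies neither detail.

```python
import math


def log_range_correct(raw_scrs):
    """Log-transform and range-correct one participant's SCR amplitudes.

    Assumptions (not stated in the text): log(1 + x) is used for the log step,
    and range-correction divides by the participant's maximum transformed value.
    """
    logged = [math.log(1.0 + x) for x in raw_scrs]
    peak = max(logged)
    if peak == 0:
        # A participant with no measurable responses stays at zero throughout.
        return [0.0 for _ in logged]
    return [value / peak for value in logged]
```

After this step each participant's largest response equals 1, so between-participant differences in overall responsivity no longer dominate the comparison of CS+ and CS- responses.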

      Ehlers, M. R., Nold, J., Kuhn, M., Klingelhöfer-Jens, M., & Lonsdorf, T. B. (2020). Revisiting potential associations between brain morphology, fear acquisition and extinction through new data and a literature review. Scientific Reports, 10(1), 19894. https://doi.org/10.1038/s41598-020-76683-1

      Kuhn, M., Mertens, G., & Lonsdorf, T. B. (2016). State anxiety modulates the return of fear. International Journal of Psychophysiology: Official Journal of the International Organization of Psychophysiology, 110, 194–199. https://doi.org/10.1016/j.ijpsycho.2016.08.001

      Scharfenort, R., & Lonsdorf, T. B. (2016). Neural correlates of and processes underlying generalized and differential return of fear. Social Cognitive and Affective Neuroscience, 11(4), 612–620. https://doi.org/10.1093/scan/nsv142

      Sjouwerman, R., Niehaus, J., & Lonsdorf, T. B. (2015). Contextual Change After Fear Acquisition Affects Conditioned Responding and the Time Course of Extinction Learning—Implications for Renewal Research. Frontiers in Behavioral Neuroscience, 9. https://doi.org/10.3389/fnbeh.2015.00337

      Sjouwerman, R., Scharfenort, R., & Lonsdorf, T. B. (2020). Individual differences in fear acquisition: Multivariate analyses of different emotional negativity scales, physiological responding, subjective measures, and neural activation. Scientific Reports, 10(1), 15283. https://doi.org/10.1038/s41598-020-72007-5

      Comment 9:

      There are 24 lines of text of R packages. I do not think this is necessary for the manuscript document and could be moved to the Supplement.

      We thank the reviewer for this comment and understand that it may take a considerable amount of space to list all the references of the R packages. However, we think it is important to prominently credit the respective authors of the R packages. Yet, if this is an important concern of the reviewer and editor, we will reconsider this point.

      Comment 10:

      It is not clear why the authors chose to analyze summary scores across trials rather than including a time factor for the acquisition phase.

      We would like to thank the reviewer for highlighting that the factor time may be interesting as well. However, we think that in our case the time factor is less interesting, as the acquisition effect itself is rather strong. Nevertheless, we have included a figure in the supplement that shows the time course of the SCR by displaying trial-by-trial data across the acquisition and generalization phase for transparency. This figure (Supplementary figure 4) shows that the trajectories appear to barely differ between individuals who were unexposed vs. exposed to moderate childhood adversity. Hence, we think that the analysis approach we have chosen is unlikely to overshadow central time-dependent effects. However, if the reviewer and editor have strong feelings about this point, we will consider integrating additional analyses including the time factor in the supplement.

      Results:

      Comment 11:

      The caption of Figure 3 does not match the figure. Please check this.

      We thank the reviewers and editor for attentive reading and have revised this part.

      References:

      Comment 12:

      The Ruge et al paper that is cited many times throughout does not have a valid DOI in the References section. Additionally, the author list on the preprint server is substantially different from that listed in the manuscript. Please correct this reference.

      We thank the reviewers and editor for attentive reading and have corrected this reference. The provided doi was functioning at our end and we hope that this now also applies to the reviewers.

    2. Reviewer #1 (Public review):

      This is a very important paper, using a large dataset to definitively understand a phenomenon so far addressed using a range of diverging definitions and methods, typically with insufficient statistical power.

    3. Reviewer #2 (Public review):

      Summary:

      This important study uses convincing evidence to compare how different operationalizations of adverse childhood experience exposure relate to patterns of skin conductance response during a fear conditioning task in a large sample of adults. Specifically, the authors compared the following operationalizations: dichotomization of the sample into "exposed" and "non-exposed" categories, cumulative adversity exposure, specificity of adversity exposure, and dimensional (threat versus deprivation) adversity exposure. The paper is thoughtfully framed and provides clear descriptions and rationale for procedures, as well as package version information and code. The authors' overall aim of translating theoretical models of adversity into statistical models, and comparing the explanatory power of each model, respectively, is an important and helpful addition to the literature.

      Several outstanding strengths of this paper are the large sample size and its primary aim of statistically comparing leading theoretical models of adversity exposure in the context of skin conductance response. This paper also helpfully reports Cohen's d effect sizes, which aid in interpreting the magnitude of the findings. The methods and results are thorough and well-described.

    1. Reviewer #1 (Public review):

      Summary:

      The authors have nicely demonstrated the efficiency of the HCR v.3.0 using hr38 mRNA expression as a marker of neuronal activity. This is very important in the Drosophila neuroscience field as in situ hybridization in adult Drosophila brains has been so far very challenging to do and replicate. The HCR v.3.0 has been described before [Choi et al., (2018)] and is now the property of the non-profit organization Molecular Technologies, who are the ones responsible for designing the probes. Here, taking advantage of this new FISH method, the authors have demonstrated the use of the FISH to identify neurons activated by a specific behavioral task using hr38 mRNA as a marker of neuronal activation. They named their method HI-FISH.

      In addition, based on the catFISH method [Guzowski et al., 1999], the authors were able to distinguish between newly activated neurons (nascent nuclear mRNA) and mature hr38 mRNA showing an earlier activation. They describe this method as HI-catFISH.

      Finally, to test what are the neurons activated downstream of their neuronal group of interest, the authors combined the HI-FISH method with optogenetics using Chrimson. They named this method opto-HI-FISH.

      Using these three new methods, the authors have addressed the following biological question: are love and aggressiveness neuronally the same in Drosophila? Here, the authors focused on the male-specific P1a neurons, which are activated by both an aggressive context (male-male encounter) and a sexual context (male-female encounter).

      Strengths:

      The demonstration of the efficiency of the method is very convincing and well-performed. It motivates the reader to apply the method to their own subject.

      Weaknesses:

      The more neurons are present, the more difficult it is to identify neurons. This is something to take into account when applying these methods.

    2. Reviewer #2 (Public review):

      Summary:

      Watanabe et al. introduce a novel approach for activity-dependent labeling of neural circuits in Drosophila at single-cell resolution, based on detecting the expression of the immediate early gene Hr38 using in situ hybridization. While activity mapping of neurons during specific behaviors is well-established in rodent models, its application in Drosophila has been limited, primarily due to technical constraints. By overcoming these challenges, this study tackles an important and timely issue, providing a foundational tool that will serve as a key reference in the field of circuit neuroscience.

      Strengths:

      The principal strength of this method lies in its versatility and high sensitivity. It can be applied to a broad range of biological questions and enables the investigation of dynamic transcriptional regulation across an unlimited number of genes with a strong signal-to-noise ratio. As such, it holds great potential for widespread use across research labs.

      Weaknesses:

      No major weaknesses; all concerns have been adequately addressed.

    3. Author response:

      Reviewer #1:

      Response to Public Review

      We thank the reviewer for taking the time to carefully read our paper and to provide helpful comments and suggestions, most of which we have incorporated in our revised manuscript. One of this reviewer’s (and reviewer #2’s) main concerns was that the confocal images provided in some cases did not appear to reflect the quantitative data in the bar graphs. These images were provided only for illustrative purposes, to give the reader a sense of what the primary data look like. The reviewer may not have appreciated that the quantitative data reflect counts of RNA smFISH signals (dots) in hundreds of cells collected through z-stacks comprising multiple optical sections in multiple flies for each condition. For example, in the P1a control condition (in Figure 2A), we have analyzed 135 neurons from 8 individuals. There, the number of z-planes ranged from 3 to 8 per hemisphere. It is generally not possible to find a single confocal section that encompasses quantitatively the statistics that are presented in the graphs. Presenting the data as an MIP (Maximum Intensity Projection, i.e., collapsed z-stack) in a single panel would generate an image that is too cluttered to see any detail. We have now included, for the reader’s benefit, additional example confocal sections in both a z-stack and from the opposite hemisphere, in Supplemental Figure S4D. We have also inserted clarifying statements in the text on p. 7 (lines 154-156).

      Another suggestion from Reviewer #1 is that "it would be more informative to separate in the quantification between the GAL4-expressing neurons and the non-expressing ones" based on the presented pictures where more non-P1a neurons (that the reviewer speculates may be pC1-type neurons) are activated by a male-male encounter than by a male-female encounter, while the P1a-positive neurons seem to be more responsive during courtship behavior. In this paper, we were not looking at pC1 neurons and did not try to answer which neuronal population(s) outside of the P1a population is/are responsible for aggression and/or courtship. Rather, we focused on P1a neurons and addressed whether P1a neurons that induce both aggression and courtship behavior when they are artificially activated (Hoopfer et al. 2015) are also naturally activated during spontaneous performance of these two social behaviors. However, this result did not exclude the possibility that P1a neurons were inactive during naturalistic courtship or aggression. Our data in the current manuscript provide further experimental evidence in support of the idea that P1a neurons as a population play a role in both of these behaviors. Moreover, we provided data identifying P1a neurons activated only during aggression or during courtship (or both). However this does not exclude that pC1 or other neighboring populations are activated during aggression as well (See also the response to 'Recommendations For The Authors' and text lines 151-154).

      In Figure 3, we used opto-HI-FISH to identify candidate downstream targets (direct or indirect) of P1a neurons. We used 50 Hz Chrimson stimulation to activate P1a neurons to induce expression of Hr38 and identified Kenyon cells in the mushroom body (MB) and PAM neurons (as well as pCd neurons) as potential downstream targets of P1a cells. In Figure 3 – supplement we performed calcium imaging of KCs and PAM neurons in response to P1a optogenetic stimulation to confirm independently our results from the Hr38 labeling experiments. That control was the purpose of that supplemental experiment.

      Based on those imaging data, the reviewer asked the further question of which [natural] behavioral context induces Hr38 expression in these populations (i.e., mating or aggression). This question is reasonable because our calcium imaging data (Figure 3-supplement) showed that both Kenyon cells and PAM neurons are active only during photo-stimulation of P1a neurons. Our previous behavioral studies (Inagaki et al., 2014; Hoopfer et al., 2015) showed that 50 Hz photo-stimulation of P1a neurons in freely moving flies induced unilateral wing extension during stimulation, while aggression was observed only after the offset of the stimulation (Hoopfer et al., 2015). Based on the comparison of those behavioral data to the imaging results in this paper, the reviewer suggested that Kenyon cells and PAM neurons are activated during courtship rather than during aggression. This is certainly a possible interpretation. However, it is difficult to extrapolate from behavioral experiments in freely moving animals to calcium imaging results in head-fixed flies, particularly with respect to neural dynamics. Furthermore, Hr38 expression, like that of other IEGs (e.g., c-fos), may reflect persistently activated 2nd messenger pathways (e.g., cAMP, IP3) in Kenyon cells and PAM neurons that are not detected by calcium imaging, but that nevertheless play a role in mediating its behavioral effects. We still do not understand the mechanisms of how optogenetic stimulation of P1a neurons in freely behaving flies induces aggression vs. courtship behavior. Although 50 Hz stimulation of P1a neurons does not induce aggressive behavior during photo-stimulation, it is possible that this manipulation activates both aggression and courtship circuits, but that the courtship circuit might inhibit aggressive behavior at a site downstream of the MB (e.g., in the VNC). Once stimulation is terminated and courtship stops, the fly would show aggressive behavior, due to release of that downstream inhibition (see Models in Anderson (2016) Fig 2d, e). In that case, there would be no apparent inconsistency between the imaging data and behavioral data. We agree that the reviewer's question is interesting and important but we feel that answering this question with decisive experiments is beyond the scope of this manuscript.

      Finally, Reviewer #1 suggested a method to evaluate the Hr38 signals in the catFISH experiment of Figure 4. We appreciate their suggestions, but the way that we evaluated the Hr38 signals was basically the same as the way the reviewer suggested. We apologize for the confusion caused by the lack of detailed descriptions in the original manuscript. We have now revised the methods section to explain more clearly how we define the cells as positive based on Hr38EXN and Hr38INT signals.

      Response to Recommendations for the authors:

      “To strengthen the author's argumentation, I would distinguish in their quantification between gal4+ from the other [classes of neighboring neurons] (Fig. 2 and 4).”

      Our focus in this paper was to ask simply whether P1a neurons are active or not active during natural occurrences of the social behaviors they can evoke when artificially activated. We did not claim that they are the only cells in the region that control the behaviors.  It is not possible to compare their activation to that of 'other' cells neighboring P1a neurons without a separate marker to identify those cells driven by a different reporter system (e.g., LexA). This in turn would require repeating all of the experiments in Figs 2 and 4 from scratch with new genotypes permitting dual-labeling of the two populations by different XFPs, and quantifying the data using 4-color labeling. We respectfully submit that such curiosity-driven experiments, while in principle interesting, are beyond the scope of the present manuscript.  However, we have inserted text to acknowledge the possibility that the aggression-activated Hr38 signals in P1a- cells neighboring P1a+ cells may correspond to other classes of P1 neurons (of which there are 70 in total) or to pC1 cells. Changes:  Text lines 151-154.

      “if the magenta dot is outside of the nuclei I would not count this as positive also the size of the dot seems to be a good marker of the reality of the signal). I would measure the intensity of the hr38EXN. A high Hr38EXN level associated with the presence of hr38INT would indicate that the cell has been activated during both encounters, while a lower hr38EXN with no hr38INT would suggest only an activation during the 1st behavioural context. Finally, a lower hr38EXN associated with the presence of hr38INT would suggest the opposite, an activation only during the 2nd behaviour.”

      We agree that there are some tiny dot signals with the hr38 INT probe that are more likely background signals. We only counted the INT probe signals as positive when the cells had a clearly visible dot that also co-localized with the exonic probe's signal, as primary (un-spliced) Hr38 transcripts in the nucleus should be positive for both EXN and INT probes. Regarding the reviewer’s latter comments, we agree with their interpretation of the catFISH results and that is how we interpreted them originally. We measured the intensity of hr38EXN expression and defined hr38EXN-labeled cells as “positive” when the relative intensity was more than 3σ above the average, a stringent criterion. In the revised manuscript, we added more detailed information in the methods section regarding our criteria for defining cell types as positive.
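
The 3σ criterion mentioned above can be illustrated with a short Python sketch. This is a hedged illustration rather than the authors' procedure: the function name `positive_cells` and the idea of a separate `reference` sample (for example, intensities from control cells) are assumptions, since the response does not state which distribution the average and σ are computed from, and the sample standard deviation is used here for simplicity.

```python
from statistics import mean, stdev


def positive_cells(intensities, reference):
    """Flag cells whose relative intensity exceeds the reference mean by more than 3 SD.

    `reference` is a hypothetical set of baseline intensities; the source does not
    specify which population defines the average and standard deviation.
    """
    threshold = mean(reference) + 3 * stdev(reference)
    return [value > threshold for value in intensities]
```

A cell flagged here would still need a co-localized, clearly visible INT dot before it is counted as INT-positive, as described in the response.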

      “Knowing that the P1a neurons (using the split-gal4) can trigger only wing extension when activated by optogenetic 50Hz, I would test to which behavioral context the MB neurons and the PAM neurons positively respond to.”

      As we answered in 'Response to Public Review,' our opto-HI-FISH experiments identified Kenyon cells in the mushroom body (MB) and PAM neurons (as well as pCd neurons) as potential downstream targets of P1a cells, using Hr38 labeling. The purpose of the calcium imaging experiment in Figure 3 – supplement was to confirm the P1a-dependent activation of KCs and PAM neurons using an independent method. In that respect, this control experiment succeeded as a methodological confirmation. The reviewer raised an interesting question about how our calcium imaging experiments relate to our behavioral experiments, in terms of the dynamics of KC and PAM activation. A recent publication (Shen et al., 2023) revealed that courtship behavior has a positive valence and that activation of P1 neurons mimics a courtship-reward state via activation of PAM dopaminergic neurons. Therefore, it is reasonable to think that PAM neurons (and Kenyon cells as downstream of PAM neurons) are activated during female exposure. However, those data do not exclude the possibility that inter-male aggression is also rewarding in Drosophila males, as it has been shown to be in mice. This is an interesting curiosity-driven question that has yet to be resolved. Therefore, as mentioned in the 'Response to Public Review,' we feel that the additional experiment the reviewer suggests is beyond the scope of our manuscript.

      Changes: None.

      Minor comments:

      “Please provide different pictures from main fig2 and sup2 for the three common conditions (control, aggression, and courtship).” 

      The data set for Figure 2 and Figure 2 supplement are from the same experiment. Because of the limited space, we just presented the selected key conditions ('Control', 'Aggression', and 'Courtship') in the main figure and put the complete data set (including these three key conditions) in the supplemental figure.

      Changes: None

      “Please, provide scale bars for the images.”

      Also, Reviewer #2 commented, 'Scale bars are missing on all the images throughout the main and supplementary figures.'

      We have now added scale bars for each figure. 

      “Fig.1: Is the chrimsonTdtom images from endogenous fluorescence? It is not said in the legend and anti-dsred is not provided in the material and method while anti-GFP is.”

      We are sorry for the confusion and thank the reviewer for raising that question. The signals were native fluorescence, and we have now added that information to the figure legend.

      P7: "As an initial proof-of-concept application of HI-FISH, we asked whether neuronal subsets initially identified in functional screens for aggression-promoting neurons (Asahina et al., 2014; Hoopfer et al., 2015; Watanabe et al., 2017) were actually active during natural aggressive behavior. These included P1a, Tachykinin-FruM+ (TkFruM), and aSP2 neurons". Please put the references to the corresponding group of neurons listed. For example: "These included P1a neurons [Hoopfer et al., 2015]". 

      We have now added these references.

      P9: "Optogenetic and thermogenetic stimulation experiments have shown that that P1a interneurons can promote both male-directed aggression and male- or female-directed courtship" typo

      We appreciate the reviewer for catching this error and have corrected the text.

      P10: "To validate this approach, we first asked whether we could detect Hr38 induction in pCd neurons, which were previously shown by calcium imaging to be (indirect) targets of P1a neurons". Reference [Jung et al., 2020]

      We have now added this reference.

      Fig. 4A: Put the time scale on the diagram (3h adaptation-20min-30min rest-20min-10min rest-collect) 

      We have now added the time scale in Figure 4A.

      Reviewer #2: 

      Response to Public Review: 

      We thank the reviewer for their helpful comments and suggestions. We have addressed most of them in our revised manuscript. The main concern of Reviewer #2 was the temporal resolution of the HI-catFISH experiment shown in Figure 4 and Figure 4-Supplement. Our original manuscript illustrated temporal patterns of Hr38EXN and Hr38ITN signals concomitant with different behavioral paradigms (Figure 4B). The reviewer pointed out that the illustrated experimental design does not reflect the actual data shown in Figure 4-Supplement A-C. We believe this issue was raised because we drew the temporal pattern of Hr38EXN signals in Figure 4B based on the intensity of Hr38EXN signals (Figure 4-Supplement B) rather than based on the % number of positive cells (Figure 4-Supplement C). We have now revised the schematic time course of Hr38EXN signals in Figure 4B using the % of positive cells. We believe this change will be helpful for readers to understand better the experimental design since we used the % of positive cells to identify patterns of P1a neuron activation during male-male vs. male-female social interactions in Figure 4D. Another suggestion from Reviewer #2 was to add additional controls, such as the quantification of the intronic and exonic Hr38 probes after either only the first or second social context exposure. In response, we have now added the data from only the first social context (Figure 4C, and 4D, right column). These new data provides evidence that there are essentially no detectable Hr38INT signals 60 minutes later without a second behavioral context, while Hr38EXN signals are still present at the time of the analysis.  Unfortunately, we are not able to provide the converse dataset with the second behavioral context only to show that Hr38 INT signals are detected. 
      On this point, we call the reviewer’s attention to Figure 4-supplement-S4A-C, which show that the INT probe signals are detectable at 15 and 30 minutes following stimulation, but not at 60 minutes. In the experiment of Fig. 4B, flies are fixed and labeled for Hr38 30 minutes after the beginning of the second behavior, conditions under which we should obtain robust INT signals (as observed). EXN signals are also expected at 30 minutes because the primary (non-spliced) RNA transcript detected by the INT probe also contains exonic sequences.
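
      The probe logic described above can be summarized as a small decision rule. The following is a minimal sketch in Python, not the authors' analysis code; the function name and category labels are hypothetical, and the rule assumes only what is stated in this response (INT signal detectable at 15 and 30 minutes but not at 60 minutes; EXN signal still present at 60 minutes; INT-positive cells expected to be EXN-positive as well).

```python
def classify_neuron(exn_positive: bool, int_positive: bool) -> str:
    """Interpret one neuron's Hr38 probe pattern at the time of fixation.

    Hypothetical helper for illustration only. Assumes fixation about
    30 min after the start of the second behavior and about 60 min or
    more after the first behavior.
    """
    if int_positive and not exn_positive:
        # The unspliced transcript contains exonic sequence, so INT
        # without EXN is not expected under the stated assumptions.
        return "unexpected pattern"
    if int_positive:
        # Intronic signal is short-lived, so it points to the second
        # behavior; activity during the first cannot be excluded.
        return "active during second behavior"
    if exn_positive:
        # Exonic signal persists after the intronic signal has faded.
        return "active during first behavior only"
    return "not detectably active"


for exn, intr in [(True, True), (True, False), (False, False)]:
    print(exn, intr, "->", classify_neuron(exn, intr))
```

      Note that, under this rule, a neuron positive for both probes cannot be assigned to the second behavior alone, which is why the first-context-only control is informative.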

      Response to Recommendations for the authors:

      Given that the development of in situ HCR for the adult fly brain is so central to the present manuscript, I think that the methods section describing the HCR protocol can be significantly improved. In particular, the authors should fully describe the in situ HCR protocol including the 'minor modifications' they refer to, and define how they calculate the 'relative intensity to the background'.

      We appreciate the reviewer’s suggestion. We have now revised the methods section to describe the procedure in more detail. Also, we will submit a separate document describing the HI-FISH protocol.

      Note: The authors refer to a recently published paper by Takayanagi-Kiya et al (2023) describing activity-based neuronal labeling using a different immediate early gene, stripe/egr-1. The authors state the following: 'That study used a GAL4 driver for the stripe/egr-1 gene to label and functionally manipulate activated neurons. In contrast, our approach is based purely on detecting expression of the IEG mRNA using..'. Takayanagi-Kiya et al. (2023) also use in situ mRNA detection of the IEG stripe/egr-1 and not only a GAL4 driver system. This claim should be modified and the paper should be cited in the introduction of the present paper.

      We have now cited the paper in the Introduction and have modified and moved the description originally in the 'Note' section to the Discussion (text lines: 392-404), as the reviewer requested. We have emphasized the difference between the two approaches for comparing neuronal activities during two different behaviors within the same animal. Takayanagi-Kiya et al. used GAL4/UAS and stripe protein expression with immunohistochemistry to analyze neuronal activities during two different behaviors, while we exclusively analyzed Hr38 mRNA expression for this purpose, using intronic and exonic Hr38 probes. This approach made it possible to perform catFISH with higher temporal resolution and also allows extension of our approach to other IEGs for which antibodies are not available.

      Please specify the nature of the iron filings in the methods section.

      We added a detailed description in the methods section, including the catalog number.

      In Figure 1B, the authors may add a dashed outline to the regions magnified in 1C so that readers can more easily follow the figures. Moreover, it would be informative to see a more detailed quantification of the number of Hr38-positive cells in different brain regions marked by Fru-GAL4.

      We have now added the whole brain images for each condition in Figure 1C and also quantitative data in Figure 1-Supplement C, as the reviewer suggested.

      In the middle right aggression panel of Figure 2A, it looks as if one P1a neuron is not outlined.

      We have carefully examined other z-planes through this region and based on those data have concluded that the signals mentioned by the reviewer are neurites from neurons labeled in other z-planes.

      Changes: None.

      The images in Figure 2A can be again found in Figure Supplement 2A, yet the number of neurons analyzed suggests the quantification was performed from different samples. The images in Figure Supplement 2A should be either changed or it should be explained as to why the images are the same yet the numbers in the legend are different.

      We apologize for the confusion. Figure 2 and Figure 2-Supplement are from the same experiment. To avoid clutter we illustrated three key conditions ('Control,' 'Aggression,' and 'Courtship') in the main figure. The reason why the numbers in the legend are different is that the purpose of presenting Figure 2-Supplement B-D was to determine whether there were differences in the intensity of Hr38 FISH signals in the neurons considered as 'positive' in different conditions. Therefore, the numbers described in Figure 2-Supplement legend are derived only from those neurons that were considered Hr38-positive, while the numbers in Figure 2 include all neurons analyzed. We have now added notes to explain this in the Figure 2 – supplement legend.

      The panels of the quantification of the Hr38 relative intensity in Figure 2B/C/D are very difficult to read, ideally, they should be plotted as in Figure Supplement 2B/C/D.

      The graphs in Figure 2B-D (upper) show data from all GFP-labeled cells scored, including cells defined as 'negative' or 'borderline.' In contrast, the graphs in Figure 2-supplement show the relative Hr38 signal intensity in those GFP neurons defined as positive based on the analysis in Fig. 2B. If we were to plot the data in Fig. 2B (upper) as box plots (like that in Figure-2-supplement), we would see either a skewed (only negative cells) or a bimodal distribution (one around the negative population and the other around the positive population); the shapes of these distributions would likely be hidden in the box-whisker plots format. Therefore, we prefer to plot all of the data points as we did in the original manuscript. However, we agree that the data points in the original manuscript were hard to read. We therefore changed the format of the datapoints from blurry dots to open circles with clear solid lines.

      In Figure 2B/C/D, please specify in the figure legend what 'grouped in categories according to character' means. 

      We used letters to mark statistically significant differences (or lack thereof) between conditions. Bars sharing at least one common letter are not significantly different. If they do not share any letter, they are significantly different. For example, Aggression: bc vs. Dead: bc means no difference. Aggression: bc vs. No Food: b, or Aggression: bc vs. Courtship: c, also means no difference between Aggression and each of the two other conditions. However, 'No Food: b' and 'Courtship: c' have no common letter, meaning they are different. This is a standard method for showing statistical comparisons among multiple bars without many asterisks and horizontal bars cluttering the figure, and we have revised the legend to clarify what each letter means. We have also removed the color shading in Figure 2 B-D as it may have been confusing.
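
      The letter convention described above amounts to a simple set test: two conditions differ significantly only if their letter labels have no letter in common. A minimal sketch in Python, using the four example labels quoted in this response (the function name is hypothetical, and the labels are taken from the text rather than from the figure itself):

```python
def significantly_different(letters_a: str, letters_b: str) -> bool:
    """Return True when two letter labels share no letter."""
    return set(letters_a).isdisjoint(letters_b)


# Labels as quoted in the response above.
labels = {"Aggression": "bc", "Dead": "bc", "No Food": "b", "Courtship": "c"}

pairs = [
    ("Aggression", "Dead"),
    ("Aggression", "No Food"),
    ("Aggression", "Courtship"),
    ("No Food", "Courtship"),
]
for a, b in pairs:
    verdict = "different" if significantly_different(labels[a], labels[b]) else "not different"
    print(f"{a} ({labels[a]}) vs. {b} ({labels[b]}): {verdict}")
```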

      A quantification of the number of Hr38-positive neurons and Hr38 relative intensity during the entire time course would be informative in Figure 3D. 

      Although the data set for this figure is different from that for Figure 4-Supplement A-C, the main claim is the same. Therefore, Figure 4 - Supplement essentially provides the information that the reviewer suggested. However, we also reanalyzed the data set used for the original Figure 3D and evaluated % positive cells at the 30-minute time point and have now added that number in the figure legend.

      In the legend of Figure 3D, it says '..The expression level reaches its peak at 30-60min', yet I don't see timepoints beyond 60min. Please rephrase or add additional timepoints. 

      We apologize for the error. We have rephrased the text.

      Figure Supplement 3A/D: please add an outline or a schematic figure to better understand where the imaging is performed.

      We added illustrated schemas next to the title of each experiment (P1->PAM neurons (bundle) and P1 -> Kenyon cells (bundle)).

      Figure Supplement 3C/F: please add information about the statistical test to the corresponding figure legend.

      We have added a phrase to describe the test used.

      Figure Supplement 3G/H/I/J: motion artifacts can potentially strongly affect the performed analysis given that cell bodies are very small and highly subjected to motion. Can the authors comment on how they corrected for motion?

      We have now described how we corrected for motion artifacts in the Methods section.

      Figure 4C/D: It seems as if the representative images don't reflect the quantification, e.g., in the male -> female panel, close to 100% of the neurons are positive for the exonic probe as opposed to approx. 40% in the bar graph.

      Please see our response to this issue in the 'Response to Public Review (Reviewer #1)'.

      Additional controls should be included in Figure 4C in order to assess the temporal resolution of HI-CatFISH more in detail (see 'Weaknesses').

      We have also answered this in the 'Response to Public Review'.

      The authors should adjust the scheme in the main Figure 4B to reflect the data presented in Figure S4A and C. For instance, the peak for the intronic version is observed at 15 minutes, while at 30 minutes, both the exonic and intronic signals show an equal level of signal.

      We have addressed this issue in the 'Response to Public Review'.

      We thank the reviewers again for their helpful comments and hope that with these changes, the manuscript will now be acceptable for official publication in eLife.

    1. eLife Assessment

      This important study provides interesting insights into the mechanisms of action of adjuvants. It shows that adjuvants, MPLA and CpG especially, modulate the peptide repertoires presented on the surface of antigen presenting cells, and surprisingly, adjuvant favored the presentation of low-stability peptides rather than high-stability peptides by antigen presenting cells. As a result, the low stability peptide presented in adjuvant groups elicits T cell response effectively. Evidence in support of these conclusions is solid, and this paper would be of interest to vaccinologists and immunologists.

    2. Reviewer #1 (Public Review):

      Summary:

      Li et al investigated how adjuvants such as MPLA and CpG influence antigen presentation at the level of the Antigen presenting cell and MHCII : peptide interaction. They found that use of MPLA or CpG influences the exogenous peptide repertoire presented by MHC II molecules. Additionally, their observations included the finding that peptides with low-stability peptide:MHC interactions yielded more robust CD4+ T cell responses in mice. These phenomena were illustrated specifically for 2 pattern recognition receptor activating adjuvants. This work represents a step forward for how adjuvants program CD4+ Th responses and provide further evidence regarding expected mechanisms of PRR adjuvants in enhancing CD4+ T cell responses in the setting of vaccination.

      Strengths:

      The authors use a variety of systems to analyze this question. Initial observations were collected in an H pylori model of vaccination with a demonstration of immunodominance differences simply by adjuvant type, followed by analysis of MHC:peptide as well as proteomic analysis with comparison by adjuvant group. Their analysis returns to peptide immunization and analysis of strength of relative CD4+ T cell responses, through calculation of IC:50 values and strength of binding. This is a comprehensive work. The logical sequence of experiments makes sense and follows an unexpected observation through to trying to understand that process further with peptide immunization and its impact on Th responses. This work will premise further studies into the mechanisms of adjuvants on T cells.

      Weaknesses:

      While MDP has a different manner of interaction as an adjuvant compared to CpG and MPLA, it is unclear why MDP has a different impact on peptide presentation and it should be further investigated, or at minimum highlighted in the discussion as an area that requires further investigation.

      It is alluded by the authors that TLR activating adjuvants mediate selective, low affinity, exogenous peptide binding onto MHC class II molecules. However, this was not demonstrated to be related specifically to TLR binding. Wonder if some work with TLR deficient mice (TLR 4KO for example) could evaluate this phenomenon more specifically

      Lastly, it is unclear if the peptide immunization experiment reveals a clear pattern related to high and low stability peptides among the peptides analyzed.

    3. Reviewer #2 (Public Review):

      Adjuvants boost antigen-specific immune responses to vaccines. However, whether adjuvants modulate the epitope immunodominance and the mechanisms involved in adjuvant's effect on antigen processing and presentation are not fully characterized. In this manuscript, Li et al report that immunodominant epitopes recognized by antigen-specific T cells are altered by adjuvants.

      Using MPLA, CpG, and MDP adjuvants and H. pylori antigens, the authors screened the dominant epitopes of Th1 responses in mice post-vaccination with different adjuvants and found that adjuvants altered antigen-specific CD4+ T cell immunodominant epitope hierarchy. They show that adjuvants, MPLA and CpG especially, modulate the peptide repertoires presented on the surface of APCs. Surprisingly, adjuvant favored the presentation of low-stability peptides rather than high-stability peptides by APCs. As a result, the low stability peptide presented in adjuvant groups elicits T cell response effectively.

    4. Author response:

      Reviewer #1 (Public Review):

      Summary:

      Li et al investigated how adjuvants such as MPLA and CpG influence antigen presentation at the level of the Antigen-presenting cell and MHCII : peptide interaction. They found that the use of MPLA or CpG influences the exogenous peptide repertoire presented by MHC II molecules. Additionally, their observations included the finding that peptides with low-stability peptide:MHC interactions yielded more robust CD4+ T cell responses in mice. These phenomena were illustrated specifically for 2 pattern recognition receptor activating adjuvants. This work represents a step forward for how adjuvants program CD4+ Th responses and provides further evidence regarding the expected mechanisms of PRR adjuvants in enhancing CD4+ T cell responses in the setting of vaccination.

      Strengths:

      The authors use a variety of systems to analyze this question. Initial observations were collected in an H pylori model of vaccination with a demonstration of immunodominance differences simply by adjuvant type, followed by analysis of MHC:peptide as well as proteomic analysis with comparison by adjuvant group. Their analysis returns to peptide immunization and analysis of strength of relative CD4+ T cell responses, through calculation of IC:50 values and strength of binding. This is a comprehensive work. The logical sequence of experiments makes sense and follows an unexpected observation through to trying to understand that process further with peptide immunization and its impact on Th responses. This work will premise further studies into the mechanisms of adjuvants on T cells.

      Weaknesses:

      Comment 1. While MDP has a different manner of interaction as an adjuvant compared to CpG and MPLA, it is unclear why MDP has a different impact on peptide presentation and it should be further investigated, or at minimum highlighted in the discussion as an area that requires further investigation.

      Thank you for the suggestion. We investigated the reasons for the different effects of MDP on peptide presentation compared with those of CpG and MPLA. We found that the expression of some proteins involved in antigen processing and presentation, such as CTSS, H2-DM, Ifi30, and CD74, was substantially lower in the MDP-treated group than in the CpG- and MPLA-treated groups. To further confirm whether these proteins play a key role during adjuvant modification of peptide presentation, we knocked them down using shRNA and then performed immunopeptidomics. The original mass spectra and peptide spectrum matches have been deposited in the public proteomics repository iProX (https://www.iprox.cn/page/home.html) under accession number IPX0007611000. Unfortunately, the expected results for peptide presentation repertoires were not observed. Thus, we hypothesized that the different effects of MDP on peptide presentation might not result from differences in the expression of these proteins. We cannot exclude the possibility that other proteins important in this process were overlooked. We are still working on the mechanisms and have not yet reached a firm conclusion. Thus, we did not present the related data in this manuscript.

      The related statements were added in the Discussion section on page 13, lines 292–299: “In this study, we found that the peptide repertoires presented by APCs were significantly affected by the adjuvants CpG and MPLA, but not MDP. All three adjuvants belong to the PRR ligand adjuvant family. CpG and MPLA bind to TLRs and MDP is recognized by NOD2. Although the receptors are different, many common molecules are involved both in TLR and NLD pathway activation. Unfortunately, we did not demonstrate why the MDP had different impacts on peptide presentation compared with other adjuvants. Further investigation is required to clarify the mechanism by which MPLA, CpG, and MDP adjuvants modulate the presentation of peptides with different stabilities.”

      Comment 2. It is alluded by the authors that TLR activating adjuvants mediate selective, low affinity, exogenous peptide binding onto MHC class II molecules. However, this was not demonstrated to be related specifically to TLR binding. I wonder if some work with TLR deficient mice (TLR 4KO for example) could evaluate this phenomenon more specifically.

      Thank you for the suggestion. This is an important point that was overlooked in this study. Based on published research on the mechanisms of the PRR adjuvants CpG and MPLA, we believe that their effect on selective epitope presentation by APCs requires binding to the corresponding receptors, although we did not draw a definitive conclusion in the manuscript.

      To confirm that TLR-activating adjuvants affect the peptides presented on MHC molecules specifically through TLR binding, we have used CRISPR-Cas9 to knock out TLR4 and TLR9 in A20 cells and are repeating the experiments, as suggested. We chose TLR4- and TLR9-knockout A20 cell lines instead of TLR-deficient mice because a large number of APCs are required for immunopeptidomics. Moreover, the data observed in this study were based on the A20 cell line. However, these experiments are time-consuming, and we were unfortunately unable to provide the data in time for this revision. In addition, we believe that elucidating the downstream molecular mechanisms of TLR activation is necessary, as mentioned in comment 1. All these data will be combined and reported in our upcoming publications.

      Comment 3. It is unclear to me if this observation is H pylori model/antigen-specific. It may have been nice to characterize the phenomenon with a different set of antigens as supplemental. Lastly, it is unclear if the peptide immunization experiment reveals a clear pattern related to high and low-stability peptides among the peptides analyzed.

      Q1: It is unclear to me if this observation is H. pylori model/antigen-specific. It may have been nice to characterize the phenomenon with a different set of antigens as supplemental.

      Thank you for the comment. To confirm the effect of the adjuvant on the exogenous peptide repertoire presented by MHC II molecules, a set of antigens from another bacterium, Pseudomonas aeruginosa, was used, and the experiments were repeated. The A20 cells were treated with CpG and pulsed with Pseudomonas aeruginosa antigens. Twelve hours later, MHC-II–peptide complexes were immunoprecipitated, and immunopeptidomics were performed. The data are shown below (Author response image 1). Information on the MHC-peptides from Pseudomonas aeruginosa is given in the Supplementary Table named “Table S3 Response to comment3”. A total of 713 and 205 bacterial peptides were identified in the PBS and CpG groups, respectively (Author response image 1A). The number of exogenous peptides in the CpG-treated group was significantly lower than that in the PBS-treated control group (Author response image 1B). A total of 568 bacterial peptides were presented only in the PBS group, 60 bacterial peptides were presented only in the CpG-treated group, and 145 bacterial peptides were presented in both groups (Author response image 1C). We then used the IEDB website to compare the predicted MHC-binding stability of the peptides presented in the adjuvant-treated group with that of the peptides that were lost ("deficient") after adjuvant stimulation. We found that the IC50 values of the peptides in the adjuvant-treated group were much higher than those of the deficient peptides, which indicates that the peptides presented in the CpG-treated group have lower binding stability for MHC-II (Author response image 1D). These results indicate that the CpG adjuvant reduces the presentation of exogenous peptides with high binding stability, which is consistent with the data reported in our manuscript. Using another set of antigens, we confirmed that our observations were not H. pylori model- or antigen-specific.

      Author response image 1.

      MHC-II peptidome measurements in adjuvant-treated APCs pulsed with Pseudomonas aeruginosa antigens.

      (A) Total number of bacterial peptides identified in the PBS- and CpG-treated groups. (B) The number and length distribution of bacterial peptides in different groups were compared. (C) Venn diagrams showing the distribution of bacterial peptides in different groups. (D) IC50 of the presented, deficient, and co-presented peptides post-adjuvant stimulation from immunopeptidome binding to H2-IA and H2-IE were predicted using the IEDB website. High IC50 means low binding stability. *p<0.05, **p<0.01.
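
      As a quick consistency check on the counts reported for Author response image 1A and 1C, the group totals can be recomputed from the Venn partitions. This is a sketch in Python using only the numbers quoted in the response; the variable names are hypothetical, and the reading of "60" as peptides found only in the CpG-treated group is an assumption that the arithmetic supports.

```python
# Venn partitions quoted in the response (Author response image 1C).
pbs_only = 568   # presented only in the PBS group
cpg_only = 60    # assumed: presented only in the CpG-treated group
shared = 145     # presented in both groups

pbs_total = pbs_only + shared   # should match the reported 713
cpg_total = cpg_only + shared   # should match the reported 205
all_peptides = pbs_only + cpg_only + shared

# Fraction of PBS-group peptides no longer detected after CpG treatment.
lost_fraction = pbs_only / pbs_total

print(f"PBS total: {pbs_total}, CpG total: {cpg_total}, union: {all_peptides}")
print(f"Fraction of PBS peptides lost with CpG: {lost_fraction:.2f}")
```

      The partitions reproduce the reported totals, and roughly four fifths of the peptides seen in the PBS group are absent from the CpG-treated group.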

      Q2: Lastly, it is unclear if the peptide immunization experiment reveals a clear pattern related to high and low-stability peptides among the peptides analyzed.

      In this study, we used a peptide immunization experiment to evaluate the responses induced by the screened peptides with different stabilities. In addition to this method, tetramer staining and ELISA have been used by others to assess epitope-specific T-cell proliferation and cytokine secretion. Among these, tetramer staining is often used in studies involving model antigens. However, as many peptides were screened in our study, synthesizing a sufficient number of tetramers was difficult. We believe that the experimental data obtained in this study support the conclusion, but we agree that applying additional methods would make the pattern clearer.

      Reviewer #2 (Public Review):

      Adjuvants boost antigen-specific immune responses to vaccines. However, whether adjuvants modulate the epitope immunodominance and the mechanisms involved in adjuvant's effect on antigen processing and presentation are not fully characterized. In this manuscript, Li et al report that immunodominant epitopes recognized by antigen-specific T cells are altered by adjuvants.

      Using MPLA, CpG, and MDP adjuvants and H. pylori antigens, the authors screened the dominant epitopes of Th1 responses in mice post-vaccination with different adjuvants and found that adjuvants altered antigen-specific CD4+ T cell immunodominant epitope hierarchy. They show that adjuvants, MPLA and CpG especially, modulate the peptide repertoires presented on the surface of APCs. Surprisingly, adjuvant favored the presentation of low-stability peptides rather than high-stability peptides by APCs. As a result, the low stability peptide presented in adjuvant groups elicits T cell response effectively.

      Thanks a lot for your comments.

      Reviewer #1 (Recommendations For The Authors):

      Recommendation 1. Figure 6: The peptides considered low affinity- it would be helpful to specify from which adjuvant they were collected from. When they are pooled it is unclear if we are analyzing peptides collected from adjuvanting with any of the three adjuvants studied.

      Thank you for the suggestion. The related description in Figure 6 has been modified in the revised manuscript. Data for the peptides identified from the adjuvants MPLA- and CpG-treated groups are shown separately.

      Recommendation 2. It is unclear to me why the A20 cell line is less preferred to the J774 line for the immunopeptidome analysis - can the authors expand on this?

      We apologize for not clearly explaining this in the original manuscript. In fact, the A20 cell line is better than the J774A.1 cell line for immunopeptidomics experiments. Compared to J774A.1 cells, more MHC-II peptides were obtained from a smaller number of A20 cells using immunopeptidomics. At the beginning of this study, we chose the J774A.1 cell line as it is a macrophage cell line. J774A.1 cells (up to 5×10⁸) were pulsed with the antigens, and MHC-II–peptide complexes were eluted from the cell surface for immunopeptidomics. Unfortunately, only a few hundred peptides from the host were detected and no exogenous peptides were detected. Next, we tested the A20 cell line. In total, 10⁸ A20 cells were used in this study. More than 3500 host peptides and approximately 50 exogenous peptides were identified. These data indicate that the A20 cell line was better.

      To investigate the reasons for this, we measured MHC-II expression on cell surfaces using FACS. Our purpose was to elute peptides from MHC–peptide complexes present on the cell surface, so low MHC expression results in the elution of few peptides. We found that the MFI of MHC-II molecules on J774A.1 cells is about 500, whereas the MFI of MHC-II molecules on A20 cells is more than 300,000. These data indicate that MHC-II expression on A20 cells was much higher than that on J774A.1 cells. J774A.1 is a macrophage cell line. Macrophages have excellent antigen phagocytic capabilities; however, their ability to present antigens is relatively weak. MHC molecules on the macrophage cell surface can be upregulated upon stimulation with some cytokines, for example, IFN-γ. In this study, we used adjuvants as stimulators and did not want to use additional cytokine stimulators. Thus, J774A.1 cells were not used in the present study.

      The related statements are reflected on page 6 lines 120–128 “We also selected another H-2d cell J774A.1, a macrophage cell line, for immunopeptidome analysis in this study. Briefly, 5×10⁸ J774A.1 cells were used for immunopeptidomics. Moreover, fewer than 350 peptides were observed at a peptide spectrum match (PSM) level of < 1.0% false discovery rate (FDR). However, more than 5500 peptides were detected in 10⁸ A20 cells at FDR < 1.0% (Figure S2A). CD86 and MHC-II molecule expression on J774A.1 cells was substantially lower than that on A20 cells (Figure S2B). Low MHC-II expression on J774A.1 cells could be the reason for the lack of peptides identified by LC–MS/MS. Thus, A20 cells instead of J774A.1 cells were used for the subsequent experiments.”
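
      To put the two cell lines on a common scale, the figures quoted above can be converted into a per-cell peptide yield and an MFI ratio. The sketch below is a back-of-the-envelope calculation in Python, not part of the authors' analysis; the variable names are hypothetical, and because the source gives bounds ("fewer than 350", "more than 5500", "more than 300,000"), the results are lower bounds on the fold difference.

```python
# Values quoted in the response and the revised manuscript text.
j774_cells = 5e8          # J774A.1 cells used
j774_peptides_max = 350   # "fewer than 350" peptides at FDR < 1.0%
a20_cells = 1e8           # A20 cells used
a20_peptides_min = 5500   # "more than 5500" peptides at FDR < 1.0%

j774_mfi = 500            # "about 500"
a20_mfi_min = 300_000     # "more than 300,000"

# Peptides identified per cell, then the fold difference between lines.
a20_yield = a20_peptides_min / a20_cells
j774_yield = j774_peptides_max / j774_cells
yield_fold_min = a20_yield / j774_yield

mfi_fold_min = a20_mfi_min / j774_mfi

print(f"Per-cell peptide yield, A20 vs. J774A.1: more than {yield_fold_min:.0f}-fold")
print(f"Surface MHC-II MFI, A20 vs. J774A.1: more than {mfi_fold_min:.0f}-fold")
```

      Both ratios point in the same direction, which is consistent with the authors' explanation that low surface MHC-II expression limits peptide recovery from J774A.1 cells.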

      Recommendation 3. Lines 172-177, can more details be provided about the whole proteome analysis? The plots are shown for relative representation of protein expression to PBS, but it is unclear to me what examples of these proteins are (IFN pathway, Ubiquitination pathway). Could these be confirmed by protein expression analyses in supplemental?

      Thank you for the suggestion. In this study, we conducted whole proteome analysis to investigate changes in protein expression across different pathways in the adjuvant groups. Through KEGG enrichment analysis, we compared the differential expression of MHC presentation pathway proteins (such as H2-M, Ifi30, CD74, CTSS, proteasome, and peptidase subunits) between the PBS- and adjuvant-treated groups using our proteome data. In addition, we focused on IFN and ubiquitination pathways that play crucial roles in antigen presentation modification and immune response. The proteins and their relative expression in these pathways are shown in Figure S4B. Details regarding the protein names and expressions are provided in Supplemental Table S2 of the revised manuscript.

      The original statements in the results “Then, we analyzed the whole proteome data to determine whether the proteins involved in antigen presentation and processing were altered. We found that proteins involved in antigen processing, peptidase function, ubiquitination pathway, and interferon (IFN) signaling were altered post adjuvants treatment, especially in MPLA and CpG groups (Figure 5C; Figure S4B and S4C). These data suggest that adjuvants MPLA and CpG may affect the antigen processing of APCs, resulting in fewer peptides presentation.” This has been revised on page 8 lines 172–182 as “We then investigated whole-proteome data to determine the evidence of adjuvant modification of antigen presentation. We focused on the proteins involved in antigen processing, peptidase function, ubiquitination pathway, and IFN signaling. The ubiquitination pathway and IFN signaling play crucial roles in the modification of antigen presentation and immune responses. Through KEGG enrichment analysis, we found that many proteins involved in antigen processing, peptidase function, ubiquitination pathways, and IFN signaling were altered after adjuvant treatment, particularly in the MPLA- and CpG-treated groups (Figure 5C; Figure S4B). The expression of each protein is shown in Figure S4C and Supplementary Table 2. These data suggest that MPLA and CpG adjuvants may affect the antigen processing of APCs, resulting in fewer peptide presentations.”

      Recommendation 4. Lines 212-218: I think there needs to be more discussion of interpretation here. Only one of the low-stability peptides required low concentrations for CD4+ T cell responses in vitro. What about the other peptides in the analysis? Perhaps if the data is taken together there is not a clear pattern?

      Thank you for the comment. In this study, epitope-specific CD4+ T cells were expanded in vitro from the spleens of peptide-pool-immunized mice. T-cell responses to individual peptides were detected using ICS and FACS. Only one peptide with low binding stability, recA #23, and one high-stability peptide, ureA #2, induced effective T-cell responses. Peptide ureA #3, with high stability, induced low Th1 responses. The other peptides did not induce IFN-γ-secreting CD4+ T cells (data are shown in Author response image 2). Thus, we compared the strength of the IFN-γ responses induced by these three peptides at a set of low concentrations. Because the other peptides elicited no response, their data could not be included in this comparison.

      Author response image 2.

      The expanded CD4+ T cells from peptide-immunized mice were screened for their response to the peptides in an ICS assay.

      In this study, we used a peptide pool containing four low-stability peptides to vaccinate mice; however, only one peptide induced an effective CD4+ T-cell response. We speculate that the possible reasons are as follows. First, the number of peptides used for vaccination is too small. Only four low-stability peptides were synthesized and used to immunize mice. Three of these could not induce an effective T-cell response, possibly because of their low immunogenicity. If more peptides are synthesized and used, more peptides that induce T-cell responses may be observed. Second, epitope-specific T-cell responses are variable. Responses to the subdominant peptides can be inhibited by the dominant peptide. The subdominant peptide can become dominant by changing the peptide dose or in the absence of the dominant peptide. Thus, we believe that responses to the other three peptides may be detected if mice are immunized with a peptide pool that does not contain a response epitope.

      The corresponding statements have been added to the Discussion section on page 13 lines 287–291 as “Unfortunately, only one peptide, recA #23, with low binding stability and induced significant Th1 responses, was identified in this study. To further confirm that low-stability peptides can induce stronger and higher TCR-affinity antigen-specific T-cell clonotype responses than high-stability peptides, further studies should monitor more peptides with different stabilities.”

      Recommendation 5. There are some areas where additional editing to text would be beneficial due to grammar (eg lines 122-126; line 116, etc).

      The manuscript has been edited by a professional language editing company.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation 1. It is interesting that there was no difference in IFNg responses induced by different adjuvants.

      Thank you for the comment. Possible reasons for the lack of difference in IFN-γ responses could be as follows. First, all adjuvants used in this study have been confirmed to effectively induce Th1 responses. Second, in this study, IFN-γ responses were examined using expanded antigen-specific T cells in vitro. The in vitro cell expansion efficiency may have affected these results.

      Recommendation 2. The data to support the claim that changes in exogenous peptide presentation among adjuvant groups were not due to differences in antigen phagocytosis is insufficient.

      Thank you for the comment. In this study, proteomics of A20 cells pulsed with antigens in the different adjuvant-treated groups was used to determine the exogenous antigens phagocytosed by cells. In addition, we used fluorescein isothiocyanate (FITC)-labeled OVA to pulse APCs and detected antigen phagocytosis by APCs after treatment with different adjuvants. The MFI of FITC was detected by FACS at different time points. The data are shown below (Author response image 3). No obvious differences in FITC MFI were detected after adjuvant stimulation, indicating that antigen phagocytosis among the adjuvant groups was almost the same.

      A20 cells, used as APCs, are a B-cell line. Antigen recognition and phagocytosis by B-cells depend on the B-cell receptor (BCR) on the cell surface. The ability of BCRs to bind to different antigens varies, leading to significant differences in the phagocytosis of different antigens by B-cells. Therefore, detecting the phagocytosis of a single antigen may not reflect the overall phagocytic state of the B-cells. Thus, in this study, we used proteomics to detect exogenous proteins in B-cells pulsed with H. pylori antigens, which contain thousands of components, to evaluate their overall phagocytic capacity. Only the proteomic data are presented in our manuscript.

      Author response image 3.

      Antigen phagocytosis by A20 cells was measured using FITC-labeled OVA. (A) A20 cells were pulsed with FITC-labeled OVA. MFI of FITC was measured after 1 h. (B) MFI of FITC was examined at different time points after adjuvant stimulation.

      Recommendation 3. It is not clear how MPLA, CpG, and MDP adjuvants modulate the presentation of low vs high stability peptides.

      Thank you for pointing this out. We acknowledge that we did not clarify the mechanisms by which adjuvants affect the stability of the peptides presented by APCs.

      We performed experiments to detect the expression of proteins involved in antigen processing and presentation in the different adjuvant-treated groups. Furthermore, shRNAs were used to knock down the expression of key molecules. Immunopeptidomics was used to detect peptide presentation. Unfortunately, the expected results for peptide presentation repertoires were not observed. We are still working on the mechanisms.

      Please also see our response to comment 1 of reviewer 1

      The related statements were added in the Discussion section on page 13, lines 292–299: “In this study, we found that the peptide repertoires presented by APCs were significantly affected by the adjuvants CpG and MPLA, but not MDP. All three adjuvants belong to the PRR ligand adjuvant family. CpG and MPLA bind to TLRs and MDP is recognized by NOD2. Although the receptors are different, many common molecules are involved both in TLR and NLD pathway activation.  Unfortunately, we did not demonstrate why the MDP had different impacts on peptide presentation compared with other adjuvants. Further investigation is required to clarify the mechanism by which MPLA, CpG, and MDP adjuvants modulate the presentation of peptides with different stabilities.”

    1. eLife Assessment

      This important work explores the modulation of pain by intense stress. The authors employed a series of cutting-edge techniques and provided convincing evidence suggesting that the dorsal lateral septum-> lateral hypothalamus-> rostral ventromedial medulla circuit is responsible for mediating stress-induced analgesia. This work will be of interest to neuroscientists interested in the neural circuits of behavior, and scientists interested in stress or pain.

    2. Reviewer #1 (Public review):

      The manuscript entitled "A septo-hypothalamic-medullary circuit directs stress-induced analgesia" by Shah et al. showed that the dLS-to-LHA circuit is sufficient and necessary for stress-induced analgesia (SIA), which is mediated by the rostral ventromedial medulla (RVM) in an opioid-dependent manner. This study is interesting and important and the conclusions are largely supported by the data. I have a few concerns as follows:

      (1) The present data show that activation of dLS neurons produces SIA; however, this manipulation is non-specific. It may be better to see the effect of specific manipulation of stress-activated c-Fos positive neurons in the dLS using a combination of the Tet-Off system and chemogenetic/optogenetic tools.
      (2) Depending on its duration and intensity, stress can exert potent and bidirectional modulatory effects on pain, either reducing pain (SIA) or exacerbating it (stress-induced hyperalgesia, SIH). Is the circuit in the manuscript involved in SIH?
      (3) It is well-accepted that opioid and cannabinoid receptors participate in the SIA, with a critical role for the RVM endocannabinoid system in particular. Why did the authors focus their study on the opioid system?
      (4) Does silencing of the dLS neurons affect stress-induced anxiety-like behaviors? Or, what is the relationship between SIA and the level of stress-induced anxiety?
      (5) Please provide direct electrophysiological evidence confirming the efficacy of the MP-CNO.
      (6) Is the LHA a specific downstream target for SIA, and is the LHA involved in stress-induced anxiety-like behaviors?
      (7) Do LHA neurons have direct projections to the RVM? If yes, what is its role in the SIA?

    3. Reviewer #2 (Public review):

      Shah et al. investigate the role of an understudied neural circuitry, specifically the dLS -> LHA -> RVM pathway, in mediating stress-induced analgesia. The authors use a combination of advanced techniques to provide convincing evidence for the involvement of this circuit in modulating pain under stress.

      The study begins by mapping the neural circuitry through a series of intersectional tracings. Following this, the authors use behavioral tests along with optogenetic and chemogenetic manipulations to confirm the pathway's role in promoting analgesia. Additionally, fiber photometry is employed to monitor the activity of each brain region in response to stress and pain.

      While the study is comprehensive and the findings are convincing, a key concern arises regarding the overarching hypothesis that restraint-induced stress promotes analgesia. A more straightforward interpretation could be that intense struggling, rather than stress itself, might drive the observed analgesic responses.

    4. Author response:

      Reviewer #1 (Public Review): 

      The manuscript entitled "A septo-hypothalamic-medullary circuit directs stress-induced analgesia" by Shah et al. showed that the dLS-to-LHA circuit is sufficient and necessary for stress-induced analgesia (SIA), which is mediated by the rostral ventromedial medulla (RVM) in an opioid-dependent manner. This study is interesting and important and the conclusions are largely supported by the data. I have a few concerns as follows:

      We thank the reviewer for finding our study “interesting” and “important” and for noting that the “conclusions are largely supported by the data”.

      (1)  The present data show that activation of dLS neurons produces SIA, however, this manipulation is non-specific. It may be better to see the effect of specific manipulation of stress-activated c-Fos positive neurons in the dLS using a combination of the Tet-Off system and chemogenetic/optogenetic tools. 

      We agree with the reviewer that activating the stress-“trapped” neurons would be a more specific way to induce SIA through septal activation than the strategy we pursued of activating the entire dLS. In all likelihood, we expect to see robust SIA if stress-responsive dLS neurons are specifically activated. We are in the process of acquiring the genetic tools required for “trapping” stress neurons and expect to be able to perform the experiments suggested by the reviewer in the coming months.

      (2)  Depending on its duration, and intensity, stress can exert potent and bidirectional modulatory effects on pain, either reducing pain (SIA) or exacerbating it (stress-induced hyperalgesia, SIH). Is the circuit in the manuscript involved in SIH?

      As mentioned by the reviewer, it would be reasonable to suspect that the dLS neurons are involved in SIH. However, we believe that the experiments to test this hypothesis are outside the scope of this paper, since here we have focused on the circuit mechanisms for SIA. Nevertheless, in the revised discussion section, we have included the possibility of dLS neurons driving SIH.

      (3)  It is well-accepted that opioid and cannabinoid receptors participate in the SIA, and the evidence is especially strong for the RVM endocannabinoid system. Given this, why did the authors focus their study on the opioid system?

      We agree with the reviewer that dLS-mediated SIA may work through neural circuits centered on RVM expressing receptors for either or both opioids and endocannabinoids. We primarily focused on the opioidergic system in the RVM, as decades of mechanistic work have revealed how the ON, OFF, and neutral neurons modulate pain through the endogenous opioids and even mediate SIA. In the revised discussion, we have included the possibility of involvement of both pain modulatory systems.

      (4)  Does silencing of the dLS neurons affect stress-induced anxiety-like behaviors? Alternatively, what is the relationship between SIA and the level of stress-induced anxiety?

      We did not test whether silencing the dLS would affect stress-induced anxiety, as our focus was on the pain modulatory effects of dLS activation. The relationship between levels of SIA and stress-induced anxiety will be interesting to explore in the future. We believe that behavioral assays better than the existing ones would be needed to quantitatively measure the levels of stress-induced anxiety and SIA.

      (5)  Direct electrophysiological evidence should be provided to confirm the efficacy of the MP-CNO.

      We agree with the reviewer that ex-vivo electrophysiology experiments will substantiate the effectiveness of the MP-CNO. However, we do not have the expertise, or the instrumentation required to perform these experiments in our laboratory.

      (6)  Is the LHA a specific downstream target for SIA, and is the LHA involved in stress-induced anxiety-like behaviors?

      Several lines of evidence point to the fact that LHA neurons are involved in stress-induced anxiety. We have also shown by fiber photometry recordings that the dLS downstream neurons in the LHA are activated by acute restraint. Thus, we expect that activation of the LHA neurons will cause stress-induced anxiety. However, we wanted to focus on the pain modulation aspect of the dLS-LHA-RVM circuitry.

      (7)  Do LHA neurons have direct projections to the RVM? If yes, what is its role in the SIA?

      Our anatomical studies using transsynaptic anterograde and retrograde viral strategies in Figure 6 show that the LHA neurons have direct projections to the RVM, and that these neurons are sufficient to drive hyperalgesia and necessary for SIA.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, Shah et al. explore the function of an understudied neural circuitry from the dLS -> LHA -> RVM in mediating stress-induced analgesia. They initially establish this neural circuitry through a series of intersectional tracings. Subsequently, they conduct behavioral tests, coupled with optogenetic or chemogenetic manipulations, to confirm the involvement of this pathway in promoting analgesia. Additionally, fiber photometry experiments are employed to investigate the activity of each brain region in response to stress and pain. 

      Strengths: 

      Overall, the study is comprehensive, and the findings are compelling. 

      We appreciate the reviewer for finding our manuscript “comprehensive” and “compelling”.

      Weaknesses: 

      One noteworthy concern arises regarding the overarching hypothesis that restraint-induced stress promotes analgesia. A more direct interpretation suggests that intense struggling, rather than stress per se, activates the dLS -> LHA -> RVM pathway that may drive analgesic responses.

      We agree with the reviewer that our data can be interpreted to mean that “intense struggling”, rather than “acute stress”, might have altered the pain thresholds in mice. However, we would like to point out that the restraint-induced stress model that we have used has long been regarded as a standard for inducing stress. Moreover, we have demonstrated that dLS activation results in acute stress by measuring blood corticosterone levels, and showed through light-dark box tests that dLS activation caused stress-induced anxiety.

      Reviewer #2 (Recommendations For The Authors): 

      Please find below my other comments for improvements. 

      Introduction: The authors claimed that "dLS neurons receive nociceptive inputs from the thalamus and somatosensory cortices." However, citations are missing.

      We have added the citations.

      Figure 1 B&C: Although this paper focuses on the dLS, it would be informative to also include vLS c-Fos images (maybe in a supplementary figure), given that these data appear to be already acquired. The inclusion of vLS data will provide critical information regarding potential specificity (or lack of) across LS subregions in stress responses.

      In the revised manuscript we have added the vLS c-Fos images as suggested by the reviewer. 

      Figure 1D: Quantification of Vgat vs. Vglut neurons is missing. It is unclear if the Vgat neurons are restricted to small clusters.

      We did not add the Vglut vs. Vgat quantification since both our experiments and publicly available data from the Allen Brain Atlas show that almost all of the neurons in the LS are GABAergic. We found Vglut2-expressing neurons to be very rare, with 0-2 per section in the LS of the mouse brain.

      Figure 1G: The Y-axis label is missing. 

      We have added the axis in the revised manuscript.

      Figure 2: The authors claimed that dLS neurons are preferentially tuned to stress caused by physical restraint. However, it appears that these neurons are specifically tuned to intense struggle behavior (transient) rather than stress (prolonged).

      We agree with the reviewer that the SIA observed in mice with dLS activation can be interpreted as the effect of transient struggle behavior rather than prolonged stress. However, we would like to point out that acute restraint for one hour is known to produce prolonged stress, which is supported by increased blood corticosterone levels and stress-induced anxiety (Fig1-Fig Supplementary 1).

      Figure 4: The authors provided compelling evidence that dLS neurons synapse on LHA Vglut2 neurons. However, it is unclear if they exclusively target the Vglut2 neurons or also synapse on LHA Vgat neurons.

      We agree with the reviewer that even though the majority of the dLS downstream neurons in the LHA are glutamatergic, as now shown in Fig. 4D, a few neurons do not express Vglut and thus must be GABAergic.

      Figure 5D: It is unclear if the trace represents dLS or LHA calcium signal (in the main text, the authors claimed both).

      We now indicate at the top of Figure 5C, D which LHA neurons we recorded from.

      Figure 6 G&H: Presumably, ΔG-Rabies does not transmit across neurons due to the deletion of the glycoprotein (G) gene. Thus, it is unclear why dLS and LHA neurons express mCherry after injecting rabies into RVM.

      The aim of the rabies experiment was to test whether the cells in the LHA that receive inputs from the dLS are the same ones that send projections downstream to the RVM. To this end, we used a monosynaptic rabies virus that has retrograde properties. Hence, when injected into the RVM, it was taken up by the terminals of the LHA neurons in the RVM and traveled to the cell bodies in the LHA. We injected the AAV1-Transsyn-Cre in the dLS, so only the cells downstream of the dLS in the LHA can express the Cre-dependent glycoprotein (G) gene. Thus, the rabies-mCherry virus specifically infected the LHA neurons downstream of the dLS and jumped a synapse to label the upstream dLS neurons.

      The authors claim that "RVMpost-LHA neurons may modulate nociceptive thresholds through their local synaptic connections within the RVM, recurrent connections with the PAG, or direct interactions with spinal cord neurons." It is unclear what the "local synaptic connections within the RVM" means. It is also unclear whether there is evidence of recurrent connections between the RVM and PAG.

      By local connections, we meant intrinsic connections within the RVM: some of the RVM neurons postsynaptic to the LHA might be interneurons that mediate SIA by modulating the ON or OFF cells. There is some anatomical evidence for ascending inputs from the RVM to the PAG, and we have now included the citation in the relevant section of the manuscript.

    1. eLife Assessment

      This study uses C. elegans, a poikilothermic ("cold-blooded") animal, to investigate the interesting question of how cells and organisms adapt to prolonged exposure to cold temperature. The study employed ribosome profiling and RNAseq analyses and provides a useful inventory of genes changed in cold adapted nematodes. However, the overall conclusions that 1) translation is ongoing at a low rate and 2) IRE mediated transcriptional changes play a significant role in cold adaptation are incompletely supported by the evidence provided. The authors are encouraged to conduct additional bioinformatic analyses and rewrite the manuscript to more accurately reflect the evidence provided.

    2. Reviewer #1 (Public review):

      The manuscript by Engelfriet et al. addresses an interesting question in animal physiology: how do animals adapt to cold? Using polysome profiling and puromycin labeling, the authors confirm that in C. elegans exposed to a cooling regimen, protein synthesis is decreased globally. They then use RNAseq and ribosome profiling to propose that this decrease is driven mainly by decreased transcription, while translation of most mRNAs continues in the cold at a slower rate. They also find many transcripts whose expression is increased in the cold, and suggest that transcription of some of the cold-induced genes reflects activation of the IRE-1/XBP-1 UPR pathway. The authors further suggest that activation of the UPR by cold is due to cold-induced protein misfolding and perturbations in lipids in the ER, and that UPR activation is beneficial for cold survival.

      The finding that a decrease in protein synthesis that is characteristic of cold exposure and hibernation is driven primarily by changes in transcription rather than translation is quite interesting and different from findings in other studies. It would be important to understand the reason for this difference. The finding that some of the cold-induced transcription in worms reflects XBP-1-dependent activity of IRE-1 is also new, while UPR activation by lipid perturbations both agrees with previous observations and exposes differences. The differences highlight the need for better understanding of how different temperature exposures affect different lipids, as cold adaptation is widespread in nature, and cooling is often used in clinical settings.

      However, some concerns with interpretations and technical issues make several major conclusions in this manuscript less rigorous, as explained in detail in the comments below. In particular, I have two major concerns: 1) The contradiction between the strong reduction of global translation, with the puromycin incorporation gel showing no detectable protein synthesis in cold, and an apparently large fraction of transcripts whose abundance and translation in Fig. 2A are both strongly increased. 2) The fact that no transcripts were examined for dependence on IRE-1/XBP-1 for their induction by cold, except for one transcriptional reporter, and some weaknesses (see below) in the data showing activation of the IRE-1/XBP-1 pathway. The conclusion that cold induces the UPR via specific activation of the IRE-1/XBP-1 pathway, in my opinion, requires additional experiments.

      Major concerns:

      (1) Fig. 1B shows polysomes still present on day 1 of 4°C exposure, but the gel in Fig. 1C suggests a complete lack of protein synthesis. Why? What is then the evidence that ribosomal footprints used in much of the paper as evidence of ongoing active translation are from actively translating ribosomes rather than from stationary ribosomes still bound to transcripts, considering that cooling to 4°C is often used to 'freeze' protein complexes and prevent separation of their subunits? The authors should explain whether ribosome profiling as a measure of active translation has been evaluated specifically at 4°C, or test this experimentally. They should also provide some evidence (like Western blots) of increases in protein levels for at least some of the strongly cold-upregulated transcripts, like lips-11.

      As puromycin incorporation seems to be the one direct measure of global protein synthesis here, it conflicts with much of the translation data, especially considering that quite a large fraction of transcripts have increased both mRNA levels and ribosome footprints, and thus presumably increased translation at 4°C, in Fig. 2A.

      Also, it is not clear how quantitation in Fig. 1C relates to the gel shown, the quantitation seems to indicate about 50-60% reduction of the signal, while the gel shows no discernable signal.

      (2) It is striking that plips-11::GFP reporter is induced in day 1 of 4°C exposure, apparently to the extent that is similar to its induction by a large dose of tunicamycin (Fig. 3 supplement), but the three IRE-1 dependent UPR transcripts from Shen 2005 list were not induced at all on day 1 (Fig. 4 supplement). Moreover, the accumulation of the misfolded CPL-1 reporter, that was interpreted as evidence that misfolding may be triggering UPR at 4°C, was only observed on day 1, when the induction of the three IRE-1 targets is absent, but not on day 3, when it is stronger. How does this agree with the conclusion of UPR activation by cold via IRE-1/XBP-1 pathway? It is true that the authors do note very little overlap between IRE-1/XBP-1-dependent genes induced by different stress conditions, but for most of this paper, they draw parallels between tunicamycin-induced and cold-induced IRE-1/XBP-1 activation.

      The conclusion that "the transcription of some cold-induced genes reflects the activation of unfolded protein response (UPR)..." is based on analysis of only one gene, lips-11. No other genes were examined for IRE-1 dependence of their induction by cold, neither the other 8 genes that are common between the cold-induced genes here and the ER stress/IRE-1-induced in Shen 2005 (Venn diagram in Figure 7 supplement), nor the hsp-4 reporter. What is the evidence that lips-11 is not the only gene whose induction by cold in this paper's dataset depends on IRE-1? This is a major weakness and needs to be addressed.

      Furthermore, whether induction by cold of lips-11 itself is due to IRE1 activation was not tested, only a partial decrease of reporter fluorescence by ire-1 RNAi is shown. A quantitative measure of the change of lips-11 transcript in ire-1 and xbp-1 mutants is needed to establish if it depends on IRE-1/XBP-1 pathway.

      The authors could provide more information and the additional data for the transcripts upregulated by both ER stress and cold, including the endogenous lips-11 and hsp-4 transcripts: their identity, fold induction by both cold and ER stress, how their induction is ranked in the corresponding datasets (all of these are from existing data), and do they depend on IRE-1/XBP-1 for induction by cold? Without these additional data, and considering that the authors did not directly measure the splicing of xbp-1 transcript (see comment for Fig. 3 below), the conclusion that cold induces UPR by specific activation of IRE-1/XBP-1 pathway is premature.

      There are also technical issues that are making it difficult to interpret some of the results, and missing controls that decrease the rigor of conclusions:

      (1) For RNAseq and ribosome occupancy, were the 20°C day 1 adult animals collected at the same time as the other set was moved to 4°C, or were they additionally grown at 20°C for the same length of time as the 4°C incubations, which would make them day 2 adults or older at the time of analysis? This information is only given for SUnSET: "animals were cultivated for 1 or 3 additional days at 4°C or 20°C". This could be a major concern in interpreting translation data: First, the inducibility of both UPR and HSR in worms is lost at exactly this transition, from day 1 to day 2 or 3 adults, depending on the reporting lab (for example Taylor and Dillin 2013, Labbadia and Morimoto, 2015, De-Souza et al 2022). How do authors account for this? Would results with reporter induction, or induction of IRE-1 target genes in Fig. 4, change if day 1 adults were used for 20°C?

      Second, if animals at the time of shift to 4°C were only beginning their reproduction, they will presumably not develop further during hibernation, while an additional day at 20°C will bring them to the full reproductive capacity. Did 4°C and 20°C animals used for RNAseq and ribosome occupancy have similar numbers of embryos, and were the embryos at similar stages? If embryos were retained in one condition vs the other, how much would they contribute in terms of transcripts, and do the authors expect the same adaptive programs operating in embryos and in the adults?

      (2) Second, no population density is given for most of the experiments, despite the known strong effects of crowding (high pheromone) on C. elegans growth. From the only two specifics that are given, it seems that very different population sizes were used: for example, 150 L1s were used in survival assay, while 12,000 L1s in SUnSET. Have the authors compared results they got at high population densities with what would happen when animals are grown in uncrowded plates? At least a baseline comparison in the beginning should have been done.

      (3) Fig. 3: it is unclear why the accepted and well characterized quantitative measure of IRE-1 activation, the splicing of the xbp-1 transcript, is not determined directly by RT-PCR. The fluorescent spliced XBP-1 reporter, to my knowledge, has not been tested for its quantitative nature and thus its use here is insufficient.

      Furthermore, the image of this fluorescent reporter in Fig. 3b shows only one anterior-most row of cells of intestine, and quantitation was done with 2 to 5 nuclei per animal, while lips-11 is induced in entire intestine. Was there spliced XBP-1 in the rest of the intestinal nuclei? Could the authors show/quantify the entire animal (20 intestinal cells) rather than one or two rows of cells?

      (4) The differences in the outcomes from this study and the previous one (Dudkevich 2022) that used 15°C to 2°C cooling approach are puzzling, as they would suggest two quite different IRE-1 dependent programs of cold tolerance. It would be good if authors commented on overlapping/non-overlapping genes, and provided their thoughts on the origin of these differences considering the small difference in temperatures. Second, have the authors performed a control where they reproduced the rescue by FA supplementation of poor survival of ire-1 mutants after the 15°C to 2°C shift?

      Without this or another positive control, and without measuring change in lipid composition in their own experiments, it is unclear whether the different outcomes with respect to FAs are due to a real difference in adaptive programs at these temperatures, or to failure in supplementation?

      (5) Have the authors tested whether and by how much ire-1(ok799) mutation shortens the lifespan at 20°C? This needs to be done before the defect in survival of ire-1 mutants in Fig. 7a can be interpreted.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates cold induced states in C. elegans, using polysome profiling and RNA seq to identify genes that are differentially regulated and concluding that cold-specific gene regulation occurs at the transcriptional level. This study also includes analysis of one gene from the differentially regulated set, lips-11 (a lipase), and finds that it is regulated in response to a specific set of ER stress factors.

      Strengths:

      (1) Understanding how environmental conditions are linked to stress pathways is generally interesting.

      (2) The study used well-established genetic tools to analyze ER stress pathways.

      Weaknesses:

      (1) The conclusions regarding a general transcriptional response are based on one gene, lips-11, which does not affect survival in response to cold. We would suggest altering the title to replace "Reprograming gene expression" with "Regulation of the lipase lips-11".

      (2) There is no gene ontology with the gene expression data.

      (3) Definitive conclusions regarding transcriptional vs translational effects would require use of blockers such as alpha-amanitin or cycloheximide.

      (4) Conclusions regarding the role of lipids are based on supplementation with oleic acid or choline, yet there is no lipid analysis of the cold animals, or after lips-1 knockdown. Although choline is important for PC production, adding choline in normal PC could have many other metabolic impacts and doesn't necessarily implicate PC without lipidomic or genetic evidence.

    4. Reviewer #3 (Public review):

      Summary:

      The authors sought to understand the molecular mechanisms that cells use to survive cold temperatures by studying gene expression regulation in response to cold in C. elegans. They determined whether gene expression changes during cold adaptation occur primarily at the transcriptional level and identified specific pathways, such as the unfolded protein response pathway, that are activated to possibly promote survival under cold conditions.

      Strengths:

      Effective use of bulk RNA sequencing (RNA-seq) to measure transcript abundance and ribosome profiling (ribo-seq) to assess translation rates, providing a comprehensive view of gene expression regulation during cold adaptation. This combined approach allows for correlation between mRNA levels and their translation, thereby offering evidence for the authors' conclusion that transcriptional regulation is the primary mechanism of cold-specific gene expression changes.

      Weaknesses:

      The study has several weaknesses: it provides limited novel insights into pathways mediating transcriptional regulation of cold-inducible genes, as IRE-1 and XBP-1 are already well-known responders to endoplasmic reticulum stress, including that induced by cold. Additionally, the weak cold sensitivity phenotype observed in ire-1 mutants casts doubt on the pathway's key role in cold adaptation. The study also overlooks previous research (e.g. PMID: 27540856) that links IRE-1 to SKN-1, another major stress-responsive pathway, potentially missing important interactions and mechanisms involved in cold adaptation.

    1. eLife Assessment

      This important study develops a high-throughput version of expansion microscopy that can be performed in 96-well plates. The engineered technology is convincing and compatible with standard microplates and automated microscopes and thus will be of broad interest. The application to hiPSC-derived cardiomyocytes treated with doxorubicin provides a solid proof-of-concept demonstrating the potential for high-throughput analysis.

    2. Reviewer #1 (Public review):

      Summary

      In this manuscript, Day et al. present a high-throughput version of expansion microscopy to increase the throughput of this well-established super-resolution imaging technique. Through technical innovations in liquid handling with custom-fabricated tools and modifications to how the expandable hydrogels are polymerized, the authors show robust ~4-fold expansion of cultured cells in 96-well plates. They go on to show that HiExM can be used for applications such as drug screens by testing the effect of doxorubicin on human cardiomyocytes. Interestingly, the effects of this drug on changing DNA organization were only detectable by ExM, demonstrating the utility of HiExM for such studies.

      Overall, this is a very well-written manuscript presenting an important technical advance that overcomes a major limitation of ExM - throughput. As a method, HiExM appears extremely useful and the data generally support the conclusions.

      Strengths

      HiExM overcomes a major limitation of ExM by increasing the throughput and reducing the need for manual handling of gels. The authors do an excellent job of explaining each variation introduced to HiExM to make this work and thoroughly characterize the impressive expansion isotropy. The dox experiments are generally well-controlled and the comparison to an alternative stressor (H2O2) significantly strengthens the conclusions.

      Weaknesses

      (1) It is still unclear to me whether or not cells that do not expand remain in the well given the response to point 1. The authors say the cells are digested and washed away but then say that there is a remaining signal from the unexpanded DNA in some cases. I believe this is still a concern that potential users of the protocol should be aware of.

      Editor note: this comment has been addressed in the latest version.

      (2) Regarding the response to point 9, I think this information should be included in the manuscript, possibly in the methods. It is important for others to have a sense of how long imaging may take if they were to adopt this method.

      Editor note: this comment has been addressed in the latest version.

    3. Reviewer #2 (Public review):

      Summary:

      In the present work, the authors present an engineering solution to sample preparation in 96-well plates for high-throughput super-resolution microscopy via Expansion Microscopy. This is not a trivial problem, as the well cannot be filled with the gel, which would prohibit expansion of the gel. They thus engineered a device that can spot a small droplet of hydrogel solution and keep it in place as it polymerises. Because the droplet occupies only a small portion of the space at the center of each well, the gel can expand in all directions, and imaging and staining can proceed by liquid handling robots and an automated microscope.

      Strengths:

      In contrast to Reference 8, the authors' system is compatible with standard 96-well imaging plates for high-throughput automated microscopy and automated liquid handling for most parts of the protocol. They thus provide a clear path towards high-throughput ExM and high-throughput super-resolution microscopy, which is a timely and important goal.

      Addition upon revision:

      The authors addressed this reviewer's suggestions.

    4. Reviewer #3 (Public review):

      Summary:

      Day et al. introduced high-throughput expansion microscopy (HiExM), a method facilitating the simultaneous adaptation of expansion microscopy for cells cultured in a 96-well plate format. The distinctive features of this method include: 1) the use of a specialized device for delivering a minimal amount (~230 nL) of gel solution to each well of a conventional 96-well plate, and 2) the application of the photochemical initiator, Irgacure 2959, to successfully form and expand toroidal gel within each well.

      Addition upon revision:

      Overall, the authors have adequately addressed most of the concerns raised. There are a few minor issues that require attention.

      Minor comments:

      Figure S10: There appears to be a discrepancy in the panel labeling. The current labels are E-H, but it is unclear whether panels A-D exist. Also, this reviewer thought that panels G and H would benefit from statistical testing to strengthen the conclusions. As a general rule for scientific graph presentation, the y-axis of all graphs should start at zero unless there is a compelling reason not to do so.

      Editor note: this comment has been addressed in the latest version.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this manuscript, Day et al. present a high-throughput version of expansion microscopy to increase the throughput of this well-established super-resolution imaging technique. Through technical innovations in liquid handling with custom-fabricated tools and modifications to how the expandable hydrogels are polymerized, the authors show robust ~4-fold expansion of cultured cells in 96-well plates. They go on to show that HiExM can be used for applications such as drug screens by testing the effect of doxorubicin on human cardiomyocytes. Interestingly, the effects of this drug on changing DNA organization were only detectable by ExM, demonstrating the utility of HiExM for such studies.

      Overall, this is a very well-written manuscript presenting an important technical advance that overcomes a major limitation of ExM - throughput. As a method, HiExM appears extremely useful and the data generally support the conclusions.

      Strengths

      HiExM overcomes a major limitation of ExM by increasing the throughput and reducing the need for manual handling of gels. The authors do an excellent job of explaining each variation introduced to HiExM to make this work and thoroughly characterize the impressive expansion isotropy. The dox experiments are generally well-controlled and the comparison to an alternative stressor (H2O2) significantly strengthens the conclusions.

      Weaknesses

      (1) It is still unclear to me whether or not cells that do not expand remain in the well given the response to point 1. The authors say the cells are digested and washed away but then say that there is a remaining signal from the unexpanded DNA in some cases. I believe this is still a concern that potential users of the protocol should be aware of.

      Although ProteinaseK digestion removes most of the unexpanded cells, DNA can sometimes persist. As such, we occasionally observe Hoechst signal underneath cells. The residual DNA is easily differentiated from nuclear Hoechst signal and does not confound interpretation of results. We have added a new supplementary figure that further clarifies this point.

      (2) Regarding the response to point 9, I think this information should be included in the manuscript, possibly in the methods. It is important for others to have a sense of how long imaging may take if they were to adopt this method.

      We have added detailed information to the methods section to address this point as shown below.  In general, we image HiExM samples on the Opera Phenix at 63x with the following parameters: 100% laser power for all channels; 200 ms exposure for Hoechst, 500-1000+ ms exposure for immunostained channels depending on the strength of the stain and the laser; 60 optical sections with 1 micron spacing; and 4-20 fields of view per well depending on the cell density and sample size requirements. Therefore, imaging one full 96-well plate (60 wells total as we avoid the outer wells) takes anywhere from 3 hr to 64 hr depending on the combination of parameters used.
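
      As a rough cross-check of the range quoted above, the exposure time alone can be tallied from the listed parameters. The sketch below is a back-of-the-envelope estimate, not the authors' calculation: the function name is hypothetical, the number of immunostained channels (one in the fast case, three in the slow case) is an assumption, and stage movement, autofocus, and data-transfer overhead are ignored.

```python
def exposure_time_hours(wells, fields_per_well, sections, exposures_ms):
    """Total camera exposure time for a plate, in hours.

    Counts only exposure: every optical section of every field in every
    well is exposed once per channel. Hardware overhead is not modelled.
    """
    total_ms = wells * fields_per_well * sections * sum(exposures_ms)
    return total_ms / 3_600_000  # milliseconds per hour


# Fast case: 4 fields per well, Hoechst (200 ms) plus one
# immunostained channel at 500 ms (channel count is an assumption).
fast = exposure_time_hours(60, 4, 60, [200, 500])

# Slow case: 20 fields per well, Hoechst plus three immunostained
# channels at 1000 ms each (channel count is an assumption).
slow = exposure_time_hours(60, 20, 60, [200, 1000, 1000, 1000])

print(f"fast: {fast:.1f} hr, slow: {slow:.1f} hr")
```

      Under these assumptions the two cases come out at about 2.8 hr and 64 hr, in line with the range stated in the response.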

      Reviewer #2 (Public review):

      Summary:

      In the present work, the authors present an engineering solution to sample preparation in 96-well plates for high-throughput super-resolution microscopy via Expansion Microscopy. This is not a trivial problem, as the well cannot be filled with the gel, which would prohibit expansion of the gel. They thus engineered a device that can spot a small droplet of hydrogel solution and keep it in place as it polymerises. Because the droplet occupies only a small portion of the space at the center of each well, the gel can expand in all directions, and imaging and staining can proceed by liquid handling robots and an automated microscope.

      Strengths:

      In contrast to Reference 8, the authors' system is compatible with standard 96-well imaging plates for high-throughput automated microscopy and automated liquid handling for most parts of the protocol. They thus provide a clear path towards high-throughput ExM and high-throughput super-resolution microscopy, which is a timely and important goal.

      Addition upon revision:

      The authors addressed this reviewer's suggestions.

      Reviewer #3 (Public review):

      Summary:

      Day et al. introduced high-throughput expansion microscopy (HiExM), a method facilitating the simultaneous adaptation of expansion microscopy for cells cultured in a 96-well plate format. The distinctive features of this method include: 1) the use of a specialized device for delivering a minimal amount (~230 nL) of gel solution to each well of a conventional 96-well plate, and 2) the application of the photochemical initiator, Irgacure 2959, to successfully form and expand toroidal gel within each well.

      Addition upon revision:

      Overall, the authors have adequately addressed most of the concerns raised. There are a few minor issues that require attention.

      Minor comments:

      Figure S10: There appears to be a discrepancy in the panel labeling. The current labels are E-H, but it is unclear whether panels A-D exist. Also, this reviewer thought that panels G and H would benefit from statistical testing to strengthen the conclusions. As a general rule for scientific graph presentation, the y-axis of all graphs should start at zero unless there is a compelling reason not to do so.

      We have revised Figure S10 to address your comments.

    1. Reviewer #1 (Public review):

      By examining the prevalence of interactions with ancient amino acids of coenzymes in ancient versus recent folds, the authors noticed an increased interaction propensity for ancient interactions. They infer from this that coenzymes might have played an important role in prebiotic proteins. By only focusing on coenzymes, the authors may have overestimated their importance. What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? if so, this might reflect the interaction propensity of specific amino acids rather than some possible role in very ancient proteins. Or it might diminish the conjectured importance of coenzymes. The analysis, which is very straightforward, is technically correct. However, the conclusions might not be as strong as presented. This paper presents an excellent summary of contemporary thought on what might have constituted prebiotic proteins and their properties.

    2. Reviewer #2 (Public review):

      This study advances the model that the first canonical amino acids to emerge in life bound the earliest cofactors and led to the first proteins. The focus is on organic/organometallic cofactors, building on previous work on metals - i.e. those in the groups of Bromberg, Dupont and others as well cited in the manuscript. Studies of this type are limited both by data availability and confounding chemical effects that are exacerbated by the timescale of evolutionary inference tackled here. However, the analysis provides a solid addition to the field and complements existing metal-focused studies as well as those of Longo, Russell and others (also well cited).

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      By examining the prevalence of interactions with ancient amino acids of coenzymes in ancient versus recent folds, the authors noticed an increased interaction propensity for ancient interactions. They infer from this that coenzymes might have played an important role in prebiotic proteins.

      Strengths:

      (1) The analysis, which is very straightforward, is technically correct. However, the conclusions might not be as strong as presented.

      (2) This paper presents an excellent summary of contemporary thought on what might have constituted prebiotic proteins and their properties.

      (3) The paper is clearly written.

      We are grateful for the kind comments of the reviewer on our manuscript. However, we would like to clarify a possible misunderstanding in the summary of our study. Specifically, analysis of "ancient versus recent folds" was not really reported in our results. Our analysis concerned "coenzyme age" rather than the "protein folds age" and was focused mainly on interaction with early vs. late amino acids in protein sequence. While structural propensities of the coenzyme binding sites were also analyzed, no distinction on the level of ancient vs. recent folds was assumed and this was only commented on in the discussion, based on previous work of others. 

      Weaknesses:

      (1) The conclusions might not be as strong as presented. First of all, while ancient amino acids interact less frequently in late with a given coenzyme, maybe this just reflects the fact that proteins that evolved later might be using residues that have a more favorable binding free energy.

      We would like to point out that there was no distinction between proteins that evolved early or late in our dataset of coenzyme-binding proteins. The aim of our analysis was purely to observe trends in the age of amino acids vs. age of coenzymes. While no direct inference can be made from this about early life as all the proteins are from extant life (as highlighted in the discussion of our work), our goal was to look for intrinsic propensities of early vs. late amino acids in binding to the different coenzyme entities. Indeed, very early interactions would be smeared by the eons of evolutionary history (perhaps also towards more favourable binding free energy, as pointed out also by the reviewer). Nevertheless, significant trends have been recorded across the PDB dataset, pointing to different propensities and mechanistic properties of the binding events. Rather than to a specific evolutionary past, our data therefore point to a “capacity” of the early amino acids to bind certain coenzymes, and we believe that this is the major (and standing) conclusion of our work, along with the properties of such interactions. In our revised version, we will carefully go through all the conclusions and make sure that this message stands out, but we are confident that the following concluding sentences copied from the abstract and the discussion of our manuscript fully comply with our data:

      “These results imply the plausibility of a coenzyme-peptide functional collaboration preceding the establishment of the Central Dogma and full protein alphabet evolution”

      “While no direct inferences about distant evolutionary past can be drawn from the analysis of extant proteins, the principles guiding these interactions can imply their potential prebiotic feasibility and significance.”

      “This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      We would also like to add that proteins that evolved later might not always have higher free energy of binding. Musil et al., 2021 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294521/) showed, using the example of the haloalkane dehalogenase DhaA, that ancestral sequence reconstruction is a powerful tool for designing more stable, but also more active proteins. Ancestral sequence reconstruction relies on finding ancient states of protein families to suggest mutations that will lead to proteins more stable than currently existing ones. Their study did not explore the ligand-protein interactions specifically but showed that ancient states often show more favorable properties than modern proteins.

      (2) What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than the inferred important role of coenzymes.

      We appreciate the comment of the reviewer towards other small molecules, which we assume points mainly towards metal ions (i.e. inorganic cofactors). We completely agree with the reviewer that such interactions are of utmost importance to the origins of life. Intentionally, they were not part of our study, as these have already been studied previously by others (e.g. Bromberg et al., 2022; and reviewed in Frenkel-Pinter et al., 2020) and also by us (Fried et al., 2022). For example, it is noteworthy that prebiotically relevant metal binding sites (e.g. of Mg2+) exhibit enrichment in early amino acids such as Asp and Glu, while binding sites of more recent metals (e.g. Cu and Zn) are enriched in the late amino acids His and Cys (Fried et al., 2022). At the same time, comparable analyses of amino acid - coenzyme trends were not available.

      Nevertheless, involvement of metal ions in the coenzyme binding sites was also studied here and pointed to their greater involvement with the Ancient coenzymes. In the revised version of the manuscript, we will be happy to enlarge the discussion of the studies concerning inorganic cofactors.

      The following sentence was added in the discussion of the revised manuscript:

      “This would also be true for direct interaction of early peptides/proteins and metal ions, independent of organic cofactor involvement, as discussed previously by us and others (Bromberg et al., 2022; Frenkel-Pinter et al., 2020; Fried et al., 2022).  For example, it has been observed that coordination of prebiotically most relevant metal ions (e.g., Mg2+) is more often mediated by early amino acids such as Asp and Glu, whereas metal ions of later relevance (e.g., Cu and Zn) bind more frequently via late amino acids like His and Cys (Fried et al. 2022). Similarly, ancient metal binding folds have been shown to be enriched in early amino acids (Bromberg et al., 2022).”

      (3) Perhaps the conclusions just reflect the types of active sites that evolved first and nothing more.

      We partly agree with the reviewer on this point, but not on why it is listed as a weakness of our study, nor on the “nothing more” notion. Understanding the properties of the earliest binding sites is key to bridging the gap between prebiotic chemistry and biochemistry. The potential of peptides preceding ribosomal synthesis (and the full alphabet evolution) along with prebiotically plausible coenzymes addresses exactly this gap, which is currently not understood.

      Reviewer #2 (Public Review):

      I enjoyed reading this paper and appreciate the careful analysis performed by the investigators examining whether 'ancient' cofactors are preferentially bound by the first-available amino acids, and whether later 'LUCA' cofactors are bound by the late-arriving amino acids. I've always found this question fascinating as there is a contradiction in inorganic metal-protein complexes (not what is focused on here). Metal coordination of Fe, Ni heavily relies on softer ligands like His and Cys - which are by most models latecomer amino acids. There are no traces of thiols or imidazoles in meteorites - although work by Dvorkin has indicated that could very well be due to acid degradation during extraction. Chris Dupont (PNAS 2005) showed that metal speciation in the early earth (such as proposed by Anbar and prior RJP Williams) matched the purported order of fold emergence.

      As such, cofactor-protein interactions as a driving force for evolution has always made sense to me and I admittedly read this paper biased in its favor. But to make sure, I started to play around with the data that the authors kindly and importantly shared in the supplementary files. Here's what I found:

      Point 1: The correlation between abundance of amino acids and protein age is dominated by glycine.

      There is a small, but visible difference in old vs new amino acid fractional abundance between Ancient and LUCA proteins (Figure 3, Supplementary Table 3). However, the bias is not evenly distributed among the amino acids - which Figure 4A shows but is hard to digest as presented. So instead I used the spreadsheet in Supplement 3 to calculate the fractional difference FDaa = F(old aa)-F(new aa). As expected from Figure 3, the mean FD for Ancient is greater than the mean FD for LUCA. But when you look at the same table for each amino acid FDcofactor = F(ancient cofactor) - F(LUCA cofactor), you now see that the bias is not evenly distributed between older and newer amino acids at all. In fact, most of the difference can be explained by glycine (FDcofactor = 3.8) and the rest by also including tryptophan (FDcofactor = -3.8). If you remove these two amino acids from the analysis, the trend seen in Figure 3 all but disappears.

      Troubling - so you might argue that Gly is the oldest of the old and Trp is the newest of the new so the argument still stands. Unfortunately, Gly is a lot of things - flexible, small, polar - so what is the real correlation, age, or chemistry? This leads to point 2.

      We truly acknowledge the effort that the reviewer made in the revision of the data and for the thoughtful, deeper analysis. We agree that this deserves further discussion of our data. 

      As invited by the reviewer, we indeed repeated the analysis on the whole dataset. First, we would like to point out that the reviewer was most probably referring to the Supplementary Fig. 2 (and not 3, which concerns protein folds). While the difference between Ancient and LUCA coenzyme binding is indeed most pronounced for Gly and Trp, we failed to confirm that the trend disappears if those two amino acids are removed from the analysis (additional FDcofactors of 3.2 and -3.2 are observed for the early and late amino acids, resp.), as seen in Table I below. The main additional contributors to this effect are Asp (FD of 2.1) and Ser (FD of 1.8) from the early amino acids and Arg (FD of -2.6) and Cys (FD of -1.7) of the late amino acids. Hence, while we agree with the reviewer that Gly and Trp (the oldest and the youngest) contribute to this effect the most, we disagree that the trend reduces to these two amino acids.  

      In addition, the most recent coenzyme temporality (the Post-LUCA) was neglected in the reviewer’s analysis. The difference between F (old) and F (new) is even more pronounced in Post-LUCA than in LUCA, vs. Ancient (Supplementary Table 5A) and depends much less on Trp. Meanwhile, Asp, Ser, Leu, Phe, and Arg dominate the observed phenomenon (Supplementary Table 5B). This further supports our lack of agreement with the reviewer’s point. Nevertheless, we remain grateful for this discussion and we will happily include this additional analysis in the Supplementary Material of our revised manuscript.

      The following text (and the additional data) was included in the revised manuscript version:

      “To explore the contribution of individual amino acids to this effect, fractional difference (FD) for early vs. late amino acids among the Ancient, LUCA, and Post-LUCA coenzyme binding was calculated (Supplementary Table 5). The mean FD revealed a similar trend to the amino acid composition analysis (Fig. 3). The amino acids most enriched in LUCA vs. Post-LUCA are Gly, Ser, and Leu (FD of 4.4, 4.3, and 4.1 respectively), while the most depleted include Phe, Arg, and His (FD of -11, -4.2, and -3.2) (Supplementary Table 5B).”
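
      The fractional-difference bookkeeping discussed in this exchange can be sketched as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, the frequencies are made-up placeholder percentages rather than values from Supplementary Table 5, and only the six amino acids named in the response above are included.

```python
# Early/late assignment for the six amino acids named in the response;
# a full analysis would cover all twenty.
EARLY = {"Gly", "Asp", "Ser"}
LATE = {"Trp", "Arg", "Cys"}

# Placeholder binding-site frequencies in percent (hypothetical values).
ancient = {"Gly": 12.0, "Asp": 7.0, "Ser": 6.0, "Trp": 1.0, "Arg": 4.0, "Cys": 1.5}
luca = {"Gly": 8.0, "Asp": 5.0, "Ser": 4.5, "Trp": 5.0, "Arg": 6.5, "Cys": 3.0}


def fractional_difference(freq_a, freq_b):
    """Per-amino-acid FD = F(class a) - F(class b)."""
    return {aa: freq_a[aa] - freq_b[aa] for aa in freq_a}


def group_fd(fd, group, exclude=()):
    """Sum FD over one amino acid group, optionally leaving some out."""
    return sum(v for aa, v in fd.items() if aa in group and aa not in exclude)


fd = fractional_difference(ancient, luca)

# Compare the trend with and without the two largest contributors.
for label, skip in [("all", ()), ("without Gly/Trp", ("Gly", "Trp"))]:
    early = group_fd(fd, EARLY, skip)
    late = group_fd(fd, LATE, skip)
    print(f"{label}: early {early:+.1f}, late {late:+.1f}")
```

      Summing FD per group with and without Gly and Trp is the comparison at issue between the reviewer and the authors: if the early and late sums stay clearly apart after the exclusion, the trend does not reduce to those two residues.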

      Point 2 - The correlation is dominated by phosphate.

      In the ancient cofactor list, all but 4 comprise at least one phosphate (SAM, tetrahydrofolic acid, biopterin, and heme). Except for SAM, the rest have very low Gly abundance. The overall high Gly abundance in the ancient enzymes is due to the chemical property of glycine that can occupy the right-hand side of the Ramachandran plot. This allows it to make the alternating alphaleft-alpharight conformation of the P-loop forming Milner-White's anionic nest. If you remove phosphate binding folds from the analysis the trend in Figure 3 vanishes.

      Likewise, Trp is an important functional residue for binding quinones and tuning its redox potential. The LUCA cofactor set is dominated by quinone and derivatives, which likely drives up the new amino acid score for this class of cofactors.

      Once again, we are thankful to the reviewer for raising this point. The role of Gly in the anionic nests proposed by Milner-White and Russell, as well as the Trp role in quinone binding are important points that we would be happy to highlight more in the discussion of the revised manuscript.

      Nevertheless, we disagree that the trends reduce only to the phosphate-containing coenzymes and importantly, that “the trend in Figure 3 vanishes” upon their removal. Supplementary table 6A and 6B show the data for coenzymes excluding those with phosphate moiety and the trend in Fig. 3 remains, albeit less pronounced.

      The following text was included in the revised manuscript version:

      “Moreover, we investigated whether the observed trend in amino acid occurrence at the binding sites was dominated by the presence of phosphate groups, which are common in many ancient cofactors except for SAM, Tetrahydrofolic acid, Biopterin, and Heme. An additional analysis therefore excluded all phosphate-containing coenzymes indicating that while the trend is less pronounced, it remains even in the absence of phosphate groups (Supplementary Table 6).”

      In summary, while I still believe the premise that cofactors drove the shape of peptides and the folds that came from them - and that Rossmann folds are ancient phosphate-binding proteins, this analysis does not really bring anything new to these ideas that have already been stated by Tawfik/Longo, Milner-White/Russell, and many others.

      I did this analysis ad hoc on a slice of the data the authors provided and could easily have missed something and I encourage the authors to check my work. If it holds up it should be noted that negative results can often be as informative as strong positive ones. I think the signal here is too weak to see in the noise using the current approach.

      We are grateful to the reviewer for encouraging a further look at our data. While we hope that the analysis on the whole dataset (listed in Tables I - IV) will change the reviewer’s standpoint on our work, we would still like to comment on the questioned novelty of our results. In fact, the extraordinary works by Tawfik/Longo and Milner-White/Russell (which were cited in our manuscript multiple times) presented one of the motivations for this study. We take the opportunity to copy the part of our discussion that specifically highlights the relevance of their studies, and points out the contribution of our work with respect to theirs.

      “While all the coenzymes bind preferentially to protein residue sidechains, more backbone interactions appear in the ancient coenzyme class when compared to others. This supports an earlier hypothesis that functions of the earliest peptides (possibly of variable compositions and lengths) would be performed with the assistance of the main chain atoms rather than their sidechains (Milner-White and Russel 2011). Longo et al., recently analyzed binding sites of different phosphate-containing ligands which were arguably of high relevance during earliest stages of life, connecting all of today’s core metabolism (Longo et al., 2020 (b)). They observed that unlike the evolutionary younger binding motifs (which rely on sidechain binding), the most ancient lineages indeed bind to phosphate moieties predominantly via the protein backbone.

      Our analysis assigns this phenomenon primarily to interactions via early amino acids that (as mentioned above) are generally enriched in the binding interface of the ancient coenzymes. This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      Unlike any other previous work, our study involves all the major coenzymes (not just the phosphate-containing ones) and is based on their evolutionary age, as well as the age of amino acids. It is the first PDB-wide systematic evolutionary analysis of coenzyme-amino acid binding. Besides confirming some earlier theoretical assertions (such as the role of backbone interactions in early peptide-coenzyme evolution) and observations (such as the occurrence of the ancient phosphate-containing coenzymes in the oldest protein folds), it uncovers substantial novel knowledge. For example, (i) enrichment of early amino acids in the binding of ancient coenzymes, vs. enrichment of late amino acids in the binding of LUCA and Post-LUCA coenzymes, (ii) the trends in secondary structure content of the binding sites of coenzymes of different temporalities, (iii) increased involvement of metal ions in the ancient coenzyme binding events, and (iv) the capacity of only early amino acids to bind ancient coenzymes. In our humble opinion, all of these points bring important contributions to the peptide-coenzyme knowledge gap which has been discussed in a number of previous studies.

      Recommendations for the authors:

      (1) By only focusing on coenzymes, the authors may have overestimated their importance. What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than some possible role in very ancient proteins. Or it might diminish the conjectured importance of coenzymes.

      The following sentence was added in the discussion of the revised manuscript:

      “This would also be true for direct interaction of early peptides/proteins and metal ions, independent of organic cofactor involvement, as discussed previously by us and others (Bromberg et al., 2022; Frenkel-Pinter et al., 2020; Fried et al., 2022).  For example, it has been observed that coordination of prebiotically most relevant metal ions (e.g., Mg2+) is more often mediated by early amino acids such as Asp and Glu, whereas metal ions of later relevance (e.g., Cu and Zn) bind more frequently via late amino acids like His and Cys (Fried et al. 2022). Similarly, ancient metal binding folds have been shown to be enriched in early amino acids (Bromberg et al., 2022).”

      (2) The authors should analyze whether the interactions are with similar types of amino acids in ancient versus early proteins.

      While we appreciate the interesting suggestion, we would like to clarify that we did not aim to elucidate the differences between early and late protein folds - we agree that this might add an interesting perspective to our work, but we feel that it is well beyond the scope of our current study.

      (3) The authors might also wish to do sequence alignments to the structures in early versus late evolving proteins to see how general this pattern of residue usage is beyond the limited set of proteins found in the PDB.

      This is an interesting suggestion but, similar to the previous recommendation, it is not within the scope of this study, where no distinction between early and late evolving proteins has been made.

      There have been a number of attempts to classify the folds as shared among Bacteria, Archaea and Eukaryota or specific to one or two of these groups of organisms (https://link.springer.com/article/10.1007/s00239-023-10136-x, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9541633/). This does not, however, compare easily with our time scales, where ancient ligands occur well before the last common ancestor.

      We also agree that the set of sequences present in the PDB is biased, but perhaps it is less biased than we have thought. The recent fantastic work from Nicola Bordin and his colleagues from the Orengo group attempted to classify over 200 million structures in the AlphaFold database in the so-called Encyclopedia of Domains, and they found that nearly 80% of detected domains can be assigned to already known superfamilies in CATH (https://www.biorxiv.org/content/10.1101/2024.03.18.585509v2).

      (4) The authors might wish to consider the results in Skolnick, H. Zhou, and M. Gao. On the possible origin of protein homochirality, structure, and biochemical function. PNAS 2019: 116(52): 26571-26579.

      Based on the editorial recommendation, the following sentence was added in the discussion:

      “It has been implied by computer simulations that coenzymes could bind to proteins with similar propensity even before the onset of protein homochirality, despite lower structural stability and secondary structure content in heterochiral polypeptides (Skolnick et al., 2019).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In their manuscript entitled: "Is tumor mutational burden predictive of response to immunotherapy?", Gurjao and colleagues discuss the use of tumor mutational burden (TMB) as a predictive biomarker for cancer patients to respond to immune checkpoint blockade (ICB). By analyzing a large cohort of 882 patient samples across different tumor types, they find either little or no association of TMB with the response to ICB. In addition, they show that finding the optimal cutoff for patient stratification leads to a severe multiple testing problem. When this multiple testing problem is rigorously addressed, only non-small cell lung cancer out of 10 cancer types shows a statistically significant association of TMB and response to ICB. Nevertheless, it is clearly shown that in any case the rate of misclassification is too high for TMB alone to qualify as a clinically suitable biomarker for ICB response. Finally, the authors demonstrate with a simple mathematical model that only a few strongly immunogenic mutations would be sufficient for an ICB response, thereby showing that patients with a low TMB score could also benefit from immunotherapy. The manuscript is clearly written, the results are well presented and the applied methods are state-of-the-art.

      We would like to thank the reviewer for their thoughtful suggestions and efforts towards improving our manuscript. We address below the reviewer’s recommendations.

      Reviewer #1 (Recommendations For The Authors):

      (1) The method used for mutation calling can also influence the TMB score. Mutation data was downloaded from public databases and not re-called for this study, so a potential caller bias could be present. What was the calling strategy of the used data sets? For the present study, I don't think that this is crucial because different callers or post-call processing would be used at different sites to determine TMB. I think the mutation calling bias should also be discussed in the manuscript as another shortcoming of TMB as a biomarker for ICB response.

      We thank the reviewer for this comment. Mutational data was not aggregated across studies and caller bias would thus not have any impact on the results of this manuscript. In addition, we further clarified the role of mutation calling bias in the Discussion section.

      “Although attractive and scalable, TMB does not consider the effect of specific mutations (missense, frameshift etc), their presentation and clonality (19), nor the state of the tumour, its microenvironment, and interactions with the immune system that can be integrated into potentially better predictors of response to ICB (43, 44). In addition, another major limitation of TMB is the lack of standardized measures. This includes the lack of standard sequencing methods to assess TMB: TMB can be measured from Whole-Exome sequencing, Whole-Genome sequencing, targeted panel and even RNA sequencing. This also includes biases introduced by using different mutation calling pipelines resulting in different TMB, sequencing depth and different characteristics of the samples (e.g. low purity samples typically yield lower TMB).”

      (2) In their mathematical model of neoantigens and immunogenicity it is assumed that the probability of a mutation to be immunogenic is constant for all mutations. In reality this is certainly not satisfied. However, the central conclusion from the model still holds. I think that this is important to discuss in the manuscript.

      We thank the reviewer for this suggestion and now consider the case where each mutation has its own probability p(i) of being immunogenic.

      “Our model shows that achieving about constant 𝑃{𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} for 𝑁 > 10 − 20 mutations, requires and . The same argument holds when each mutation has its  own probability to be immunogenic 𝑝(𝑖), then , where is the mean probability of a mutation to be immunogenic. Thus only the average probability of a mutation to be immunogenic matters. In summary, we find that the model agrees with clinical data if individual non-synonymous mutations have, on average, 𝑝~10 − 20% chance for triggering an immune response.”
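
      The heterogeneous case mentioned in this response can be illustrated with a short worked inequality. This is a sketch under assumptions that are ours rather than the authors': mutations act independently, and a single immunogenic mutation suffices (𝑘_crit = 1). Under those assumptions the arithmetic-geometric mean inequality shows that spreading the probabilities 𝑝(𝑖) around their mean can only raise the response probability relative to the homogeneous model, so saturation is governed by the mean.

```latex
% Assumptions (ours): independent mutations, k_crit = 1,
% \bar{p} = \frac{1}{N}\sum_{i=1}^{N} p(i).
\begin{align*}
P\{\text{immune response}\}
  &= 1 - \prod_{i=1}^{N} \bigl(1 - p(i)\bigr) \\
  &\ge 1 - \Bigl(\frac{1}{N}\sum_{i=1}^{N} \bigl(1 - p(i)\bigr)\Bigr)^{N}
   = 1 - (1 - \bar{p})^{N}.
\end{align*}
% For small p(i), both sides are approximately 1 - e^{-N\bar{p}},
% so the response probability depends on the p(i) mainly through \bar{p}.
```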

      (3) In the mathematical formula on page 8, C_N^k is the binomial coefficient. This should be stated or written out.

      Thank you for pointing this out. Corrected.

      “Due to immunodominance, only a few (𝑘_crit) immunogenic mutations are sufficient to elicit a full immune response. Hence, the probability for a cancer with 𝑁 (=TMB) mutations to elicit an immune response is then the probability of having 𝑘_crit or more immunogenic mutations among 𝑁:

      $$P\{\text{immune response}\} = \sum_{k=k_{\text{crit}}}^{N} \binom{N}{k}\, p^{k} (1-p)^{N-k}$$

      which is the CDF of a binomial distribution.”

      (4) The mathematical model provides an explanation that tumors with a low TMB can also respond to ICB. It cannot explain tumors with high TMB lacking ICB response. An explanation of this phenomenon is discussed in the paper but I think also the impact of the tumor immune microenvironment should be mentioned here.

      As we explained in the presentation of the model, even immunogenic tumors elicit response to ICB with some probability. In the revision we write:

      “𝑃{𝑐𝑙𝑖𝑛𝑖𝑐𝑎𝑙 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} = 𝑃{𝑐𝑙𝑖𝑛𝑖𝑐𝑎𝑙 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒|𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} · 𝑃{𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒}, where 𝑃{𝑐𝑙𝑖𝑛𝑖𝑐𝑎𝑙 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒|𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} is the probability of clinical response, given that cancer elicits an immune response which is complex and depends on many factors including tumor immune microenvironment. Yet the prerequisite for the clinical response is the immune response 𝑃{𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} that we focus on.”

      Reviewer #2 (Public Review):

      The manuscript points out that TMB cut-offs are not strong predictors of response to immunotherapy or overall survival. By randomly shuffling TMB values within cohorts to simulate a null distribution of log-rank test p-values, they show that under correction, the statistical significance of previously reported TMB cut-offs for predicting outcomes is questionable.
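
      The shuffling procedure summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes binary responder labels and uses the largest difference in response rate across all candidate cutoffs as the test statistic, whereas the manuscript works with log-rank p-values on survival data. All function names and the synthetic data are hypothetical.

```python
import random


def max_cutoff_statistic(tmb, responded):
    """Largest |response rate at/above cutoff - response rate below| over all cutoffs."""
    best = 0.0
    # Skip the smallest TMB value so the "below cutoff" group is never empty.
    for cutoff in sorted(set(tmb))[1:]:
        high = [r for t, r in zip(tmb, responded) if t >= cutoff]
        low = [r for t, r in zip(tmb, responded) if t < cutoff]
        diff = abs(sum(high) / len(high) - sum(low) / len(low))
        best = max(best, diff)
    return best


def permutation_p_value(tmb, responded, n_perm=2000, seed=0):
    """P-value of the best cutoff, corrected for having searched over all cutoffs."""
    rng = random.Random(seed)
    observed = max_cutoff_statistic(tmb, responded)
    shuffled = list(tmb)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any TMB/response association
        if max_cutoff_statistic(shuffled, responded) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic cohort (assumption): response is drawn independently of TMB.
rng = random.Random(1)
tmb = [rng.randint(1, 500) for _ in range(30)]
responded = [rng.random() < 0.3 for _ in range(30)]
observed, p_value = permutation_p_value(tmb, responded, n_perm=500)
print(f"best observed difference = {observed:.2f}, corrected p = {p_value:.3f}")
```

      Because the cutoff search is repeated inside every shuffle, the null distribution already accounts for the multiple testing that an optimized cutoff introduces.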

      We would like to thank the reviewer for their thoughtful suggestions and efforts towards improving our manuscript.

      There is a clinical need for a better prediction of treatment response than TMB alone can provide. However, no part of the analysis challenges the validity of the well-known pan-cancer correlation between TMB and immunotherapy response.

      We address the pan-cancer correlation in the supplemental text and Figure S3. We realized the supplemental text was missing from the eLife submission and included only in the bioRxiv version. We apologize for this oversight. In particular, we show that the “well-known pan-cancer correlation” is largely based on a few outlier cancer subtypes: MSI colorectal cancers and uveal/ocular melanomas. We show that when we remove these cancer types from the pan-cancer dataset, the correlation becomes non-significant for the remaining 15 cancer types.

      The failure to detect significant TMB cut-offs may be due to insufficient power, as the examined cohorts have relatively low sample sizes. A power analysis would be informative of what cohort sizes are needed to detect small to modest effects of TMB on immune response.

      Since we see no effect, we cannot perform a power analysis. Moreover, increasing cohort sizes cannot increase the effect -- dramatic misclassification of responders (the fraction of responders below the treatment cutoff) would remain the same, making TMB unsuitable for clinical decision-making.
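
      The misclassification argument can be made concrete with a small sketch that picks a cutoff by the Youden index and then counts the responders who fall below it. The cohort below is invented for illustration and the function names are hypothetical; the point is only that the fraction of missed responders is a property of how the two TMB distributions overlap, not of cohort size.

```python
def youden_cutoff(tmb, responded):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1 for the rule TMB >= cutoff."""
    n_pos = sum(responded)
    n_neg = len(responded) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(tmb)):
        true_pos = sum(1 for t, r in zip(tmb, responded) if r and t >= cutoff)
        true_neg = sum(1 for t, r in zip(tmb, responded) if not r and t < cutoff)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def responders_below(tmb, responded, cutoff):
    """Fraction of responders whose TMB is below the cutoff (missed by the rule)."""
    responder_tmb = [t for t, r in zip(tmb, responded) if r]
    return sum(1 for t in responder_tmb if t < cutoff) / len(responder_tmb)


# Invented toy cohort (assumption): 5 responders and 5 non-responders.
tmb = [2, 5, 8, 12, 15, 20, 30, 45, 60, 90]
responded = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
cutoff, j = youden_cutoff(tmb, responded)
missed = responders_below(tmb, responded, cutoff)
print(f"cutoff = {cutoff}, J = {j:.2f}, responders below cutoff = {missed:.0%}")
```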

      The manuscript provides a simple model of immunogenicity that is tailored to be consistent with a claimed lack of relationship between TMB and response to immunotherapy. Under the model, if each mutation that a tumor has acquired has a relatively high probability of being immunogenic (~10%, they suggest), and if 1-2 immunogenic mutations is enough to induce an immune response, then most tumors produce an immune response, and TMB and response should be uncorrelated except in very low-TMB tumors.

      Contrary to the reviewer’s suggestion, our modeling is not tailored to be consistent with the lack of association between TMB and response. Rather, we found the model has two regimes: the first regime (where p<<1), in which higher TMB leads to a higher probability of response, which doesn’t agree with the data, and the second regime (p~0.1), in which cancers with TMB>10-20 are immunogenic, consistent with the clinical data.

      We further expanded on these key points in the Results:

      “The model shows two different behaviors. If individual mutations are unlikely to be immunogenic (𝑝 ≪ 1) , e.g. due to a low probability of being presented, the probability of response increases gradually with TMB (Figure 5B). The neoantigen theory generally expects such gradual increase in immunogenicity of cancer with TMB. Yet, available data (Figure 2) don’t show such a trend.

      On the contrary, if mutations are more likely to be immunogenic (𝑝~0.1), the probability of response quickly saturates (Figure 5C), making such tumors respond to ICB irrespective of TMB, as we observed in clinical data.”
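
      The two regimes described in this passage follow directly from the binomial tail probability P(X ≥ 𝑘_crit) with X ~ Bin(𝑁, 𝑝). The sketch below evaluates that tail for a small and a larger 𝑝; the choice 𝑘_crit = 2 and the specific values of 𝑝 and 𝑁 are illustrative assumptions, not parameters taken from the manuscript.

```python
from math import comb


def p_immune_response(n_mutations: int, p: float, k_crit: int = 1) -> float:
    """P(X >= k_crit) for X ~ Binomial(n_mutations, p)."""
    if k_crit <= 0:
        return 1.0
    if n_mutations < k_crit:
        return 0.0
    # Sum the short lower tail (k < k_crit) and take the complement.
    lower = sum(
        comb(n_mutations, k) * p**k * (1.0 - p) ** (n_mutations - k)
        for k in range(k_crit)
    )
    return 1.0 - lower


# Illustrative values (assumptions): p = 0.001 for the gradual regime,
# p = 0.1 for the saturating regime, k_crit = 2 in both.
for p in (0.001, 0.1):
    row = [round(p_immune_response(n, p, k_crit=2), 3) for n in (5, 20, 50, 200, 1000)]
    print(f"p = {p}: {row}")
```

      With the smaller 𝑝 the probability keeps climbing across the whole TMB range, while with 𝑝 near 0.1 it is already close to 1 by a few tens of mutations.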

      We also expanded on these key points in the Introduction:

      “We develop a simple model that is based on the neoantigen theory and find that it has two regimes. In one regime, the probability of response increases gradually with TMB, as commonly believed. Yet in the other, the probability of response saturates after a few mutations, making a chance to respond independent of TMB. Our analysis of the clinical data is consistent with the latter regime. Thus our model shows that the neoantigen theory is fully consistent with the lack of association between TMB and response.”

      The question then becomes whether the response is sufficient to wipe out tumor cells in conjunction with immunotherapy, which is essentially the same question of predicting response that motivated the original analysis. While TMB alone is not an excellent predictor of treatment response, the pan-cancer correlation between TMB and response/survival is highly significant, so the model's only independent prediction is wrong.

      Our study indicates that TMB is a very poor predictor (writing that it’s “not an excellent predictor of treatment response” is an understatement). Moreover, we show that a widely believed “pan-cancer correlation” is shaky as well (Supplemental text and Figure S3). So we don’t see any contradictions between the model and the data.

      Additionally, experiments to predict and validate neoepitopes suggest that a much smaller fraction of nonsynonymous mutations produce immune responses (1,2).

      We agree with the reviewer. That’s exactly what the model suggests.

      A key idea that is overlooked in this manuscript is that of survivorship bias: self-evidently, none of the mutations found at the time of sequencing have been immunogenic enough to provoke a response capable of eliminating the tumor. While the authors suggest that immunoediting "is inefficient, allowing tumors to accumulate a high TMB," the alternative explanation fits the neoepitope literature better: most mutations that reach high allele frequency in tumor cells are not immunogenic in typical (or patient-specific) tumor environments. Of course, immunotherapies sometimes succeed in overcoming the evolved immune evasion of tumors. Higher-TMB tumors are likely to continue to have higher mutation rates after sequencing; increased generation of new immunogenic mutations may partially explain their modestly improved responses to therapy.

      We disagree with the reviewer's assertion that survivorship bias could explain observed phenomena. If immunogenic mutations that arise during cancer development were eliminated (by purifying selection, i.e. reduced fitness or cellular death) then observed mutations would carry noticeable signatures of purifying selection. On the contrary, cancer genomic data shows incredibly weak signals of purifying selection on non-synonymous mutations (Weghorn and Sunyaev, Nature Genetics 2017), and observed passenger mutations are practically indistinguishable from random in their effect on proteins (McFarland et al., PNAS 2013).

      We do agree with the statement that “most mutations … in tumor cells are not immunogenic”. In fact that’s exactly what our model predicts: (1-p)~90% of mutations in the model are non-immunogenic, while the remaining p~10% are sufficient to trigger an immune response. We clarify this in the text of the paper: “On the contrary, if mutations are more likely to be immunogenic (𝑝~0.1), the probability of response quickly saturates (Figure 5C), making such tumors respond to ICB irrespective of TMB, as we observed in clinical data.”

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Defining TMB as "number of non-synonymous mutations": while TMB is not consistently defined throughout the literature, it is usually given as a rate rather than a total count, and sometimes synonymous mutations are included. Consider adopting the definition used by the TMB Harmonization Project: "number of somatic mutations per megabase of interrogated genomic sequence" (3).

      We thank the reviewer for their comment.

      Be more specific about your findings, so that abstract readers can get some understanding of your proposed explanation for the "immunogenicity of neoantigens and the lack of association between TMB and response."

      We thank the reviewer for their comment. We modified the abstract to explain that the theory we developed expands the neoantigen theory yet can be consistent with the observed lack of association between TMB and response:

      "Second, we develop a model that expands the neoantigen theory and can be consistent with both immunogenicity of neoantigens and the lack of association between TMB and response. Our analysis shows that the use of TMB in clinical practice is not supported by available data and can deprive patients of treatment to which they are likely to respond.”

      Introduction

      Again, consider using a more standard definition of TMB.

      We thank the reviewer for their comment. Our study did not seek to harmonize TMB across the datasets and we thus used the total number of mutations rather than the mutational rate often used for comparison across different datasets.

      Expand the introduction to provide a preview of the purpose and direction of your analysis. The current draft reveals only that the analysis will relate to TMB.

      We expanded the introduction providing the motivation, the approach, and the summary of main findings.

      “Using a biomarker to stratify and prioritize patients for treatment runs a risk of depriving patients who have a chance to respond to a life-saving treatment. High variability of response makes relying on a predictor particularly risky. Hence, we revisit original data that were used to establish correlation between TMB and response. We tested TMB as a predictor of both binary responder/non-responder labels from original clinical studies, as well as continuous survival data. We also investigated whether a TMB threshold could distinguish patients with high and low survival after multiple hypothesis testing. We find that no TMB threshold performs better on the clinical data than on randomized ones.

      “We further show that irrespective of the strategy to choose the threshold, even if we were to employ the optimal TMB cutoff, it would still lead to about 25% of responders falling below the treatment prioritization threshold. In addition, we re-examine the pan-cancer association of TMB with response rate to ICB.

      “Finally we revisit the neoantigen theory that was the rationale for using TMB as a predictor of response to immunotherapy. The theory stipulates that non-synonymous mutations can lead to the production of unique antigens (neoantigens) that are recognized by the immune system as foreign, triggering the immune response to cancer. The theory further assumes that the more mutations a cancer has, the more likely it triggers the immune system, and the more likely it will benefit from immunotherapy. We develop a simple model that is based on the neoantigen theory and find that it has two regimes. In one regime, the probability of response increases gradually with TMB, as commonly believed. Yet in the other, the probability of response saturates after a few mutations, making a chance to respond independent of TMB. Our analysis of the clinical data is consistent with the latter regime. Thus our model shows that the neoantigen theory is fully consistent with the lack of association between TMB and response.”

      Section: Is TMB associated with response after treatment?

      The claim that after excluding melanoma and some colorectal cancers, there is no relationship between TMB and response rates in pan-cancer studies cites references 12 and 14. In reference 12 (Yarchoan et al.), it is clear from glancing at their Figure 1 that a pan-cancer correlation between TMB and response would remain with these cancer types excluded. This discrepancy requires explanation. "Supplementary text" is cited for this claim, but it was not included in the file that I received.

      We address the pan-cancer correlation in the supplemental text and Figure S3. While the figure was available, we realized the supplemental text was missing from the eLife submission. We apologize for this oversight.

      Plots of survival and TMB do not show "visible correlation": Please strengthen this claim with an appropriate statistical test.

      We expand the figure caption to explain the following:

      “Plots of progression-free survival and TMB for melanoma and lung cancer ICB cohorts show the lack of correlation or of an obvious TMB cutoff. Computing a simple correlation for survival and censored data cannot correctly represent the dependence since patients who are alive live longer than the reported survival, and limiting correlation to patients who are dead would bias the analysis. Thus other survival statistics are used through the paper.”

      Section: Model reconciles neoantigen theory and data

      Page 8: In the probability formula, the C term is not defined. My guess is that it means choose(N, k).

      Please clarify.

      Thank you for pointing this out. Corrected using more conventional notation.

      $$P\{\text{immune response}\} = \sum_{k=k_{\text{crit}}}^{N} \binom{N}{k}\, p^{k} (1-p)^{N-k}$$

      which is the CDF of a binomial distribution.

      Page 8: Assuming the above, P(immune response) = P(X >= k_crit); where X~Bin(N, p). The formula should be explicitly introduced in terms of the CDF of the binomial distribution to prevent readers from thinking the wheel is being re-invented.

      We thank the reviewer for pointing this out, we modified the equation in the text to make it easier to see this point (see above). We refrain from going further since the CDF of a binomial distribution doesn’t have a closed form and can only be written as the regularized incomplete beta function.
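
      For reference, the standard identity linking the binomial tail to the regularized incomplete beta function can be written out explicitly. The notation below (𝑁, 𝑝, 𝑘_crit) follows the reviewer's comment; strictly speaking the quantity is the upper tail, that is, one minus the CDF evaluated at 𝑘_crit − 1.

```latex
% Valid for 1 \le k_{\text{crit}} \le N, with X \sim \mathrm{Bin}(N, p).
\begin{align*}
P\{\text{immune response}\}
  &= P(X \ge k_{\text{crit}})
   = \sum_{k=k_{\text{crit}}}^{N} \binom{N}{k} p^{k} (1-p)^{N-k} \\
  &= I_{p}\bigl(k_{\text{crit}},\, N - k_{\text{crit}} + 1\bigr)
   = \frac{\int_{0}^{p} t^{\,k_{\text{crit}}-1} (1-t)^{\,N-k_{\text{crit}}}\, dt}
          {B\bigl(k_{\text{crit}},\, N - k_{\text{crit}} + 1\bigr)}.
\end{align*}
```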

      Page 9: Missing word in "allowing cancers with as little as mutations to be"

      We thank the reviewer for pointing this out, we modified the text accordingly.

      See comments in public review. In brief, I think a convincing case is made regarding the significance of TMB cut-offs as predictors of survival within cancer types, but frankly this elementary model is not compelling.

      Section: Materials and Methods

      In the manuscript, it is stated that TMB is accepted as reported by data sources. Since most of the comparisons in the manuscript are within-data-source, that is acceptable. However, it should be ensured that TMB measurements are comparable between samples within each source. For example, when TMB is reported as a total mutation count, it can be verified that all samples have the same coverage, or measurement can be converted to mutations per megabase of coverage. In the same vein, if this manuscript's definition of TMB only includes nonsynonymous mutations, it should be confirmed that the TMB reported by data sources excludes synonymous mutations.

      We thank the reviewer for their comment. We leverage total TMB as reported in the original studies claiming an association between TMB and response/survival.

      Figure S2: Instead of writing "the Youden index associated cutoffs is also plotted," it can be stated that the asterisk represents the Youden index cutoff, or a legend can be added that provides this information.

      We thank the reviewer for pointing this out, we modified the text accordingly.

    2. eLife Assessment

      This useful study examines relationships between tumor mutational burden and the response to immunotherapy, using new data sets along with publicly available data sets. The authors conclude that tumor mutational burden cut-offs are unreliable proxies for predicting the response to therapy, underpinned by solid evidence, but with several caveats and assumptions that leave the central question subject to further inquiry. In summary, this is an interesting study that adds to a growing body of work investigating the particular conditions governing the effectiveness of immunotherapy.

    3. Reviewer #2 (Public review):

      The manuscript points out that TMB cut-offs are not strong predictors of response to immunotherapy or overall survival. By randomly shuffling TMB values within cohorts to simulate a null distribution of log-rank test p-values, they show that under correction, the statistical significance of previously reported TMB cut-offs for predicting outcomes is questionable. There is a clinical need for a better prediction of treatment response than TMB alone can provide. However, the analysis does not convincingly refute the validity of the well-known pan-cancer correlation between TMB and immunotherapy response. (In a supplemental analysis, the authors attempt to demonstrate a lack of correlation by specifically removing the most supportive cancer types from a pan-cancer correlation test.) The failure to detect significant TMB cut-offs may be due to insufficient power, as the examined cohorts have relatively low sample sizes. A power analysis would be informative of what cohort sizes are needed to detect small to modest effects of TMB on immune response.

      The manuscript provides a simple model of immunogenicity that is tailored to be consistent with a claimed lack of relationship between TMB and response to immunotherapy. Under the model, if each mutation that a tumor has acquired has a relatively high probability of being immunogenic (~10%, they suggest), and if 1-2 immunogenic mutations is enough to induce an immune response, then most tumors produce an immune response, and TMB and response should be uncorrelated except in very low-TMB tumors. The question then becomes whether the response is sufficient to wipe out tumor cells in conjunction with immunotherapy, which is essentially the same question of predicting response that motivated the original analysis. While TMB alone is not an excellent predictor of treatment response, the pan-cancer correlation between TMB and response/survival is highly significant, so the model's only independent prediction is wrong. Additionally, experiments to predict and validate neoepitopes suggest that a much smaller fraction of nonsynonymous mutations produce immune responses (1,2).

      A key idea that is overlooked in this manuscript is that of survivorship bias: self-evidently, none of the mutations found at the time of sequencing have been immunogenic enough to provoke a response capable of eliminating the tumor. While the authors suggest that immunoediting "is inefficient, allowing tumors to accumulate a high TMB," the alternative explanation fits the neoepitope literature better: most mutations that reach high allele frequency in tumor cells are not immunogenic in typical (or patient-specific) tumor environments. Of course, immunotherapies sometimes succeed in overcoming the evolved immune evasion of tumors. Higher-TMB tumors are likely to continue to have higher mutation rates after sequencing; increased generation of new immunogenic mutations may partially explain their modestly improved responses to therapy.

      References:
      (1) Wells, D. K. et al. Key Parameters of Tumor Epitope Immunogenicity Revealed Through a Consortium Approach Improve Neoantigen Prediction. Cell 183, 818-834.e13 (2020).
      (2) Yadav, M. et al. Predicting immunogenic tumour mutations by combining mass spectrometry and exome sequencing. Nature 515, 572-576 (2014).

    1. eLife assessment

      This work presents a valuable new approach for self-supervised segmentation for fluorescence microscopy data, which could eliminate time-consuming data labeling and speed up quantitative analysis. The experimental evidence supplied is currently incomplete as the comparison with other methods is only done on a single dataset, lacks common metrics, and could not be easily reproduced for other sample data listed in the manuscript.

    2. Reviewer #1 (Public Review):

      This work makes several contributions: (1) a method for the self-supervised segmentation of cells in 3D microscopy images, (2) a cell-segmented dataset comprising six volumes from a mesoSPIM sample of a mouse brain, and (3) a napari plugin to apply and train the proposed method.

      (1) Method

      This work presents itself as a generalizable method contribution with a wide scope: self-supervised 3D cell segmentation in microscopy images. My main critique is that there is almost no evidence for the proposed method to have that wide of a scope. Instead, the paper is more akin to a case report that shows that a particular self-supervised method is good enough to segment cells in two datasets with specific properties.

      To support the claim that their method "address[es] the inherent complexity of quantifying cells in 3D volumes", the method should be evaluated in a comprehensive study including different kinds of light and electron microscopy images, different markers, and resolutions to cover the diversity of microscopy images that both title and abstract are alluding to.

      The main dataset used here (a mesoSPIM dataset of a whole mouse brain) features well-isolated cells that are easily distinguishable from the background. Otsu thresholding followed by a connected component analysis already segments most of those cells correctly. The proposed method relies on an intensity-based segmentation method (a soft version of a normalized cut) and has at least five free parameters (radius, intensity, and spatial sigma for SoftNCut, as well as a morphological closing radius, and a merge threshold for touching cells in the post-processing). Given the benefit of tweaking parameters (like thresholds, morphological operation radii, and expected object sizes), it would be illuminating to know how other non-learning-based methods will compare on this dataset, especially if given the same treatment of segmentation post-processing that the proposed method receives. After inspecting the WNet3D predictions (using the napari plugin) on the used datasets I find them almost identical to the raw intensity values, casting doubt as to whether the high segmentation accuracy is really due to the self-supervised learning or instead a function of the post-processing pipeline after thresholding.

      I suggest the following baselines be included to better understand how much of the segmentation accuracy is due to parameter tweaking on the considered datasets versus a novel method contribution:
      * comparison to thresholding (with the same post-processing as the proposed method)
      * comparison to a normalized cut segmentation (with the same post-processing as the proposed method)
      * comparison to references 8 and 9.
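
      The thresholding baseline mentioned in this review (Otsu thresholding followed by connected component analysis) is simple enough to sketch without any imaging library. The version below is a minimal pure-Python illustration on a toy volume, assuming integer intensities in 0-255 and 6-connectivity; it omits the post-processing steps discussed above, and the function names are hypothetical rather than part of the plugin.

```python
from collections import deque


def otsu_threshold(values, n_levels=256):
    """Otsu's threshold for integer intensities in [0, n_levels)."""
    hist = [0] * n_levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg, sum_bg = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(n_levels):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2  # between-class variance (unnormalized)
        if between > best_var:
            best_t, best_var = t, between
    return best_t  # voxels with intensity > best_t count as foreground


def label_components(mask):
    """6-connected component labelling of a 3D boolean mask indexed [z][y][x]."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    n_labels = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or labels[z][y][x]:
                    continue
                n_labels += 1
                labels[z][y][x] = n_labels
                queue = deque([(z, y, x)])
                while queue:  # breadth-first flood fill of one object
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in steps:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and mask[pz][py][px] and not labels[pz][py][px]):
                            labels[pz][py][px] = n_labels
                            queue.append((pz, py, px))
    return labels, n_labels


def threshold_and_label(volume):
    """Otsu threshold the whole volume, then label connected foreground objects."""
    flat = [v for plane in volume for row in plane for v in row]
    t = otsu_threshold(flat)
    mask = [[[v > t for v in row] for row in plane] for plane in volume]
    return label_components(mask)


# Toy volume (assumption): dim background with two bright, well-separated blobs.
volume = [[[10] * 6 for _ in range(6)] for _ in range(4)]
for z in (0, 1):
    for y in (0, 1):
        for x in (0, 1):
            volume[z][y][x] = 200
            volume[z + 2][y + 4][x + 4] = 200
labels, n_objects = threshold_and_label(volume)
print("objects found:", n_objects)
```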

      I further strongly encourage the authors to discuss the limitations of their method. From what I understand, the proposed method works only on well-separated objects (due to the semantic segmentation bottleneck), is based on contrastive FG/BG intensity values (due to the SoftNCut loss), and requires tuning of a few parameters (which might be challenging if no ground-truth is available).

      (2) Dataset

      I commend the authors for providing ground-truth labels for more than 2500 cells. I would appreciate it if the Methods section could mention how exactly the cells were labelled. I found a good overlap between the ground truth and Otsu thresholding of the intensity images. Was the ground truth generated by proofreading an initial automatic segmentation, or entirely done by hand? If the former, which method was used to generate the initial segmentation, and are there any concerns that the ground truth might be biased towards a given segmentation method?

      (3) Napari plugin

      The plugin is well-documented and works by following the installation instructions. However, I was not able to recreate the segmentations reported in the paper with the default settings for the pre-trained WNet3D: segments are generally too large and there are a lot of false positives. Both the prediction and the final instance segmentation also show substantial border artifacts, possibly due to a block-wise processing scheme.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling, and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      (1) The idea behind the self-supervised learning loss is interesting.

      (2) The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      Weaknesses:

      The experiments presented by the authors do not adequately support the claims made in the paper. There are several shortcomings in the design of the experiment and presentation of the results. Further, it is unclear if results of similar quality as reported can be achieved within the GUI by non-expert users.

      Major weaknesses:

      (1) The main experiments are conducted on the new mesoSPIM dataset, which contains quite small and well separated nuclei. It is unclear if the good performance of the novel self-supervised learning method compared to CellPose and StarDist would hold for datasets with other characteristics, such as larger nuclei with a more complex morphology or crowded nuclei. Further, additional preprocessing of the mesoSPIM images may improve results for StarDist and CellPose (see the first point in minor weaknesses). Note: having a method that works better for small nuclei would be an important contribution. But I am uncertain the claims hold for larger and/or more crowded nuclei as the current version of the paper implies. The contribution of the paper would be stronger if a comparison with StarDist / CellPose was also done on the additional datasets from Figure 2.

      (2) The experimental setup for the additional datasets seems to be unrealistic. In general, the description of these experiments is quite short and so the exact strategy is unclear from the text. However, you write the following: "The channel containing the foreground was then thresholded and the Voronoi-Otsu algorithm used to generate instance labels (for Platynereis data), with hyperparameters based on the Dice metric with the ground truth." I.e., the hyperparameters for the post-processing are found based on the ground truth. From the description it is unclear whether this is done a) on the part of the data that is then also used to compute metrics or b) on a separate validation split that is not used to compute metrics. If a): this is not a valid experimental setup and amounts to training on your test set. If b): this is ok from an experimental point of view, but likely still significantly overestimates the quality of predictions that can be achieved by manual tuning of these hyperparameters by a user that is not themselves a developer of this plugin or an absolute expert in classical image analysis, see also 3. Note that the paper provides notebooks to reproduce the experimental results. This is very laudable, but I believe that a more extended description of the experiments in the text would still be very helpful for the reader to understand the set-up. Further, from inspection of these notebooks it becomes clear that the hyperparameters were indeed found on the test set (a), so the results are not valid in the current form.

      (3) I cannot obtain similar results to the ones reported in the manuscript using the plugin. I tried to obtain some of the results from the paper qualitatively: First I downloaded one of the volumes from the mesoSPIM dataset (c5image) and applied the WNet3D to it. The prediction looks ok, however the value range is quite narrow (Average BG intensity ~0.4, FG intensity 0.6-0.7). I then tried to apply the instance segmentation using "Convert to instance labels" from "Utilities". Using "Voronoi-Otsu" does not work due to an error in pyClesperanto ("clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR"). Segmentation via "Connected Components" and "Watershed" requires extensive manual tuning to get a somewhat decent result, which is still far from perfect.

      Then I tried to obtain the results for the Mouse Skull Nuclei Dataset from EmbedSeg. The results look like a denoised version of the input image, not a semantic segmentation. I was skeptical from the beginning that the method would transfer without retraining, due to the very different morphology of nuclei (much larger and elongated). None of the available segmentation methods yield a good result, the best I can achieve is a strong over-segmentation with watersheds.

      Minor weaknesses:

      (1) CellPose can work better if images are resized so that the median object size in new images matches the training data. For CellPose the cyto2 model should do this automatically. It would be important to report if this was done, and if not would be advisable to check if this can improve results.

      (2) It is a bit confusing that F1-Score and Dice Score are used interchangeably to evaluate results. The dice score only evaluates semantic predictions, whereas F1-Score evaluates the actual instance segmentation results. I would advise to only use F1-Score, which is the more appropriate metric. For Figure 1f either the mean F1 score over thresholds or F1 @ 0.5 could be reported. Furthermore, I would advise adopting the recommendations on metric reporting from https://www.nature.com/articles/s41592-023-01942-8.

      (3) A more conceptual limitation is that the (self-supervised) method is limited to intensity-based segmentation, and so will not be able to work for cases where structures cannot be distinguished based on intensity only. It is further unclear how well it can separate crowded nuclei. While some object separation can be achieved by morphological operations this is generally limited for crowded segmentation tasks and the main motivation behind the segmentation objective used in StarDist, CellPose, and other instance segmentation methods. This limitation is only superficially acknowledged in "Note that WNet3D uses brightness to detect objects [...]" but should be discussed in more depth.

      Note: this limitation does not mean at all that the underlying contribution is not significant, but I think it is important to address this in more detail so that potential users know where the method is applicable and where it isn't.

    4. Author Response:

      Reviewer #1 (Public Review):

      This work makes several contributions: (1) a method for the self-supervised segmentation of cells in 3D microscopy images, (2) a cell-segmented dataset comprising six volumes from a mesoSPIM sample of a mouse brain, and (3) a napari plugin to apply and train the proposed method.

      First, thanks for acknowledging our contributions of a new tool, new dataset, and new software.

      (1) Method

      This work presents itself as a generalizable method contribution with a wide scope: self-supervised 3D cell segmentation in microscopy images. My main critique is that there is almost no evidence for the proposed method to have that wide of a scope. Instead, the paper is more akin to a case report that shows that a particular self-supervised method is good enough to segment cells in two datasets with specific properties.

      We agree that we focus on light-sheet microscopy data; therefore, to narrow the scope, we have changed the title to “CellSeg3D: self-supervised 3D cell segmentation for light-sheet microscopy”.

      To support the claim that their method "address[es] the inherent complexity of quantifying cells in 3D volumes", the method should be evaluated in a comprehensive study including different kinds of light and electron microscopy images, different markers, and resolutions to cover the diversity of microscopy images that both title and abstract are alluding to. The main dataset used here (a mesoSPIM dataset of a whole mouse brain) features well-isolated cells that are easily distinguishable from the background. Otsu thresholding followed by a connected component analysis already segments most of those cells correctly.

      You have selectively dropped the last part of that sentence that is key: “... 3D volumes, often in cleared neural tissue”, which is what we tackle. The next sentence goes on to say: “We offer a new 3D mesoSPIM dataset and show that CellSeg3D can match state-of-the-art supervised methods.” Thus, we make it clear that our claims concern mesoSPIM and cleared-tissue data.

      The proposed method relies on an intensity-based segmentation method (a soft version of a normalized cut) and has at least five free parameters (radius, intensity, and spatial sigma for SoftNCut, as well as a morphological closing radius, and a merge threshold for touching cells in the post-processing). Given the benefit of tweaking parameters (like thresholds, morphological operation radii, and expected object sizes), it would be illuminating to know how other non-learning-based methods will compare on this dataset, especially if given the same treatment of segmentation post-processing that the proposed method receives. After inspecting the WNet3D predictions (using the napari plugin) on the used datasets I find them almost identical to the raw intensity values, casting doubt as to whether the high segmentation accuracy is really due to the self-supervised learning or instead a function of the post-processing pipeline after thresholding.

      First, thanks for testing our tool, and glad it works for you. The deep learning methods we use cannot “solve” this dataset, and we also have an F1-Score (Dice) of ~0.8 with our self-supervised method. We don’t see the value in applying non-learning methods; this is unnecessary and beyond the scope of this work.

      I suggest the following baselines be included to better understand how much of the segmentation accuracy is due to parameter tweaking on the considered datasets versus a novel method contribution:

      * comparison to thresholding (with the same post-processing as the proposed method)
      * comparison to a normalized cut segmentation (with the same post-processing as the proposed method)
      * comparison to references 8 and 9.

      Refs 8 and 9 don’t have readily usable (https://github.com/LiangHann/USAR) or even shared code (https://github.com/Kaiseem/AD-GAN), so re-implementing this work is well beyond the bounds of this paper. We benchmarked Cellpose, StarDist, SegResNet, and a transformer (SwinUNetR). Moreover, models in the MONAI package can be used. Note that, to our knowledge, the transformer results are also a new contribution that the Reviewer does not acknowledge.

      I further strongly encourage the authors to discuss the limitations of their method. From what I understand, the proposed method works only on well-separated objects (due to the semantic segmentation bottleneck), is based on contrastive FG/BG intensity values (due to the SoftNCut loss), and requires tuning of a few parameters (which might be challenging if no ground-truth is available).

      We added text on limitations. Thanks for this suggestion.

      (2) Dataset

      I commend the authors for providing ground-truth labels for more than 2500 cells. I would appreciate it if the Methods section could mention how exactly the cells were labelled. I found a good overlap between the ground truth and Otsu thresholding of the intensity images. Was the ground truth generated by proofreading an initial automatic segmentation, or entirely done by hand? If the former, which method was used to generate the initial segmentation, and are there any concerns that the ground truth might be biased towards a given segmentation method?

      In the already submitted version, we have a 5-page DataSet card that fully answers your questions. They are ALL labeled by hand, without any semi-automatic process.

      In our main text we even stated “Using whole-brain data from mice we cropped small regions and human annotated in 3D 2,632 neurons that were endogenously labeled by TPH2-tdTomato” - clearly mentioning it is human-annotated.

      (3) Napari plugin

      The plugin is well-documented and works by following the installation instructions.

      Great, thanks for the positive feedback.

      However, I was not able to recreate the segmentations reported in the paper with the default settings for the pre-trained WNet3D: segments are generally too large and there are a lot of false positives. Both the prediction and the final instance segmentation also show substantial border artifacts, possibly due to a block-wise processing scheme.

      Your review here does not match your comments above; above you said it was working well, such that you doubt the GT is real and the data is too easy as it was perfectly easy to threshold with non-learning methods.

      You would need to share more details on what you tried. We suggest following our code; namely, we provide the full experimental code and processing for every figure, as was noted in our original submission: https://github.com/C-Achard/cellseg3d-figures.

      Reviewer #2 (Public Review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling, and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      (1) The idea behind the self-supervised learning loss is interesting.

      (2) The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      Thank you for highlighting the strengths of our work and new contributions.

      Weaknesses:

      The experiments presented by the authors do not adequately support the claims made in the paper. There are several shortcomings in the design of the experiment and presentation of the results. Further, it is unclear if results of similar quality as reported can be achieved within the GUI by non-expert users.

      Major weaknesses:

      (1) The main experiments are conducted on the new mesoSPIM dataset, which contains quite small and well separated nuclei. It is unclear if the good performance of the novel self-supervised learning method compared to CellPose and StarDist would hold for datasets with other characteristics, such as larger nuclei with a more complex morphology or crowded nuclei.

      StarDist is not pretrained, we trained it from scratch as we did for WNet3D. We retrained Cellpose and reported the results both with their pretrained model and our best-retrained model. This is documented in Figure 1 and Suppl. Figure 1. We also want to push back and say that they both work very well on this data. In fact, our main claim is not that we beat them, it is that we can match them with a self-supervised method.

      Further, additional preprocessing of the mesoSPIM images may improve results for StarDist and CellPose (see the first point in minor weaknesses). Note: having a method that works better for small nuclei would be an important contribution. But I am uncertain the claims hold for larger and/or more crowded nuclei as the current version of the paper implies.

      Figure 2 benchmarks our method on larger and denser nuclei, but we do not intend to claim this is a universal tool. It was specifically designed for light-sheet (brain) data, and we have adjusted the title to be more clear. But we also show in Figure 2 it works well on more dense and noisy samples, hinting that it could be a promising approach. But we agree, as-is, it’s unlikely to be good for extremely dense samples like in electron microscopy, which we never claim it would be.

      With regard to preprocessing, we respectfully disagree. We trained StarDist (and asked the main developer of StarDist, Martin Weigert, to check our work; he is acknowledged in the paper) and it does very well. We also retrained and optimized Cellpose, and we show it works as well as leading transformer and CNN-based approaches. Again, we only claimed we can be as good as these methods with an unsupervised approach.

      The contribution of the paper would be stronger if a comparison with StarDist / CellPose was also done on the additional datasets from Figure 2.

      We appreciate that more datasets would be ideal, but we always feel it’s best for the authors of tools to benchmark their own tools on data. We only compared others in Figure 1 to the new dataset we provide so people get a sense of the quality of the data too; there we did extensive searches for best parameters for those tools. So while we think it would be nice, we will leave it to those authors to be most fair. We also narrowed the scope of our claims to mesoSPIM data (added light-sheet to the title), which none of the other examples in Figure 2 are.

      (2) The experimental setup for the additional datasets seems to be unrealistic. In general, the description of these experiments is quite short and so the exact strategy is unclear from the text. However, you write the following: "The channel containing the foreground was then thresholded and the Voronoi-Otsu algorithm used to generate instance labels (for Platynereis data), with hyperparameters based on the Dice metric with the ground truth." I.e., the hyperparameters for the post-processing are found based on the ground truth. From the description it is unclear whether this is done a) on the part of the data that is then also used to compute metrics or b) on a separate validation split that is not used to compute metrics. If a): this is not a valid experimental setup and amounts to training on your test set. If b): this is ok from an experimental point of view, but likely still significantly overestimates the quality of predictions that can be achieved by manual tuning of these hyperparameters by a user that is not themselves a developer of this plugin or an absolute expert in classical image analysis, see also 3. Note that the paper provides notebooks to reproduce the experimental results. This is very laudable, but I believe that a more extended description of the experiments in the text would still be very helpful for the reader to understand the set-up. Further, from inspection of these notebooks it becomes clear that the hyperparameters were indeed found on the test set (a), so the results are not valid in the current form.

      We apologize for this confusion; we have now expanded the methods to clarify the setup is now b; you can see what we exactly did as well in the figure notebook: https://c-achard.github.io/cellseg3d-figures/fig2-b-c-extra-datasets/self-supervised-extra.html#threshold-predictions. For clarity, we additionally link each individual notebook now in the Methods.
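      The difference between setups (a) and (b) can be illustrated with a minimal sketch. It assumes that post-processing reduces to a single intensity threshold and that quality is measured with a voxel-wise Dice score; the function names (`select_threshold`, `evaluate`) and the synthetic volume are hypothetical and are not taken from the linked notebooks. The threshold is chosen on a validation volume only, and the test volume is touched once, for reporting.

```python
import numpy as np


def dice_score(pred_mask, gt_mask):
    """Voxel-wise Dice overlap between two binary masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    gt_mask = np.asarray(gt_mask, dtype=bool)
    denom = pred_mask.sum() + gt_mask.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred_mask, gt_mask).sum() / denom)


def select_threshold(val_preds, val_gts, candidates):
    """Pick the threshold with the best mean Dice on validation volumes only."""
    best_t, best_score = None, -1.0
    for t in candidates:
        score = np.mean([dice_score(p >= t, g) for p, g in zip(val_preds, val_gts)])
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def evaluate(test_preds, test_gts, threshold):
    """Report mean Dice on held-out test volumes with a fixed threshold."""
    return float(np.mean([dice_score(p >= threshold, g)
                          for p, g in zip(test_preds, test_gts)]))


# Synthetic stand-in for a network output with a narrow value range
# (background around 0.4, foreground around 0.65).
gt = np.zeros((4, 4, 4), dtype=bool)
gt[1:3, 1:3, 1:3] = True
pred = np.where(gt, 0.65, 0.4)

# Setup (b): tune on the validation split, report on a separate test split.
chosen = select_threshold([pred], [gt], candidates=[0.3, 0.5, 0.7])
test_dice = evaluate([pred.copy()], [gt.copy()], chosen)
```

      In setup (a), `select_threshold` would be called on the same volumes that are later passed to `evaluate`, so the reported score would be optimistically biased.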

      (3) I cannot obtain similar results to the ones reported in the manuscript using the plugin. I tried to obtain some of the results from the paper qualitatively: First I downloaded one of the volumes from the mesoSPIM dataset (c5image) and applied the WNet3D to it. The prediction looks ok, however the value range is quite narrow (Average BG intensity ~0.4, FG intensity 0.6-0.7). I then tried to apply the instance segmentation using "Convert to instance labels" from "Utilities". Using "Voronoi-Otsu" does not work due to an error in pyClesperanto ("clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR"). Segmentation via "Connected Components" and "Watershed" requires extensive manual tuning to get a somewhat decent result, which is still far from perfect.

      We are sorry to hear of the installation issue; pyClesperanto is a dependency that would be required to reproduce the images (it sounds like you encountered this issue: https://forum.image.sc/t/pyclesperanto-prototype-doesnt-work/45724). We have now explicitly added the fix to our docs: https://github.com/AdaptiveMotorControlLab/CellSeg3D/pull/90. We recommend checking the reproduction notebooks (which were linked in the initial submission): https://c-achard.github.io/cellseg3d-figures/intro.html.

      Then I tried to obtain the results for the Mouse Skull Nuclei Dataset from EmbedSeg. The results look like a denoised version of the input image, not a semantic segmentation. I was skeptical from the beginning that the method would transfer without retraining, due to the very different morphology of nuclei (much larger and elongated). None of the available segmentation methods yield a good result, the best I can achieve is a strong over-segmentation with watersheds.

      - We are surprised to hear this; did you follow the notebook below, which directly reproduces the steps to create this figure? (This was linked in the preprint): https://c-achard.github.io/cellseg3d-figures/fig2-c-extra-datasets/self-supervised-extra.html

      - We have made a video demo so that any step that might be unclear is clearer to a user: (https://youtu.be/U2a9IbiO7nE).

      - We also expanded the methods to include the exact values from the notebook in the text.

      Minor weaknesses:

      (1) CellPose can work better if images are resized so that the median object size in new images matches the training data. For CellPose the cyto2 model should do this automatically. It would be important to report if this was done, and if not would be advisable to check if this can improve results.

      We reported this value in Figure 1 and found it to work poorly, which is why we retrained Cellpose and obtained good performance (also reported in Figure 1). Resizing GB-to-TB-scale volumes for mesoSPIM data is otherwise not practical, so simply retraining seems the preferable option, which is what we did.

      (2) It is a bit confusing that F1-Score and Dice Score are used interchangeably to evaluate results. The dice score only evaluates semantic predictions, whereas F1-Score evaluates the actual instance segmentation results. I would advise to only use F1-Score, which is the more appropriate metric. For Figure 1f either the mean F1 score over thresholds or F1 @ 0.5 could be reported. Furthermore, I would advise adopting the recommendations on metric reporting from https://www.nature.com/articles/s41592-023-01942-8.

      We are using the common metrics in the field for instance and semantic segmentation, and report them in the methods. In Figure 2f we actually report the “Dice” as defined in StarDist (as we stated in the Methods). Note, their implementation is functionally equivalent to F1-Score of an IoU >= 0, so we simply changed this label in the figure now for clarity. We agree this clarifies for the expert readers what was done, and we expanded the methods to be more clear about metrics. We added a link to the paper you mention as well.
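      The distinction between a voxel-wise Dice score and an instance-level F1-Score can be made concrete with a minimal sketch. This is an illustration under stated assumptions, not the StarDist implementation or the code used for the figures: the function names are hypothetical, and objects are matched greedily, which gives a unique matching only for IoU thresholds of 0.5 or above.

```python
import numpy as np


def dice_score(pred_mask, gt_mask):
    """Voxel-wise Dice between two binary masks (semantic overlap only)."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    gt_mask = np.asarray(gt_mask, dtype=bool)
    denom = pred_mask.sum() + gt_mask.sum()
    if denom == 0:
        return 1.0  # both masks empty
    return float(2.0 * np.logical_and(pred_mask, gt_mask).sum() / denom)


def instance_f1(pred_labels, gt_labels, iou_threshold=0.5):
    """Instance-level F1: a predicted object counts as a true positive
    if it matches an unmatched ground-truth object with IoU >= threshold.
    Label 0 is treated as background. Greedy matching (an assumption)."""
    pred_labels = np.asarray(pred_labels)
    gt_labels = np.asarray(gt_labels)
    pred_ids = [i for i in np.unique(pred_labels) if i != 0]
    gt_ids = [i for i in np.unique(gt_labels) if i != 0]
    matched_gt = set()
    tp = 0
    for p in pred_ids:
        p_mask = pred_labels == p
        best_iou, best_g = 0.0, None
        for g in gt_ids:
            if g in matched_gt:
                continue
            g_mask = gt_labels == g
            inter = np.logical_and(p_mask, g_mask).sum()
            if inter == 0:
                continue
            iou = inter / np.logical_or(p_mask, g_mask).sum()
            if iou > best_iou:
                best_iou, best_g = iou, g
        if best_g is not None and best_iou >= iou_threshold:
            matched_gt.add(best_g)
            tp += 1
    fp = len(pred_ids) - tp  # predicted objects without a match
    fn = len(gt_ids) - tp    # ground-truth objects that were missed
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2.0 * tp / denom


# Three touching ground-truth nuclei predicted as one merged object.
gt = np.zeros((4, 9), dtype=int)
gt[:, 0:3], gt[:, 3:6], gt[:, 6:9] = 1, 2, 3
pred = np.ones((4, 9), dtype=int)

semantic = dice_score(pred > 0, gt > 0)             # 1.0: foreground is perfect
instance = instance_f1(pred, gt, iou_threshold=0.5)  # 0.0: no object has IoU >= 0.5
```

      In this toy case the foreground mask is perfect, so Dice is 1.0, while every ground-truth nucleus overlaps the merged prediction with an IoU of only 1/3, so the instance-level F1 at a threshold of 0.5 is 0.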

      (3) A more conceptual limitation is that the (self-supervised) method is limited to intensity-based segmentation, and so will not be able to work for cases where structures cannot be distinguished based on intensity only. It is further unclear how well it can separate crowded nuclei. While some object separation can be achieved by morphological operations this is generally limited for crowded segmentation tasks and the main motivation behind the segmentation objective used in StarDist, CellPose, and other instance segmentation methods. This limitation is only superficially acknowledged in "Note that WNet3D uses brightness to detect objects [...]" but should be discussed in more depth.

      Note: this limitation does not mean at all that the underlying contribution is not significant, but I think it is important to address this in more detail so that potential users know where the method is applicable and where it isn't.

      We agree, and we added a new section specifically on limitations. Thanks for raising this good point. While self-supervision saves hundreds of hours of manual labor, it comes at the cost of a more limited range of regimes in which it can work. This is why we don’t claim it should replace excellent methods like Cellpose or StarDist, but rather that it complements them and can be used on mesoSPIM samples, as we show here.

    1. eLife Assessment

      This valuable study combines agent-based modelling and in vivo experiments in medaka embryos to provide new insights into the role of the thymic niche in T cell development. The modelling yields some interesting findings regarding the importance of thymic epithelial cells, for some of which the evidence is incomplete. This study would be of interest to oncologists, immunologists, and mathematical modelers.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses a cell-based computational model to simulate and study T cell development in the thymus. They initially applied this model to assess the effect of the thymic epithelial cells (TECs) network on thymocyte proliferation and demonstrated that increasing TEC size, density, or protrusions increased the number of thymocytes. They postulated and confirmed that this was due to changes in IL7 signalling and then expanded this work to encompass various environmental and cell-based parameters, including Notch signalling, cell cycle duration, and cell motility. Critical outcomes from the computational model were tested in vivo using medaka fish, such as the role of IL-7 signalling and minimal effect of Notch signalling.

      Strengths:

      The strength of the paper is the use of computational modelling to obtain unique insights into the niche parameters that control T cell development, such as the role of TEC architecture, while anchoring those findings with in vivo experiments. I can't comment on the model itself, as I am not an expert in modelling, however, the conclusions of the paper seem to be well-supported by the model.

      Weaknesses:

      One potential issue is that many of the conclusions are drawn from the number of thymocytes, or related parameters such as the thymic size or proliferation of the thymocytes. The study only touches briefly on the influence of the thymic niche on other aspects of thymocyte behaviour, such as their differentiation and death.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have worked up a "virtual thymus" using EPISIM, which has already been published. Attractive features of the computational model are stochasticity, cell-to-cell variability, and spatial heterogeneity. They seek to explore the role of TECs, which release IL-7, an important factor in the process of thymocyte division.

      In the model, ordinary clones have IL7R levels chosen from a distribution, while "lesioned" clones have an IL7R value set to the maximum. The observation is that the lesioned clones are larger families, but the difference is not dramatic. This might be called a cell-intrinsic mechanism. One promising cell-extrinsic mechanism is mentioned: if a lesioned clone happens to be near a source of IL-7 and begins to proliferate, the progeny can crowd out cells of other clones and monopolise the IL-7 source. The effect will be more noticeable if sources are rare, so is seen when the TEC network is sparse.

      Strengths:

      Thymic dysfunctions are of interest, not least because of T-ALL. New cells are added, one at a time, to simulate the conveyor belt of thymocytes on a background of stationary cells. They are thus able to follow cell lineages, which is interesting because one progenitor can give rise to many progeny.

      There are some experimental results in Figures 4, 5, and 6. For example, il7 crispant embryos have fewer thymocytes and smaller thymi, but increasing IL-7 availability produces large thymi.

      Weaknesses:

      On the negative side, like most agent-based models, there are dozens of parameters and assumptions whose values and validity are hard to ascertain.

      The stated aim is to mimic a 2.5-to-11 day-old medaka thymus, but the constructed model is a geometrical subset that holds about 100 cells at a time in a steady state. The manuscript contains very many figures and lengthy descriptions of simulations run with different parameter values and assumptions. The abstract and conclusion did not help me understand what exactly has been done and learned. No attempt is made to synthesise observations in any mathematical formula.

    4. Reviewer #3 (Public review):

      Summary:

      Tsingos et al. seek to advance beyond the current paradigm that proliferation of malignant cells in T-cell acute lymphoblastic leukemia occurs in a cell-autonomous fashion. Using a computational agent-based model and experimental validation, they show instead that cell proliferation also depends on interaction with thymic epithelial cells (TEC) in the thymic niche. One key finding is that a dense TEC network inhibits the proliferation of malignant cells and favors the proliferation of normal cells, whereas a sparse TEC network leads to rapid expansion of malignant thymocytes.

      Strengths:

      A key strength of this study is that it combines computational modeling using an agent-based model with experimental work. The original modeling and novel experimental work strengthen each other well. In the agent-based model, the authors also tested the effects of varying a few key parameters of cell proliferation.

      Weaknesses:

      A minor weakness is that the authors did not conduct a global sensitivity analysis of all parameters in their agent-based model to show that the model is robust to variation, which would demonstrate that their results would still hold under a reasonable level of variation in the model and model parameters. This is a minor point, and such a supporting study would end up in an appendix or supplement.

    5. Author Response:

      We thank the reviewers for their thoughtful comments on our manuscript. In this provisional response, we aim to address the major concerns raised and outline a plan for a revised version of the manuscript. A more detailed point-by-point response will follow with the revision.

      The reviewers appreciated our efforts to combine computational modelling with experimental work. However, they also expressed the need for more clarity in explaining how the model was set up, what was simulated, and what the insights and limitations are. In the revision, we plan to improve the discussion section to clarify all of these points. 

      The reviewers also highlighted the need for more transparency regarding the code and the mathematical formulas used in this study. We agree that this is an important issue. While we have already made the software and code for our computational model, along with instructions on how to run it, available in Zenodo (see Ref. 1), and have extensively described the original computational model and formulas in a 13-page supplementary file in our previous study (see Ref. 2), we recognize from the reviewers’ comments that additional transparency is needed. To address this, we will provide an appendix in the revision that includes a full model description, covering the incorporation of cell differentiation and death, a list of parameters, and details on how parameter values were chosen.

      Additionally, in the revised manuscript, we will add a paragraph to more thoroughly discuss the limitations of our approach, as well as avenues for future studies. We hope this will clarify both the capabilities and the limitations of our model in a way that is more accessible to readers of eLife.

      References:

      1. Virtual Thymus Model (version 2.0). Published: Jun 14, 2024. doi:10.5281/zenodo.11656320

      2. Aghaallaei, Narges, et al. "αβ/γδ T cell lineage outcome is regulated by intrathymic cell localization and environmental signals." Science Advances 7.29 (2021): eabg3613.

    1. eLife Assessment

      In this valuable study, the authors attempt to reconstruct the evolutionary history of a large and widespread group of freshwater fishes (Nemacheilidae) across Eurasia since the early Eocene, based on molecular phylogenetic analysis with very comprehensive samplings including 471 specimens belonging to 250 living species. The authors infer that range expansions of the family were facilitated by tectonic connections, favourable climatic conditions, and orogenic processes, adding to our understanding of the effects of climatic change on biodiversity during the Cenozoic. The molecular evidence is overall solid, but the calibration points from the fossil records used in the analysis have not been clearly demonstrated or cited; the different dates for the calibration points might impact the discussion on the evolutionary history relating to past climatic changes.

    2. Reviewer #1 (Public review):

      Summary:

      This is by far the phylogenetic analysis with the most comprehensive coverage for the Nemacheilidae family in Cobitoidea. It is a much-lauded effort. The conclusions derived using phylogenetic tools coincide with geological events, though not without difficulties (Africa pathway).

      Strengths:

      Comprehensive use of genetic tools

      Weaknesses:

      Lack of more fossil records.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present the results of molecular phylogenetic analysis with very comprehensive samplings including 471 specimens belonging to 250 species, trying to give a holistic reconstruction of the evolutionary history of freshwater fishes (Nemacheilidae) across Eurasia since the early Eocene. This is of great interest to general readers.

      Strengths:

      They provide very vast data and conduct comprehensive analyses. They suggested that Nemacheilidae contain 6 major clades, and the earliest differentiation can be dated to the early Eocene.

      Weaknesses:

      The analysis is incomplete, and the manuscript discussion is not well organized. The authors did not discuss the systematic problems that widely exist. They also did not use the conventional way to discuss the evolutionary process of branches or clades, but just chronologically described the overall history.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors intended to prove that gut GLP-1 expression and secretion can be regulated by Piezo1, and hence by mechanistic/stretching regulation. For this purpose, they have assessed Piezo1 expression in STC-1 cell line (a mouse GLP-1 producing cell line) and mouse gut, showing the correlation between Piezo1 level and Gcg levels (Figure S1). They then aimed to generate gut L cell-specific Piezo1 KO mice, and claimed the mice show impaired glucose tolerance and GLP-1 production, which can be mitigated by Ex-4 treatment (Figures 1-2). Pharmacological agents (Yoda1 and GsMTx4) and mechanic activation (intestinal bead implantation) were then utilized to prove the existence of ileal Piezo1-regulated GLP-1 synthesis (Figure 3). This was followed by testing such mechanism in a limited amount of primary L cells and mainly in the STC-1 cell line (Figures 4-7).

      While the novelty of the study is somewhat appreciable, the bio-medical significance is not well demonstrated in the manuscript. The authors stated (in lines 78-83) a number of potential side effects of GLP-1 analogs; how can the mechanistic study of GLP-1 production on its own be essential for the development of new drug targets for the treatment of diabetes? Furthermore, the study does not provide a clear mechanistic insight into how the claimed CaMKKbeta/CaMKIV-mTORC1 signaling pathway upregulated both GLP-1 production and secretion. This reviewer also has concerns about the experimental design and data presented in the current manuscript, including the issue of how proglucagon expression can be assessed by Western blotting.

      Strengths:

      The novelty of the concept.

      Weaknesses:

      Experimental design and key experiment information.

      We appreciate the reviewer's comments. Nowadays, GLP-1-based therapy is well-recognized and commonly used in treatment of Type 2 Diabetes Mellitus (T2DM). Therefore, elucidation of the mechanism that regulates GLP-1 production is essential for the development of new drug targets for the treatment of diabetes. We have revised the relevant wording in the manuscript.

      In our previous studies, we have elucidated the role of the mTOR/S6K pathway in regulating GLP-1 production in L cells. Using the STC-1 cell line and different mouse models, including Neurog3-Tsc1−/− mice and rapamycin or L-leucine treatment to modulate mTOR activity, we have demonstrated that mTOR stimulates proglucagon gene expression and thus GLP-1 production (Diabetologia 2015;58(8):1887-97; Mol Cell Endocrinol. 2015 Nov 15:416:9-18.). Building on these studies, we found in the present study that Piezo1 regulated the mTOR/S6K pathway, and thus proglucagon expression and GLP-1 production, through a Ca2+/CaMKKbeta/CaMKIV pathway. Although we could not exclude the involvement of other signaling pathways downstream of Piezo1 in regulating the cleavage of proglucagon, granule maturation and the final release of GLP-1, our present study provided evidence to support the involvement of the Ca2+/CaMKKbeta/CaMKIV/mTOR pathway in mediating the role of Piezo1 in proglucagon expression and GLP-1 production.

      The reviewer also expressed concerns about the use of western blot to detect proglucagon expression. Proglucagon is encoded by the GCG gene and is cleaved by PC1/3 in L cells to form mature GLP-1. In fact, measurement of intestinal proglucagon protein is a common approach for assessing GLP-1 production in the intestine. Here are some examples from other researchers: Diabetes. 2013 Mar;62(3):789-800. Gastroenterology. 2011 May;140(5):1564-74. 2004 Jul 23;279(30):31068-75. The proglucagon antibody used in our study was purchased from Abcam (Cat#ab23468) and detects proglucagon at 21 kDa.

      Reviewer #2 (Public Review):

      Summary:

      The study by Huang and colleagues focuses on GLP-1 producing entero-endocrine (EEC) L-cells and their regulation of GLP-1 production by a mechano-gated ion channel Piezo1. The study describes Piezo1 expression by L-cells and uses an exciting intersectional mouse model (villin to target epithelium and Gcg to target GLP-1-producing cells and others like glucagon-producing pancreatic endocrine cells), which allows L-cell specific Piezo1 knockout. Using this model, they find an impairment of glucose tolerance, increased body weight, reduced GLP-1 content, and changes to the CaMKKbeta-CaMKIV-mTORC1 signaling pathway using a normal diet and then high-fat diet. Piezo1 chemical agonist and intestinal bead implantation reversed these changes and improved the disrupted phenotype. Using primary sorted L-cells and cell model STC-1, they found that stretch and Piezo1 activation increased GLP-1 and altered the molecular changes described above.

      Strengths:

      This is an interesting study testing a novel hypothesis that may have important mechanistic and translational implications. The authors generated an important intersectional genetics mouse model that allowed them to target Piezo1 L-cells specifically, and the surprising result of impaired metabolism is intriguing.

      Weaknesses:

      However, there are several critical limitations that require resolution before making the conclusions that the authors make.

      (1) A potential explanation for the data, and one that is consistent with existing literature [see for example, PMC5334365, PMC4593481], is that epithelial Piezo1, which is broadly expressed by the GI epithelium, impacts epithelial cell density and survival, and as such, if Piezo1 is involved in L-cell physiology, it may be through regulation of cell density. Thus, it is critical to determine L-cell densities and epithelial integrity in controls and Piezo1 knockouts systematically across the length of the gut, since the authors do not make it clear which gut region contributes to the phenotype they see. Current immunohistochemistry data are not convincing.

      We appreciate the reviewer's comment and agree that Piezo1 may impact L-cell density and epithelial integrity. To address this, we have incorporated quantification of L-cell density in new Figure Supplement 7. The quantitative results demonstrate that the specific deletion of the Piezo1 gene in L cells did not significantly impact L-cell density.

      Regarding epithelial integrity, we assessed the expression of tight junction proteins (ZO-1 and Occludin). As demonstrated in new Figure Supplement 8, the expression of tight junction proteins such as ZO-1 and Occludin did not show significant changes in IntL-Piezo1-/- mice compared to littermate controls.

      Furthermore, we conducted double immunofluorescence of Piezo1 and GLP-1 in the duodenum, jejunum, ileum, and colon of control and IntL-Piezo1-/- mice. As illustrated in new Figure Supplement 5, Piezo1 is expressed in GLP-1-positive cells of the duodenum, jejunum, ileum, and colon of control mice, but not in IntL-Piezo1-/- mice.

      (2) Calcium signaling in L-cells is implicated in their typical role of being gut chemo-sensors, and Piezo1 is a calcium channel, so it is not clear whether any calcium-related signaling mechanism would phenocopy these results.

      We agree with the reviewer that Piezo1 is a calcium channel (validation of Piezo1-mediated Ca2+ influx in primary L cells and STC-1 cells is shown in Figure 4A-C and Figure 5A-C). According to our study, a calcium-related signaling mechanism such as the calcium/calmodulin-dependent protein kinase kinase 2 (CaMKKβ)-calcium/calmodulin-dependent protein kinase IV (CaMKIV) pathway may contribute to the phenotype seen in the IntL-Piezo1-/- mice. In addition, we also discussed other potential calcium-related signaling mechanisms in the article's discussion section (lines 645-656).

      (3) Intestinal bead implantation, while intriguing, does not have clear mechanisms and is likely to provide a point of intestinal obstruction and dysmotility.

      We appreciate the reviewer’s comment. To ascertain whether intestinal bead implantation led to intestinal obstruction and dysmotility, we conducted a bowel transit time test and measured postoperative defecation (as shown in new Figure Supplement 9). The results revealed no difference in bowel transit time or fecal mass between the sham-operated mice and those implanted with beads. Furthermore, to assess whether the animals were in pain or under any discomfort after intestinal bead implantation, we performed an abdominal mechanical sensitivity test three days after the surgery. As indicated in Figure Supplement 9C, no difference in abdominal pain threshold was observed between sham and bead-implanted mice. These results suggest that the mice did not experience discomfort during the experiment.

      (4) Previous studies, some that are very important, but not cited, contradict the presented results (e.g., epithelial Piezo1 role in insulin secretion) and require reconciliation.

      Thanks a lot for the point. We have cited more previous studies. The lack of changes in blood glucose seen in Villin-Piezo1-/- mice reported by Sugisawa et al. is not surprising (Cell. 2020 Aug 6;182(3):609-624.e21.). Actually, in another recent study from our group, we found similar results when the Villin-Piezo1-/- mice and Piezo1fl/fl control mice were fed with a normal chow diet. Since Villin-1 is expressed in all the epithelial cells of the gut, including enterocytes and various types of endocrine cells, the effect of L-cell Piezo1 loss may be masked by other cell types under normal conditions. However, impaired glucose tolerance was seen in Villin-Piezo1-/- mice compared to the Piezo1fl/fl control mice after a high-fat diet for 8 weeks. We further found that Piezo1 in enterocytes exerted a negative effect on glucose and lipid absorption. Loss of Piezo1 in enterocytes led to over-absorption of nutrients under a high-fat diet (Tian Tao, Qing Shu, Yawen Zhao, Wenying Guo, Jinting Wang, Yuhao Shi, Shiqi Jia, Hening Zhai, Hui Chen, Cunchuan Wang, Geyang Xu, Mechanical regulation of lipid and sugar absorption by Piezo1 in enterocytes, Acta Pharmaceutica Sinica B, Accepted, 2024, https://doi.org/10.1016/j.apsb.2024.04.016).

      Overall, this study makes an interesting observation but the data are not currently strong enough to support the conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major concerns

      (1) Figure 1L was labeled wrong, and the co-localization was not clear. The KO leads to such a strong effect on the percentage of GLP-1 positive cells (panel M) but was not clearly demonstrated with immune-staining. Additional experiments are needed to prove tissue-specific knockout in gut GLP-1-producing cells only, but not in other cell lineages or elsewhere. If so, how was the change in gut Gcg mRNA expression? Importantly, this reviewer is not clear on how to use Western blotting to measure proglucagon expression in the tissue samples. What is the size of the product? The antibody information was not provided in the manuscript. Figure 1N, a potential mechanism that affects GLP-1 production involving mTORC and downstream molecules. This comes from nowhere.

      We appreciate the reviewer's feedback. The incorrect label has been corrected in the new Figure 1L. As suggested, we have performed additional experiments to demonstrate tissue-specific knockout of Piezo1 in gut GLP-1-producing cells exclusively, excluding other cell lineages or locations.

      As shown in Figure Supplement 6, Piezo1 remains expressed in ileal ghrelin-positive cells and pancreatic glucagon-positive cells of IntL-Piezo1-/- mice, suggesting that Piezo1 was specifically knocked out in L cells, but not in other endocrine cell types. Furthermore, the decrease was only observed in GLP-1 levels, but not PYY levels, in L cells of IntL-Piezo1-/- mice compared to controls, suggesting that the loss of Piezo1 in L cells affects GLP-1 levels specifically, but not the secretion of other hormones produced by L cells (Figure Supplement 7A-D).

      In our previous studies, we have elucidated the role of the mTOR/S6K pathway in regulating GLP-1 production in L cells. Using the STC-1 cell line and different mouse models, including Neurog3-Tsc1−/− mice and rapamycin or L-leucine treatment to modulate mTOR activity, we have demonstrated that mTOR stimulates proglucagon gene expression and thus GLP-1 production (Diabetologia 2015;58(8):1887-97; Mol Cell Endocrinol. 2015 Nov 15:416:9-18.). Building on these studies, we found in the present study that Piezo1 regulated the mTOR/S6K pathway, and thus proglucagon expression and GLP-1 production, through a Ca2+/CaMKKbeta/CaMKIV pathway.

      Although we could not exclude the involvement of other signaling pathways downstream of Piezo1 in regulating the cleavage of proglucagon, granule maturation and the final release of GLP-1, our present study provided evidence to support the involvement of the Ca2+/CaMKKbeta/CaMKIV/mTOR pathway in mediating the role of Piezo1 in proglucagon expression and GLP-1 production.

      The reviewer also expressed concerns about the use of western blot to detect proglucagon expression. Proglucagon is encoded by the GCG gene and is cleaved by PC1/3 in L cells to form mature GLP-1. In fact, measurement of intestinal proglucagon protein is a common approach for assessing GLP-1 production in the intestine. Here are some examples from other researchers: Diabetes. 2013 Mar;62(3):789-800. Gastroenterology. 2011 May;140(5):1564-74. 2004 Jul 23;279(30):31068-75. The proglucagon antibody used in our study was purchased from Abcam (Cat#ab23468) and detects proglucagon at 21 kDa.

      (2) In Figure 2, the LFD control mouse group was missing. Again, I don't understand the detection of proglucagon by Western blotting in this figure.

      We appreciate the reviewer's comments. Figure 1 presents the phenotypic changes of transgenic mice under low-fat diet feeding, while Figure 2 focuses on the phenotypic changes of transgenic mice under high-fat diet feeding. As we mentioned before, western blot is often used to detect proglucagon, the precursor of GLP-1.

      (3) Why show body weight change but not body weight itself? How are the changes compared (which one serves as the control)? Again, how to do Western blotting on pro-glucagon detection?

      We appreciate the reviewer's comments. Body weight has been added in new Figure 3. Proglucagon is the precursor of GLP-1. Intestinal proglucagon protein measurement is commonly used to assess GLP-1 production in the intestine.

      (4) After reading the whole manuscript, this reviewer cannot get a clear picture of how the claimed CaMKKbeta-mTORC1 pathway mediates the function of Piezo1 activation (via the utilization of Yoda1 or intestinal bead implantation) on Gcg expression (at the transcription level or mRNA stability level?), hormone production, the genesis of GLP-1 producing cells, and the secretion of the hormone.

      We appreciate the reviewer's comments. Figure 7 showed that overexpression of CaMKKbeta and CaMKIV enhanced mTOR and S6K phosphorylation, proglucagon expression and GLP-1 secretion, while the CaMKKbeta inhibitor STO609 inhibited mTOR and S6K phosphorylation, proglucagon expression and GLP-1 secretion, suggesting that CaMKKbeta and CaMKIV were involved in GLP-1 production. Moreover, the mTOR inhibitor rapamycin inhibited Yoda1-induced proglucagon expression and GLP-1 secretion. These results suggested that CaMKKbeta/CaMKIV/mTOR mediated the effect of Piezo1 on GLP-1 production.

      I strongly suggest that authors focus on more solid findings and dissect the mechanistic insight on something more meaningful, but not on everything (hormone coding gene expression, hormone production, and hormone secretion).

      GLP-1 production involves multiple steps, including proglucagon expression, protein cleavage, granule packaging and final release. In our present study, we focused on how mechanical signals regulate proglucagon expression in L-cells and thus promote GLP-1 production. We did not exclude the possibility that mechanical signals could also affect other steps of GLP-1 production, and we discussed this possibility in the discussion section.

      Minor concerns

      (1) Figure S1A. STC-1 is a Gcg-expressing cell line, which shows a lower amount of Piezo1 mRNA when compared with most primary tissue samples tested. This does not support the fundamental role of Piezo1 in regulating Gcg expression. Maybe qRT-PCR will be more helpful for establishing the correlation.

      Thanks a lot for the comments. As suggested, the results of qRT-PCR have been added in new Figure S1A.

      (2) There are numerous scientific presentation problems in the written manuscript. Necessary literature citations are missing, especially for key methods (such as bead implantation).

      Thank you very much for your comments. We have made every effort to enhance the scientific presentation and have included the necessary literature citations.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this study makes an interesting observation but the data are not currently strong enough to support the conclusions.

      (1) There needs to be data localizing Piezo1 to L-cells and importantly, this needs to be quantified - are all L-cells (small bowel and colon) Piezo1 positive?

      Thank you very much for your comments. We performed double immunofluorescence of Piezo1 and GLP-1 in the duodenum, jejunum, ileum, and colon of control and IntL-Piezo1-/- mice. As shown in new Figure Supplement 5, Piezo1 is expressed in about 90% of GLP-1-positive cells in the duodenum, jejunum, ileum, and colon of control mice, but not in IntL-Piezo1-/- mice.

      (2) The intersectional model for L-cell transduction needs deeper validation. Images in Figure 1e are not convincing for the transduction of GFP in L-cells. The co-localization studies are not convincing, especially because Piezo1 labeling is very broad. There needs to be stronger validation of the intersectional Gcg-Villin-Piezo1 KO model. It is important to determine whether L-cell Piezo1 localization epithelium in the small bowel and colon is present (above) and affected specifically in the knockout.

      Thanks a lot for the comments. In our study, we conducted a double immunofluorescence analysis for Piezo1 and GLP-1 across various segments of the gastrointestinal tract, including the duodenum, jejunum, ileum, and colon, in both control and IntL-Piezo1-/- mice. As illustrated in the newly incorporated Figure Supplement 5, it was observed that Piezo1 is indeed expressed within the cells of the aforementioned gastrointestinal segments in control mice, which are also positive for GLP-1 expression. In stark contrast, no evidence of Piezo1 expression was detected in the IntL-Piezo1-/- mice. Consistent with these findings, in situ hybridization experiments corroborated the absence of Piezo1 expression within GLP-1 positive cells in the IntL-Piezo1-/- mice, offering evidence for the successful knockout of Piezo1 in the L cells of these knockout mice. (Figure 1L and M).

      In Figure 1E, IntL-Cre mice were bred with mT/mG reporter mice to further validate Cre recombinase activity and specificity. All tissues and cells of mT/mG mice express red fluorescence (membrane-targeted tdTomato; mT) at baseline, and switch to membrane-targeted EGFP in the presence of cell-specific Cre. EGFP expression was observed only in a scattered pattern in the intestine, and not in the pancreas, indicating intestine-specific Cre activity in the IntL-Cre mice (Figure 1E). We have revised the relevant expressions in the main text.

      (3) The authors state that "Villin-1 (encoded by Vill1 gene) is expressed in the gastrointestinal epithelium, including L cells, but not in pancreatic α cells" (lines 378-379). However, Villin is highly expressed in whole mouse islets (https://doi.org/10.1016/j.molmet.2016.05.015, Figure 1A).

      Thanks a lot for the comments. Although Hassan Mziaut et al. reported that Villin is highly expressed in whole mouse islets, in that article, only the co-localization of insulin cells with Villin is mentioned, while the co-localization of glucagon and Villin is lacking.

      According to our research (refer to Author response image 1 below) and a previous study (Rutlin, M. et al., 2020, The Villin1 Gene Promoter Drives Cre Recombinase Expression in Extraintestinal Tissues. Cell Mol Gastroenterol Hepatol, 10(4), 864-867.e865.), Villin is sparsely expressed in pancreatic tissue but not highly expressed in islets. We did not observe co-localization of glucagon and Villin in the pancreas (see Author response image 1A and B below). The same antibody was used to stain the intestine, which showed specific expression on the apical side of the intestinal villi (see Author response image 1C below).

      Author response image 1.

      (4) There needs to be quantification of L-cells in Piezo1 knockout. This is because several studies show Piezo1 affecting epithelial cell densities. If there are changes in L-cell or other EEC densities in Piezo1 knockout, that shift can potentially explain the changes that the authors see in glucose metabolism and weight.

      We appreciate the reviewer’s comment. We agree that Piezo1 may affect L-cell density and epithelial integrity.

      To assess epithelial integrity, we examined the expression of tight junction proteins (ZO-1 and Occludin). As shown in new Figure Supplement 8, the expression of tight junction proteins, including ZO-1 and Occludin, remained unchanged in IntL-Piezo1-/- mice when compared to littermate controls.

      To assess the L-cell density, we stained PYY, another hormone mainly secreted by L cells, in both control and IntL-Piezo1-/- mice. As shown in new Figure Supplement 7A and B, the percentage of PYY-positive cells was not significantly different between control and IntL-Piezo1-/- mice, suggesting that the L-cell density was not affected by Piezo1 knockout.

      (5) L-cells are classically considered to be chemosensors. Do nutritive signals, which presumably also increase calcium compete or complement or dominate L-cell GLP1 synthesis regulation?

      We appreciate the reviewer’s comment and agree that L-cells are traditionally considered to be chemosensors. It is also recognized that nutritive signals regulate L-cell GLP-1 synthesis. We have addressed these points in lines 568-595. Both nutritive and mechanical signals regulate GLP-1 production. While food needs to be digested and nutrients absorbed before L-cells can detect the nutritive signals, mechanical stimulation provides a more direct and rapid response. However, determining whether nutritive signals compete with, complement, or dominate mechanical signals in L-cell GLP-1 production will require further exploration.

      (6) The mechanism of Glp1 synthesis vs release downstream of Piezo1 is not clear. The authors hypothesize that "Piezo1 might regulate GLP-1 synthesis through the CaMKKβ/CaMKIV-mTOR signaling pathway". However, references cited suggest that Ca2+ or cAMP leads to GLP-1-release, while mTOR primarily acts on the regulation of gene expression by promoting Gcg gene expression. These pathways do not clearly link to Piezo1 GLP-1 production. These mechanisms need to be reconciled.

      Thanks a lot for the point. The effect of Piezo1-mediated Ca2+ increase on GLP-1 production may be two-fold: promote Gcg gene expression through CaMKKβ/CaMKIV-mTOR and promote GLP-1 release by degranulation. Both gene expression and release are important to sustained GLP-1 production.

      (7) Previous study PMID 32640190 (not cited here) found that Villin-driven Piezo1 knockout, which knocks out Piezo1 from all epithelial intestinal cells (including L-cells), showed no significant alterations in blood glucose or body weight. This is the opposite of the presented findings and therefore the current results require reconciliation.

      We have cited PMID 32640190 in our revised manuscript. The lack of changes in blood glucose seen in Villin-Piezo1-/- mice reported by Sugisawa et al. is not surprising (Cell. 2020 Aug 6;182(3):609-624.e21.). Actually, in another recent study from our group, we found similar results when the Villin-Piezo1-/- mice and Piezo1fl/fl control mice were fed with a normal chow diet. Since Villin-1 is expressed in all the epithelial cells of the gut, including enterocytes and various types of endocrine cells, the effect of L-cell Piezo1 loss may be masked by other cell types under normal conditions. However, impaired glucose tolerance was seen in Villin-Piezo1-/- mice compared to the Piezo1fl/fl control mice after a high-fat diet for 8 weeks. We further found that Piezo1 in enterocytes exerted a negative effect on glucose and lipid absorption. Loss of Piezo1 in enterocytes led to over-absorption of nutrients under a high-fat diet (Tian Tao, Qing Shu, Yawen Zhao, Wenying Guo, Jinting Wang, Yuhao Shi, Shiqi Jia, Hening Zhai, Hui Chen, Cunchuan Wang, Geyang Xu, Mechanical regulation of lipid and sugar absorption by Piezo1 in enterocytes, Acta Pharmaceutica Sinica B, Accepted, 2024, https://doi.org/10.1016/j.apsb.2024.04.016).

      Reviewing Editor (Recommendations For The Authors):

      Your paper - while innovative in concept and interesting - has many flaws that in my opinion need to be corrected before the paper and pre-print should be published or uploaded as pre-print. Can you please make every effort to address the missing data that the Reviewers have asked for and correct the lack of references as noted in the reviews? Thank you.

      Thank you for the invaluable suggestions provided by the editors and reviewers. In response to these suggestions, we have included the missing data as requested and rectified the lack of references to the best of our ability. We hope that these revisions will effectively address the concerns raised by the editors and reviewers.

    2. eLife Assessment

      This study focuses on the regulation of GLP-1 in enteroendocrine L cells and how this may be stimulated by the mechanogated ion channel Piezo1 and the CaMKKbeta-CaMKIV-mTORC1 signaling pathway. The work is innovative and is considered valuable, as the hypothesis that is being tested may have significant mechanistic and translational implications. Data to support the proposed mechanism were considered incomplete, yet data to support the overall physiological characterization were considered solid.

    3. Reviewer #1 (Public review):

      Summary:

      In this manuscript, authors intended to prove that gut GLP-1 expression and secretion can be regulated by Piezo1, and hence by mechanistic/stretching regulation. For this purpose, they have assessed Piezo1 expression in STC-1 cell line (a mouse GLP-1 producing cell line) and mouse gut, showing the correlation between Piezo1 level and Gcg levels (Fig. S1). They then aimed to generate gut L cell-specific Piezo1 KO mice and claimed the mice show impaired glucose tolerance and GLP-1 production, which can be mitigated by Ex-4 treatment (Fig. 1-2). Pharmacological agents (Yoda1 and GsMTx4) and mechanic activation (intestinal bead implantation) were then utilized to prove the existence of ileal Piezo1-regulated GLP-1 synthesis (Fig. 3). This was followed by testing such mechanism in a limited amount of primary L cells and mainly in the STC-1 cell line (Fig. 4-7).

      While the novelty of the study is somewhat appreciable, the bio-medical significance is not well demonstrated in the manuscript. The authors stated (in lines 78-83) a number of potential side effects of GLP-1 analogs; how can the mechanistic study of GLP-1 production on its own be essential for the development of new drug targets for the treatment of diabetes? Furthermore, the study does not provide a clear mechanistic insight into how the claimed CaMKKbeta/CaMKIV-mTORC1 signaling pathway upregulated both GLP-1 production and secretion. This reviewer also has concerns about the experimental design and data presented in the current manuscript, including the issue of how proglucagon expression can be assessed by Western blotting.

      Strengths:

      Novelty of the concept.

      Weaknesses:

      Experimental design and key experiment information.

    4. Reviewer #2 (Public review):

      Summary:

      The study by Huang and colleagues focuses on GLP-1 producing enteroendocrine (EEC) L-cells and their regulation of GLP-1 production by a mechanogated ion channel Piezo1. The study describes Piezo1 expression by L-cells and uses an exciting intersectional mouse model (villin to target epithelium and Gcg to target GLP-1 producing cells and others like glucagon producing pancreatic endocrine cells), which allows L-cell specific Piezo1 knockout. Using this model, they find an impairment of glucose tolerance, increased body weight, reduced GLP-1 content, and changes to the CaMKKbeta-CaMKIV-mTORC1 signaling pathway using normal diet and then high fat diet. Piezo1 chemical agonist and intestinal bead implantation reversed these changes and improved the disrupted phenotype. Using primary sorted L-cells and cell model STC-1, they found that stretch and Piezo1 activation increased GLP-1 and altered the molecular changes described above.

      Strengths:

      This is an interesting study testing a novel hypothesis that may have important mechanistic and translational implications. The authors generated an important intersectional genetics mouse model that allowed them to target Piezo1 in L-cells specifically, and the surprising result of impaired metabolism is intriguing.

      Weaknesses:

      However, there are several critical limitations that require resolution before making the conclusions that the authors make. (1) A potential explanation for the data, and one that is consistent with existing literature [see for example, PMC5334365, PMC4593481], is that epithelial Piezo1, which is broadly expressed by the GI epithelium, impacts epithelial cell density and survival, and as such, if Piezo1 is involved in L-cell physiology, it may be through regulation of cell density. Thus, it is critical to determine L-cell densities and epithelial integrity in controls and Piezo1 knockouts systematically across the length of the gut, since the authors do not make it clear which gut region contributes to the phenotype they see. Current immunohistochemistry data are not convincing. (2) Calcium signaling in L-cells is implicated in their typical role of being gut chemosensors, and Piezo1 is a calcium channel, so it is not clear whether any calcium-related signaling mechanism would phenocopy these results. (3) Intestinal bead implantation, while intriguing, does not have clear mechanisms - and is likely to provide a point of intestinal obstruction and dysmotility. (4) Previous studies, some that are very important, but not cited, contradict the presented results (e.g., epithelial Piezo1 role in insulin secretion) and require reconciliation.

      Overall, this study makes an interesting observation but the data are not currently strong enough to support the conclusions.

      - There needs to be data localizing Piezo1 to L-cells and importantly, this needs to be quantified - are all L-cells (small bowel and colon) Piezo1 positive? This is because several studies show Piezo1 affecting epithelial cell densities. If there are changes in L-cell or other EEC densities in Piezo1 knockout, that shift can potentially explain the changes that the authors see in glucose metabolism and weight.

      - The intersectional model for L-cell transduction needs a deeper validation. Images in Fig 1e are not convincing for transduction of GFP in L-cells. The co-localization studies are not convincing, especially because Piezo1 labeling is very broad. There needs to be stronger validation of the intersectional Gcg-Villin-Piezo1 KO model. It is important to determine whether Piezo1 localization to L-cells in the small bowel and colon epithelium is present (above) and affected specifically in the knockout.

      - The authors state that "Villin-1 (encoded by Vill1 gene) is expressed in the gastrointestinal epithelium, including L cells, but not in pancreatic α cells" (line 378-379). However, Villin is highly expressed in whole mouse islets (https://doi.org/10.1016/j.molmet.2016.05.015, Figure 1A).

      - There needs to be quantification of L-cells in Piezo1 knockout. This is because several studies show Piezo1 affecting epithelial cell densities. If there are changes in L-cell or other EEC densities in Piezo1 knockout, that shift can potentially explain the changes that the authors see in glucose metabolism and weight.

      - L-cells are classically considered to be chemosensors. Do nutritive signals, which presumably also increase calcium, compete with, complement, or dominate L-cell GLP1 synthesis regulation?

      - The mechanism of Glp1 synthesis vs release downstream of Piezo1 is not clear. The authors hypothesize that "Piezo1 might regulate GLP-1 synthesis through the CaMKKβ/CaMKIV-mTOR signaling pathway". However, references cited suggest that Ca2+ or cAMP lead to GLP-1 release, while mTOR primarily acts on the regulation of gene expression by promoting Gcg gene expression. These pathways do not clearly link to Piezo1 GLP-1 production. These mechanisms need to be reconciled.

      - Previous study PMID 32640190 (not cited here) found that Villin-driven Piezo1 knockout, which knocks out Piezo1 from all epithelial intestinal cells (including L-cells), showed no significant alterations in blood glucose or body weight. This is the opposite of the presented findings and therefore the current results require reconciliation.

      Comments on revised version:

      The authors have addressed several comments that were common to the reviewers - specificity and validity of the intersectional model, mechanism of signaling downstream of Piezo1 and reconciliation of the results with previous studies. The authors have provided extensive experiments and revisions which have made the manuscript stronger. However, many important questions remain, and unfortunately, the intersectional mouse model and mechanisms remain unclear.

      - I appreciate the authors quantifying the density of L cells in the intersectional Piezo knockout. There is a very clear >50% drop-off in GLP-1+ cells with the Piezo1 knockout (Supp fig 7c, d). Interestingly, there was not a decrease in PYY+ cells, which is curious because GLP1 and PYY are co-expressed in L cells. The mechanism of regulation of one hormone but not the other in the same cell requires clarification and would be relevant for this work. To begin with, co-labeling PYY and GLP1 and showing that one hormone can be found without the other would be useful.

      - Piezo1 immunofluorescence has very high background and overall poor specificity (Fig supp 5 and Fig supp 6B are good examples of poor Piezo1 immunofluorescence). Another method for labeling Piezo1 (e.g. via RNAscope) is required - and where tried (e.g., Fig 1L), the results are not convincing.

      - The intersectional mouse model requires further validation. The data presented in Fig 1E do not help - the GFP positive cells do not look like L-cells and there appear to be GFP positive cells in the muscle and submucosa.

      - Since Piezo1 is known to affect epithelial cell life span, barrier function may be compromised. While I appreciate that the authors have obtained some images and measured zonular and occluded, this is unfortunately a suboptimal evaluation of barrier function.

      - The mechanisms of calcium signaling that will presumably lead to GLP1 release due to Piezo1 activation and mTOR which authors link to GLP1 synthesis remain unreconciled.

      - Intestinal bead implantation may provide an important area of obstruction, in addition to potential mechanical stimulation. Unfortunately, whole gut transit time and fecal weight do not assay these functions well.

      - I believe that the explanation regarding the lack of previous findings connecting Piezo1 in the epithelium and glucose tolerance remains poorly reconciled with the current findings.

    5. Reviewer #3 (Public review):

      Summary:

      In this work, the authors proposed that the mechano-gated ion channel Piezo1 enhances GLP-1 production and secretion, possibly through stimulating the Ca2+-CaMKKbeta-CaMKIV-mTORC1 signaling pathway. By using intestinal L cell-specific piezo1 knock-out mice, an intestinal bead implantation mouse model, and the chemical agonist Yoda1, the authors claimed that piezo1 promotes pro-glucagon expression, GLP-1 production and secretion. In sorted primary intestinal L cells and STC-1 cells, the authors validated that the CaMKKbeta-CaMKIV-mTORC1 signaling pathway positively regulated GLP-1 production and secretion. This study provides new evidence about the specific role of piezo1 in intestinal L cells, broadening the understanding of metabolic functions of piezo1.

      Strengths:

      The new concept and innovative in vivo and in vitro models.

      Weaknesses:

      Although the authors have addressed most of the issues in the revised manuscript, there are still some questions that need to be clarified.

      (1) This study claimed that piezo1 enhances proglucagon expression, GLP-1 production and secretion through the Ca2+-CaMKKbeta-CaMKIV-mTORC1 signaling pathway, which is a highly time-consuming process. However, as a mechano-gated ion channel, it should exert its functions promptly. Is it possible that piezo1 directly stimulates GLP-1 release by influx of Ca2+? If so, have the authors measured intracellular Ca2+ concentration?

      (2) The authors proposed that the CaMKKbeta-CaMKIV-mTORC1 signaling pathway mediated the effects of piezo1. However, the data are not convincing. At least, chemical inhibitors of CaMKKbeta/CaMKIV/mTORC1 should be used in intL-piezo1 KO mice or STC-1 cells to see if piezo1-induced GLP-1 secretion is abrogated by these chemical inhibitors.

      (3) According to previous studies of the team, piezo1 could enhance insulin, ghrelin and GLP-1 secretion while inhibiting glucagon production in pancreatic α-cells. In a recent work, the authors found that piezo1 in enterocytes suppresses nutrient absorption. Why does an ion channel have these various effects in different cells? What is the fundamental and common mechanism underlying its metabolic functions? What is its value as a drug target? These questions need to be discussed in more detail.

    1. eLife Assessment

      This study investigates the role of Hox genes in determining the position of the forelimb bud using experimental loss- and gain-of-function approaches in chicken embryos, concluding that Hox4 and Hox5 provide permissive signals for forelimb formation throughout the neck region, while the final forelimb position is determined by the instructive signals of Hox6/7 in the lateral plate mesoderm. These results could potentially be fundamental to our understanding of Hox patterning. However, the evidence supporting these conclusions is incomplete; while the gain-of-function experiments are well supported, the loss-of-function experiments using dominant-negative constructs lack sufficient controls, and could be the result of an experimental artifact.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates the role of Hox genes in determining the position of the forelimb bud through experimental loss- and gain-of-function approaches in chicken embryos. The loss-of-function experiments involved expressing dominant-negative versions of specific Hox genes in the limb bud to assess their necessity for limb formation. Gain-of-function experiments entailed expressing full-length Hox genes anterior to the limb field in the lateral mesoderm. The results were evaluated by analyzing the expression of genes involved in limb development, such as Fgf8, Fgf10, Shh, and Tbx5, the latter specifically marking the forelimb.

      The findings indicate that introducing dominant-negative forms of Hoxa4, Hoxa5, Hoxa6, and Hoxa7 into the forelimb field reduces bud size and downregulates certain limb markers. Conversely, introducing active versions of these genes rostral to the normal forelimb position shows that Hox4 and Hox5 have no effect, whereas Hox6 and Hox7 extend the forelimb anteriorly or create a small bulge rostral to the forelimb. The authors conclude that Hox4 and Hox5 provide permissive cues for forelimb formation throughout the neck region, with the final forelimb position determined by the instructive cues of Hox6/7 in the lateral plate mesoderm.

      Strengths:

      The authors endeavor to address the longstanding question of what determines limb position, particularly that of the forelimb, in the vertebrate embryo.

      Weaknesses:

      In my opinion, the study is preliminary and requires additional controls and explanations for conflicting results observed in mice:

      (1) The activity of the dominant negatives lacks appropriate controls. This is crucial given that mouse mutants for PG5, PG6, PG7, and three of the four PG4 genes show no major effects on limb induction or growth. Understanding these discrepancies is essential.

      (2) The authors mention redundancies in Hox activity, consistent with numerous previous reports. However, they only use single dominant-negative versions of each Hox paralog gene individually. If Hox4 and Hox5 functions are redundant, experiments should include simultaneous dominant negatives for both groups.

      (3) The main conclusion that Hox4 and Hox5 provide permissive cues on which Hox6/7 induce the forelimb is not sufficiently supported by the data. An experiment expressing simultaneous dnHox4/5 and Hox6/7 is needed. If the hypothesis is correct, this should block Hox6/7's capacity to expand the limb bud or generate an extra bulge.

      (4) The identity of the extra bulge or extended limb bud is unclear. The only marker supporting its identity as a forelimb is Tbx5, while other typical limb development markers are absent. Tbx5 is also expressed in other regions besides the forelimb, and its presence does not guarantee forelimb identity. For instance, snakes express Tbx5 in the lateral mesoderm along much of their body axis.

      (5) It is important to analyze the skeletons of all embryos to assess the effect of reduced limb buds upon dnHox expression and determine whether extra skeletal elements develop from the extended bud or ectopic bulge.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the role of Hox genes in the specification of forelimb position. The central conclusions are that Hox paralogy group (PG) 6/7 genes are both necessary and sufficient to induce forelimb buds. In addition, the authors argue that HoxPG4/5 genes are necessary, but, by contrast to Hox PG6/7 genes, Hox PG4/5 genes are not sufficient to induce forelimb budding. To test the roles of Hox4-7 genes in limb development, the authors use both gain-of-function (GOF) and loss-of-function (LOF) approaches in chick embryos.

      In LOF experiments, they produced dominant negative forms of Hoxa4, Hoxa5, Hoxa6, and Hoxa7, which lack the DNA-binding domain, and they electroporated these constructs into the prospective wing field of the lateral plate mesoderm (LPM) in pre-limb bud stage (HH12) chick embryos. All 4 constructs resulted in down-regulation of Tbx5 (an early marker of forelimb development), and of its target gene, Fgf10, which is required for the initiation of limb budding, in the lateral plate mesoderm. The dominant negative experiments also caused down-regulation of Fgf8 in the overlying limb ectoderm and a marked reduction in the size of the early wing bud. Based on the LOF results, the authors conclude that each of the Hoxa4-7 genes is required for the specification of the forelimb field and for the establishment of the Fgf10-Fgf8 feedback loop in wing bud mesenchyme and overlying epithelium.

      The authors then use a GOF strategy to investigate whether the same genes are sufficient to induce forelimb budding. They test this hypothesis using the neck, a region that is known to be incompetent to form limbs in response to Fgf signaling. Overexpression of full-length Hoxa6 and Hoxa7 in the neck region caused ectopic expression of Tbx5 in the neck region, which fits with "posteriorization" of cells at neck level, as Tbx5 typically marks the forelimb and flank (interlimb) region of the lateral plate mesoderm. Consistent with a posterior transformation of positional identity (neck to forelimb), overexpression of Hoxa6 or Hoxa7 leads to activation of Fgf10 expression and development of an ectopic forelimb bud from (or extension of the normal forelimb bud into) the neck region. By contrast, overexpression of either Hoxa4 or Hoxa5 in the neck region is not sufficient to induce ectopic forelimb budding. Curiously, the ectopic forelimb buds do not express Fgf8 in the overlying ectoderm or develop beyond the bud stage. The latter finding is consistent with previous work showing that neck ectoderm is not competent to support outgrowth of transplanted limb bud mesenchyme. The authors investigate the mechanistic basis of this early arrest of outgrowth by comparing the transcriptomes of ectopic limb buds, normal forelimb buds, and normal neck cells.

      The RNA sequencing analysis shows that while some limb development genes (e.g., Lmx1b, Hoxa9, Hoxd9, Hoxa10, Hoxd10) are activated in the ectopic limb bud, other key components of the circuit (e.g., Shh, Fgf8, Hox12/13 paralogs) are not established, leading them to conclude that failure of neck ectoderm to form an AER underlies the arrested outgrowth of ectopic limb buds.

      Strengths:

      This study provides the first evidence that altering the Hox code in neck lateral plate mesoderm (LPM) is sufficient to induce ectopic development of forelimb buds at the neck level. For more than 30 years, developmental biologists have speculated and provided indirect evidence that Hox genes are involved in the specification of forelimb position, but to my knowledge, no study has shown that altering Hox gene expression alone can induce limb development outside of the normal limb field. The finding that Hox6/7 paralogs are sufficient for forelimb bud development, whereas Hox4/5 paralogs are not, suggests that specification of forelimb identity requires instructive signaling that is a specific property of Hox6/7 paralogs. The GOF experiments significantly extend the knowledge of limb specification beyond that which has come from Hox gene manipulations in mice.

      Weaknesses:

      (1) By contrast to the GOF experiments that induce ectopic limb budding, the LOF experiments, which use dominant negative forms of Hoxa4, Hoxa5, Hoxa6, and Hoxa7, are more challenging to interpret due to the absence of data on the specificity of the dominant negative constructs. Absent such controls, one cannot be certain that effects on limb development are due to disruption of the specific Hox proteins that are being targeted.

      (2) A test of their central hypothesis regarding the necessity and sufficiency of the Hox genes under investigation would be to co-transfect the neck with full-length Hoxa6/a7 AND the dnHoxA4/a5. If their hypothesis is correct, then the dn constructs should block the limb-inducing ability of Hoxa6/a7 overexpression (again, validation of specificity of the DN constructs is important here).

      (3) The paper could be strengthened by providing some additional data, which should already exist in their RNA-Seq dataset, such as supplementary material that shows the actual gene expression data that are represented in the Venn diagram, heatmap, and GO analysis in Figure 3.

      (4) The results of these experiments in chick embryos are rather unexpected based on previous knockout experiments in mice, and this needs to be discussed.

    4. Author response:

      Reviewer #1:

      Weaknesses:

      (1) The activity of the dominant negatives lacks appropriate controls. This is crucial given that mouse mutants for PG5, PG6, PG7, and three of the four PG4 genes show no major effects on limb induction or growth. Understanding these discrepancies is essential.

      Given the importance of the Loss of Function (LOF) experiments, we will provide additional evidence for the validity of the dominant-negative strategy and constructs used.

      (2) The authors mention redundancies in Hox activity, consistent with numerous previous reports. However, they only use single dominant-negative versions of each Hox paralog gene individually. If Hox4 and Hox5 functions are redundant, experiments should include simultaneous dominant negatives for both groups.

      To clarify redundancies in Hox activity, we will test whether simultaneous expression of dominant-negative forms of more than one Hox gene induces a stronger effect compared to the expression of a single dominant-negative Hox gene.

      (3) The main conclusion that Hox4 and Hox5 provide permissive cues on which Hox6/7 induce the forelimb is not sufficiently supported by the data. An experiment expressing simultaneous dnHox4/5 and Hox6/7 is needed. If the hypothesis is correct, this should block Hox6/7's capacity to expand the limb bud or generate an extra bulge.

      We agree that this is an excellent additional experiment to corroborate our conclusion and will perform this experiment in our revision.

      (4) The identity of the extra bulge or extended limb bud is unclear. The only marker supporting its identity as a forelimb is Tbx5, while other typical limb development markers are absent. Tbx5 is also expressed in other regions besides the forelimb, and its presence does not guarantee forelimb identity. For instance, snakes express Tbx5 in the lateral mesoderm along much of their body axis.

      To date, Tbx5 is the best marker for the forelimb. While it is true that the Tbx5 expression is broader than the limb field, this occurs only at early stages before forelimb bud formation. We will work towards a further definition of this extra bulge.

      (5) It is important to analyze the skeletons of all embryos to assess the effect of reduced limb buds upon dnHox expression and determine whether extra skeletal elements develop from the extended bud or ectopic bulge.

      We have analysed the cartilage structure of operated embryos with GOF experiments and found no skeletal elements within the ectopic wing bud in the neck. Additionally, in our revision, we can further analyse the wing skeleton of operated embryos with LOF experiments, which would provide more detailed assessments of the impact of dominant-negative Hox genes on wing bud formation.

      Reviewer #2:

      Weaknesses:

      (1) By contrast to the GOF experiments that induce ectopic limb budding, the LOF experiments, which use dominant negative forms of Hoxa4, Hoxa5, Hoxa6, and Hoxa7, are more challenging to interpret due to the absence of data on the specificity of the dominant negative constructs. Absent such controls, one cannot be certain that effects on limb development are due to disruption of the specific Hox proteins that are being targeted.

      We will revise our manuscript to clarify the specificity of the dominant-negative strategy used.

      (2) A test of their central hypothesis regarding the necessity and sufficiency of the Hox genes under investigation would be to co-transfect the neck with full-length Hoxa6/a7 AND the dnHoxA4/a5. If their hypothesis is correct, then the dn constructs should block the limb-inducing ability of Hoxa6/a7 overexpression (again, validation of specificity of the DN constructs is important here).

      This is an excellent idea and we will implement the experiment in our revision.

      (3) The paper could be strengthened by providing some additional data, which should already exist in their RNA-Seq dataset, such as supplementary material that shows the actual gene expression data that are represented in the Venn diagram, heatmap, and GO analysis in Figure 3.

      We will incorporate this suggestion and include additional data from our RNA-seq analysis.

      (4) The results of these experiments in chick embryos are rather unexpected based on previous knockout experiments in mice, and this needs to be discussed.

      In our revision, we will appropriately expand the discussion on the discrepancies observed between knockout mouse models and our chick embryo experiments.

    1. eLife Assessment

      This important study provides insights into the role of maternal behavior in the learning and ontogeny of vocalization, finding evidence that the maternal behavior of sac-winged bats (Saccopteryx bilineata) can influence the learned territorial songs of their pups. The behavioral analyses are solid, although a more comprehensive and quantitative description of the babblings and the female displays would have strengthened the study. The work will interest biologists and neuroscientists studying vocal learning and its evolution.

    2. Reviewer #1 (Public review):

      Summary:

      Fernandez et al. investigate the influence of maternal behavior on bat pup vocal development in Saccopteryx bilineata, a species known to exhibit vocal production learning. The authors performed detailed longitudinal observations of wild mother-pup interactions to ask whether non-vocal maternal displays during juvenile vocal practice or 'babbling', affect vocal production. Specifically, the study examines the durations of pup babbling events and the developmental babbling phase, in relation to the amount of female display behavior, as well as pup age and the number of nearby singing adult males. Furthermore, the authors examine pup vocal repertoire size and maturation in relation to the number of maternal displays encountered during babbling. Statistical models identify female display behavior as a predictor of i) babbling bout duration, ii) the length of the babbling phase, iii) song composition, and iv) syllable maturation. Notably, these outcomes were not influenced by the number of nearby adult males (the pups' source of song models) and were largely independent of general maturation (pup age). These findings highlight the impact of non-vocal aspects of social interactions in guiding mammalian vocal development.

      Strengths:

      Historically, work on developmental vocal learning has focused on how juvenile vocalizations are influenced by the sounds produced by nearby adults (often males). In contrast, this study takes the novel approach of examining juvenile vocal ontogeny in relation to non-vocal maternal behavior, in one of the few mammals known to exhibit vocal production learning. The authors collected an impressive dataset from multiple wild bat colonies in two Central American countries. This includes longitudinal acoustic recordings and behavioral monitoring of individual mother-pup pairs, across development.

      The identified relationships between maternal behavior and bat pup vocalizations have intriguing implications for understanding the mechanisms that enable vocal production learning in mammals, including human speech acquisition. As such, these findings are likely to be relevant to a broad audience interested in the evolution and development of social behavior as well as sensory-motor learning.

      Weaknesses:

      The authors qualitatively describe specific patterns of female displays during pup babbling, however, subsequent quantitative analyses are based on two aggregate measures of female behavior that pool across display types. Consequently, it remains unclear how certain maternal behaviors might differentially influence pup vocalizations (e.g. through specific feedback contingencies or more general modulation of pup behavioral states).

      In analyzing the effects of maternal behavior on song maturation, the authors focus on the most common syllable type produced across pups. This approach is justified based on the syllable variability within and across individuals, however, additional quantification and visual presentation of categorized syllable data would improve clarity and potentially strengthen resulting claims.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores how maternal behaviors influence vocal learning in the greater sac-winged bat (Saccopteryx bilineata). Over two field seasons, researchers tracked 19 bat pups from six wild colonies, examining vocal development aspects such as vocal practice duration, syllable repertoire size, and song syllable acquisition. The findings show that maternal behaviors significantly impact the length of daily babbling sessions and the overall babbling phase, while the presence of adult male tutors does not.

      The researchers conducted detailed acoustic analyses, categorizing syllables and evaluating the variety and presence of learned song syllables. They discovered that maternal interactions enhance both the number and diversity of learned syllables and the production of mature syllables in the pups' vocalizations. A notable correlation was found between the extent of acoustic changes in the most common learned syllable type and maternal activity, highlighting the key role of maternal feedback in shaping pups' vocal development.

      In summary, this study emphasizes the crucial role of maternal social feedback in the vocal development of S. bilineata. Maternal behaviors not only increase vocal practice but also aid in acquiring and refining a complex vocal repertoire. These insights enhance our understanding of social interactions in mammalian vocal learning and draw interesting parallels between bat and human vocal development.

      Strengths:

      This paper makes significant contributions to the field of vocal learning by looking at the role of maternal behaviors in shaping the vocal learning phenotype of Saccopteryx bilineata. The paper uses a longitudinal approach, tracking the vocal ontogeny of bat pups from birth to weaning across six colonies and two field seasons, allowing the authors to assess how maternal interactions influence various aspects of vocal practice and learning, providing strong empirical evidence for the critical role of social feedback in non-human mammalian vocal learners. This kind of evidence highlights the complexity of the vocal learning phenotype and shows that it goes beyond the right auditory experience and having the right circuitry.

      The paper offers a nuanced understanding of how specific maternal behaviors impact the acquisition and refinement of the vocal repertoire, while showing the number of male tutors - the source of adult song - did not have much of an effect. The correlation between maternal activity and acoustic changes in learned syllable types is a novel finding that underscores the importance of non-vocal social interactions in vocal learning. In vocal learning research, with some notable exceptions, experience is often understood as auditory experience. This paper highlights how, even though that is one important piece of the puzzle, other kinds of experience directly affect the development of vocal behavior. This is of particular importance in the case of a mammalian species such as Saccopteryx bilineata, as this kind of result is perhaps more often associated with avian species.

      Moreover, the study's findings have broader implications for our understanding of vocal learning across species. By drawing parallels between bat and human vocal development (and in some ways to bird vocal development), the paper highlights common mechanisms that may underlie vocal practice and learning in both humans and other mammals. This interdisciplinary perspective enriches the field and encourages further comparative studies, ultimately advancing our knowledge of the evolutionary and developmental processes that shape vocal production learning in all its dimensions.

      Weaknesses:

      Some weaknesses can be pointed out, but in fairness, the authors acknowledge them in one way or another. As such, these are not flaws per se, but gaps that can be filled with further research.

      Experimental manipulations, such as controlled playback experiments or controlled environments, could strengthen the causal claims by directly testing the effects of specific maternal behaviors on vocal development. Certainly, the strengths of the paper will be consolidated after such work is performed.

      Another limitation is the reliance on the number of singing males as a proxy for social acoustic input. This measure does not account for the variability in the quality, frequency, or duration of the male songs to which the pups are exposed. A more detailed analysis of the acoustic environment, including direct measurements of song exposure and its impact on vocal learning, would provide a clearer understanding of the role of male tutors.

      Finally, and although it would be unlikely that these results are unique to Saccopteryx bilineata, the study's focus on a single species limits at present the generalizability of some of its findings to other vocal learning mammals. While the parallels drawn between bat and human vocal development are intriguing, the conclusions will be more robust when supported by comparative studies involving multiple species of vocal learners. This will help to identify whether the observed maternal influences on vocal development reported here are unique to Saccopteryx bilineata or represent a broader phenomenon in chiropteran, mammalian, or general vocal learning. Expanding the scope of research to include a wider range of species and incorporating cross-species comparisons will significantly enhance the contribution of this study to the field of vocal learning.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fernandez et al. investigate the influence of maternal behavior on bat pup vocal development in Saccopteryx bilineata, a species known to exhibit vocal production learning. The authors performed detailed longitudinal observations of wild mother-pup interactions to ask whether non-vocal maternal displays during juvenile vocal practice or 'babbling', affect vocal production. Specifically, the study examines the durations of pup babbling events and the developmental babbling phase, in relation to the amount of female display behavior, as well as pup age and the number of nearby singing adult males. Furthermore, the authors examine pup vocal repertoire size and maturation in relation to the number of maternal displays encountered during babbling. Statistical models identify female display behavior as a predictor of i) babbling bout duration, ii) the length of the babbling phase, iii) song composition, and iv) syllable maturation. Notably, these outcomes were not influenced by the number of nearby adult males (the pups' source of song models) and were largely independent of general maturation (pup age). These findings highlight the impact of non-vocal aspects of social interactions in guiding mammalian vocal development.

      We thank Reviewer 1 for the time and effort dedicated to the revision of our study. The suggestions for the revision of our manuscript were very helpful and will improve our manuscript. 

      Strengths:

      Historically, work on developmental vocal learning has focused on how juvenile vocalizations are influenced by the sounds produced by nearby adults (often males). In contrast, this study takes the novel approach of examining juvenile vocal ontogeny in relation to non-vocal maternal behavior, in one of the few mammals known to exhibit vocal production learning. The authors collected an impressive dataset from multiple wild bat colonies in two Central American countries. This includes longitudinal acoustic recordings and behavioral monitoring of individual mother-pup pairs, across development.

      The identified relationships between maternal behavior and bat pup vocalizations have intriguing implications for understanding the mechanisms that enable vocal production learning in mammals, including human speech acquisition. As such, these findings are likely to be relevant to a broad audience interested in the evolution and development of social behavior as well as sensory-motor learning.

      We thank reviewer 1 for this assessment. 

      Weaknesses:

      The authors qualitatively describe specific patterns of female displays during pup babbling, however, subsequent quantitative analyses are based on two aggregate measures of female behavior that pool across display types. Consequently, it remains unclear how certain maternal behaviors might differentially influence pup vocalizations (e.g. through specific feedback contingencies or more general modulation of pup behavioral states).

      In analyzing the effects of maternal behavior on song maturation, the authors focus on the most common syllable type produced across pups. This approach is justified based on the syllable variability within and across individuals, however, additional quantification and visual presentation of categorized syllable data would improve clarity and potentially strengthen resulting claims.

      We agree that our analysis of maternal behaviour does not investigate potential contingencies between particular maternal behavioural displays and pup vocalizations (e.g. particular syllable types). The data on maternal behaviour collected for this study include direct observations, field notes and/or video recordings. In the future, it will be necessary to work with high-speed cameras, which allow this kind of fine-detailed analysis, to examine potential contingencies between particular maternal behavioural displays and specific pup vocalizations. We have planned future studies investigating whether pup vocalizations elicit contingent maternal responses or vice versa. In the revision of our manuscript, we will include a comment pointing out that this special behaviour will be investigated in greater detail in the future.

      As suggested by reviewer 1, in our revised manuscript we will include more information on methods to improve understandability. In particular, we will:

      - present more information on different steps of our acoustic analyses

      - provide additional and clearer spectrogram figures representing the different syllable types and categorizations 

      - change the figures accompanying our GLMM analyses following the suggestion of Reviewer 1

      Reviewer #2 (Public review):

      Summary:

      This study explores how maternal behaviors influence vocal learning in the greater sac-winged bat (Saccopteryx bilineata). Over two field seasons, researchers tracked 19 bat pups from six wild colonies, examining vocal development aspects such as vocal practice duration, syllable repertoire size, and song syllable acquisition. The findings show that maternal behaviors significantly impact the length of daily babbling sessions and the overall babbling phase, while the presence of adult male tutors does not.

      The researchers conducted detailed acoustic analyses, categorizing syllables and evaluating the variety and presence of learned song syllables. They discovered that maternal interactions enhance both the number and diversity of learned syllables and the production of mature syllables in the pups' vocalizations. A notable correlation was found between the extent of acoustic changes in the most common learned syllable type and maternal activity, highlighting the key role of maternal feedback in shaping pups' vocal development.

      In summary, this study emphasizes the crucial role of maternal social feedback in the vocal development of S. bilineata. Maternal behaviors not only increase vocal practice but also aid in acquiring and refining a complex vocal repertoire. These insights enhance our understanding of social interactions in mammalian vocal learning and draw interesting parallels between bat and human vocal development.

      We thank reviewer 2 for his/her time and effort dedicated to the revision of our study. The suggestions were very helpful in improving our manuscript. 

      Strengths:

      This paper makes significant contributions to the field of vocal learning by looking at the role of maternal behaviors in shaping the vocal learning phenotype of Saccopteryx bilineata. The paper uses a longitudinal approach, tracking the vocal ontogeny of bat pups from birth to weaning across six colonies and two field seasons, allowing the authors to assess how maternal interactions influence various aspects of vocal practice and learning, providing strong empirical evidence for the critical role of social feedback in non-human mammalian vocal learners. This kind of evidence highlights the complexity of the vocal learning phenotype and shows that it goes beyond the right auditory experience and having the right circuitry.

      The paper offers a nuanced understanding of how specific maternal behaviors impact the acquisition and refinement of the vocal repertoire, while showing the number of male tutors - the source of adult song - did not have much of an effect. The correlation between maternal activity and acoustic changes in learned syllable types is a novel finding that underscores the importance of non-vocal social interactions in vocal learning. In vocal learning research, with some notable exceptions, experience is often understood as auditory experience. This paper highlights how, even though that is one important piece of the puzzle, other kinds of experience directly affect the development of vocal behavior. This is of particular importance in the case of a mammalian species such as Saccopteryx bilineata, as this kind of result is perhaps more often associated with avian species.

      Moreover, the study's findings have broader implications for our understanding of vocal learning across species. By drawing parallels between bat and human vocal development (and in some ways to bird vocal development), the paper highlights common mechanisms that may underlie vocal practice and learning in both humans and other mammals. This interdisciplinary perspective enriches the field and encourages further comparative studies, ultimately advancing our knowledge of the evolutionary and developmental processes that shape vocal productive learning in all its dimensions.

      Weaknesses:

      Some weaknesses can be pointed out, but in fairness, the authors acknowledge them in one way or another. As such, these are not flaws per se, but gaps that can be filled with further research.

      Experimental manipulations, such as controlled playback experiments or controlled environments, could strengthen the causal claims by directly testing the effects of specific maternal behaviors on vocal development. Certainly, the strengths of the paper will be consolidated after such work is performed.

      The study relies on the number of singing males as a proxy for social acoustic input. This measure does not account for the variability in the quality, frequency, or duration of the male songs to which the pups are exposed. A more detailed analysis of the acoustic environment, including direct measurements of song exposure and its impact on vocal learning, would provide a clearer understanding of the role of male tutors.

      Finally, and although it would be unlikely that these results are unique to Saccopteryx bilineata, the study's focus on a single species limits at present the generalizability of some of its findings to other vocal learning mammals. While the parallels drawn between bat and human vocal development are intriguing, the conclusions will be more robust when supported by comparative studies involving multiple species of vocal learners. This will help to identify whether the observed maternal influences on vocal development reported here are unique to Saccopteryx bilineata or represent a broader phenomenon in chiropteran, mammalian, or general vocal learning. Expanding the scope of research to include a wider range of species and incorporating cross-species comparisons will significantly enhance the contribution of this study to the field of vocal learning.

      Thank you for your suggestions and comments. 

      Regarding your main comment 1: In the future, we plan to implement temporary captivity experiments to investigate how maternal behaviours affect pup vocal development. This study provides the necessary basis for conducting future playback studies investigating specific behaviours in a controlled environment.

      Regarding your main comment 2: We completely agree that the number of singing males only represents a proxy for the acoustic input that pups receive during ontogeny. In the future, we plan to investigate in detail how the acoustic landscape influences pup vocal development and learning. This will include quantifying how long pups are exposed to song during ontogeny and assessing the influence of different tutors, including a detailed analysis of the song syllables of the adult tutors to compare them to the vocal trajectories of song syllables in pups.

      Regarding your main comment 3: We also fully agree that it is unlikely that these results are unique to Saccopteryx bilineata. We are certain that other mammalian vocal learners show parallels to the vocal development and learning processes of S. bilineata. Bats in particular are a promising taxon for comparative studies because their vocal production and perception systems are highly sophisticated (due to their ability to echolocate). The high sociability of this taxon also includes a variety of social systems and vocal capacities (e.g. regarding vocal repertoire size, vocal learning capacities, information content, etc.) which support social learning and social feedback, as shown in our study.

      As suggested, in our revised manuscript we will include information on the validation of the ethogram. Furthermore, we will correct all the spelling mistakes – thank you very much for pointing them out!

    1. eLife Assessment

      These are valuable findings for those interested in how neural signals reflect auditory speech streams, and in understanding the roles of prediction, attention, and eye movements in this tracking. However, the evidence as it stands is incomplete. Further details are needed on how the observed quantities relate to the relevant theoretical claims and mathematical models. Moreover, additional motivation is required for several analytical choices.

    2. Reviewer #1 (Public review):

      Summary:

      This study aimed at replicating two previous findings that showed (1) a link between prediction tendencies and neural speech tracking, and (2) that eye movements track speech. The main findings were replicated which supports the robustness of these results. The authors also investigated interactions between prediction tendencies and ocular speech tracking, but the data did not reveal clear relationships. The authors propose a framework that integrates the findings of the study and proposes how eye movements and prediction tendencies shape perception.

      Strengths:

      This is a well-written paper that addresses interesting research questions, bringing together two subfields that are usually studied in separation: auditory speech and eye movements. The authors aimed at replicating findings from two of their previous studies, which was overall successful and speaks for the robustness of the findings. The overall approach is convincing, methods and analyses appear to be thorough, and results are compelling.

      Weaknesses:

      Linking the new to the previous studies could have been done in more detail, and the extent to which results were replicated could have been discussed more thoroughly.

      Eye movement behavior could have been presented in more detail and the authors could have attempted to understand whether there is a particular component in eye movement behavior (e.g., microsaccades) that drives the observed effects.

    3. Reviewer #2 (Public review):

      Summary

      Schubert et al. recorded MEG and eye-tracking activity while participants were listening to stories in single-speaker or multi-speaker speech. In a separate task, MEG was recorded while the same participants were listening to four types of pure tones in either structured (75% predictable) or random (25%) sequences. The MEG data from this task was used to quantify individual 'prediction tendency': the amount by which the neural signal is modulated by whether or not a repeated tone was (un)predictable, given the context. In a replication of earlier work, this prediction tendency was found to correlate with 'neural speech tracking' during the main task. Neural speech tracking is quantified as the multivariate relationship between MEG activity and speech amplitude envelope. Prediction tendency did not correlate with 'ocular speech tracking' during the main task. Neural speech tracking was further modulated by local semantic violations in the speech material, and by whether or not a distracting speaker was present. The authors suggest that part of the neural speech tracking is mediated by ocular speech tracking. Story comprehension was negatively related to ocular speech tracking.

      Strengths

      This is an ambitious study, and the authors' attempt to integrate the many reported findings related to prediction and attention in one framework is laudable. The data acquisition and analyses appear to be done with great attention to methodological detail (perhaps even with too much focus on detail; see below). Furthermore, the experimental paradigm used is more naturalistic than was previously done in similar setups (i.e. stories instead of sentences).

      Weaknesses

      For many of the key variables and analysis choices (e.g. neural/ocular speech tracking, prediction tendency, mediation) it is not directly clear how these relate to the theoretical entities under study, and why they were quantified in this particular way. Relatedly, while the analysis pipeline is outlined in much detail, an overarching rationale and important intermediate results are often missing, which makes it difficult to judge the strength of the evidence presented. Furthermore, some analysis choices appear rather ad-hoc and should be made uniform and/or better motivated.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, the authors measured neural activity (using MEG) and eye gaze while individuals listened to speech from either one or two speakers, which sometimes contained semantic incongruencies.

      The stated aim is to replicate two previous findings by this group: (1) that there is "ocular speech tracking" (that eye-movements track the audio of the speech), (2) that individual differences in neural response to tones that are predictable vs. not-predictable in their pitch is linked to neural response to speech. In addition, here they try to link the above two effects to each other, and to link "attention, prediction, and active sensing".

      Strengths:

      This is an ambitious project, that tackles an important issue and combines different sources of data (neural data, eye-movements, individual differences in another task) in order to obtain a comprehensive "model" of the involvement of eye-movements in sensory processing.

      The authors use many adequate methods and sophisticated data-analysis tools (including MEG source analysis and multivariate statistical models) in order to achieve this.

      Weaknesses:

      Although I sympathize with the goal of the paper and agree that this is an interesting and important theoretical avenue to pursue, I am unfortunately not convinced by the results and find that many of the claims are very weakly substantiated in the actual data.

      Since most of the analyses presented here are derivations of statistical models and very little actual data is presented, I found it very difficult to assess the reliability and validity of the results, as they currently stand. I would be happy to see a thoroughly revised version, where much more of the data is presented, as well as control analyses and rigorous and well-documented statistical testing (including addressing multiple comparisons).

      These are the main points of concern that I have regarding the paper, in its current format.

      (1) Prediction tendencies - assessed by listening to sequences of rhythmic tones, where the pitch was either "predictable" (i.e., followed a fixed pattern, with 25% repetition) or "unpredictable" (no particular order to the sounds). This is a very specific type of prediction, which is a general term that can operate along many different dimensions. Why was this specific design selected? Is there theoretical reason to believe that this type of prediction is also relevant to "semantic" predictions or other predictive aspects of speech processing?

      (2) On the same point - I was disappointed that the results of "prediction tendencies" were not reported in full, but only used later on to assess correlations with other metrics. Even though this is a "replication" of previous work, one would like to fully understand the results from this independent study. On that note, I would also appreciate a more detailed explanation of the method used to derive the "prediction tendency" metric (e.g., what portion of the MEG signal is used? Why use a pre-stimulus and not a post-stimulus time window? How is the response affected by the 3Hz steady-state response that it is riding on? How are signals integrated across channels? Can we get a sense of what this "tendency" looks like in the actual neural signal, rather than just a single number derived per participant (an illustration is provided in Figure 1, but it would be nice to see the actual data)? How is this measure verified statistically? What is its distribution across the sample? Ideally, we would want enough information for others to be able to replicate this finding).

      (3) Semantic violations - half the nouns ending sentences were replaced to create incongruent endings. Can you provide more detail about this - e.g., how were the words selected? How were the recordings matched (e.g., could they be detected due to audio editing?)? What are the "lexically identical controls" that are mentioned? Also, is there any behavioral data to know how this affected listeners? Having so many incongruent sentences might be annoying/change the nature of listening. Were they told in advance about these?

      (4) TRF in multi-speaker condition: was a univariate or multivariate model used? Since the single-speaker condition only contains one speech stimulus - can we know if univariate and multivariate models are directly comparable (in terms of variance explained)? Was any comparison to permutations done for this analysis to assess noise/chance levels?

      (5) TRF analysis at the word level: from my experience, 2-second segments are insufficient for deriving meaningful TRFs (see for example the recent work by Mesik & Wojtczak). Can you please give further details about how the analysis of the response to semantic violations was conducted? What was the model trained on (the full speech or just the 2-second long segments)? Is there a particular advantage to TRFs here, relative - say - to ERPs (one would expect a relatively nice N400 response, no)? In general, it would be nice to see the TRF results on their own (and not just the modulation effects).

      (6) Another related point that I did not quite understand - is the dependent measure used for the regression model "neural speech envelope tracking" the r-value derived just from the 2sec-long epochs? Or from the entire speech stimulus? The text mentions the "effect of neural speech tracking" - but it's not clear if this refers to the single-speaker vs. two-speaker conditions or to the prediction manipulation. Or is it different in the different analyses? Please spell out exactly what metric was used in each analysis.

    5. Author response:

      We appreciate all the reviewers for their encouraging comments and thoughtful feedback. We are confident that we can incorporate many of the suggestions to provide a clearer overall picture in the revised manuscript. In particular, we agree with the reviewers' concern that some of our methodological decisions, including our choice of metrics, require further clarification. We will focus on revising the methods section to make these decisions more transparent and to address any misunderstandings related to the analysis.

      We also value the request to include more data, such as intermediate results and additional control analyses. We will carefully assess which results to include in the main manuscript and which to provide in an extended supplementary section.

      To offer a more detailed understanding of our quantification of "prediction tendency," we refer to our previous work (Schubert et al., 2023, 2024), where we elaborate on our analytical choices in great detail and provide additional control analyses (e.g., ensuring that the relationship with speech tracking is not driven by participants' signal-to-noise ratio; Schubert et al., 2023).

      Additionally, we would like to clarify that the aim of this manuscript is not to analyze viewing behavior in depth but to replicate the general finding of ocular speech tracking, as presented in Gehmacher et al. (2024). A thorough investigation of specific ocular contributions (e.g., microsaccades or blinks) would require a separate research question and distinct analysis approaches, given the binary nature of such events.

      Nevertheless, we share the reviewers' interest in independent results from the current study, and we plan to carefully select and present the most relevant findings in the revised manuscript.

    1. eLife Assessment

      This study provides convincing evidence that the Kinesin protein family member KIF7 regulates the development of the cerebral cortex and its connectivity and the specificity of Sonic Hedgehog signaling by controlling the details of Gli repressor vs activator functions. This study provides important new insights into general aspects of cortical development.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting follow-up to a paper published in Human Molecular Genetics reporting novel roles in corticogenesis of the Kif7 motor protein that can regulate the activator as well as the repressor functions of the Gli transcription factors in Shh signalling. This new work investigates how a null mutation in the Kif7 gene affects the formation of corticofugal and thalamocortical axon tracts and the migration of cortical interneurons. It demonstrates that the Kif7 null mutant embryos present with ventriculomegaly and heterotopias as observed in patients carrying KIF7 mutations. The Kif7 mutation also disrupts the connectivity between the cortex and thalamus and leads to an abnormal projection of thalamocortical axons. Moreover, cortical interneurons show migratory defects that are mirrored in cortical slices treated with the Shh inhibitor cyclopamine suggesting that the Kif7 mutation results in a down-regulation of Shh signalling. Interestingly, these defects are much less severe at later stages of corticogenesis.

      Strengths/weaknesses:

      The findings of this manuscript are clearly presented and are based on detailed analyses. Using a compelling set of experiments, especially the live imaging to monitor interneuron migration, the authors convincingly investigate Kif7's roles and their results support their major claims. The migratory defects in interneurons and the potential role of Shh signalling present novel findings and provide some mechanistic insights but rescue experiments would further support Kif7's role in interneuron migration. Similarly, the mechanism underlying the misprojection which has previously been reported in other cilia mutants remains unexplored. Taken together, this manuscript makes novel contributions to our understanding of the role of primary cilia in forebrain development and to the aetiology of neural symptoms in ciliopathy patients.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the role of KIF7, a ciliary kinesin involved in the Sonic Hedgehog (SHH) signaling pathway, in cortical development using Kif7 knockout mice. The researchers examined embryonic cortex development (mainly at E14.5), focusing on structural changes and neuronal migration abnormalities.

      Strengths:

      (1) The phenotype observed is interesting, and the findings provide neurodevelopmental insight into some of the symptoms and malformations seen in patients with KIF7 mutations.

      (2) The authors assess several features of cortical development, including structural changes in layers of the developing cortex, connectivity of the cortex with the thalamus, as well as migration of cINs from CGE and MGE to the cortex.

      Weaknesses:

      (1) The Kif7 null does have phenotype differences from individual mutations seen in patients. It would be interesting to add more thoughts about how the null differs from these mutants in ciliary structure and SHH signaling via the cilium.

      (2) The description of altered cortex development at E14.5 is perhaps rather descriptive. It would be useful to assess more closely the changes occurring in different cell types and stages. For this it seems very important to have a time course of cortical development and how the structural organization changes over time. This would be easy to assess with the addition of serial sections from the same mice. It might also be interesting to see how SHH signaling is altered in different cortical cell types over time with a SHH signaling reporter mouse.

      (3) Abnormal neurodevelopmental phenotypes have been widely reported in the absence of other key genes affecting primary cilia function (Willaredt et al., J Neurosci 2008; Guo et al., Nat Commun 2015). It would be interesting to have more discussion of how the Kif7 null phenotype compares to some of these other mutants.

      (4) The authors see alterations in cIN migration to the cortex and observe distinct differences in the pattern of expression of Cxcl12 as well as suggest cell-intrinsic differences within cIN in their ability to migrate. The slice culture experiments though make it a little difficult to interpret the cell intrinsic effects on cIN of loss of Kif7, as the differences in Cxcl12 patterns still exist presumably in the slice cultures. It would be useful to assess their motility in an assay where they were isolated, as well as assess transcriptional changes in cINs in vivo lacking KIF7 for expression patterns that may affect motility or other aspects of migration.

    1. eLife Assessment

      This is an important study demonstrating that cholecystokinin is a key modulator of auditory thalamocortical plasticity during development and in young adults but not aged mice, though the cortical application of this neuropeptide in older animals appears to go some way to restoring this age-dependent loss in plasticity. A strength of this work is the use of multiple experimental approaches, which together provide convincing support for the proposed involvement of cholecystokinin. Nevertheless, the specificity of the electrical and optical stimulation experiments requires further validation and some key details are missing in the presentation and discussion of these findings. This work is likely to be influential in opening up a new avenue of investigation into the roles of neuropeptides in sensory plasticity.

    2. Reviewer #1 (Public review):

      This study offers a valuable investigation into the role of cholecystokinin (CCK) in thalamocortical plasticity during early development and adulthood, employing a range of experimental techniques. The authors demonstrate that tetanic stimulation of the auditory thalamus induces cortical long-term potentiation (LTP), which can be evoked through either electrical or optical stimulation of the thalamus or by noise bursts. They further show that thalamocortical LTP is abolished when thalamic CCK is knocked down or when cortical CCK receptors are blocked. Interestingly, in 18-month-old mice, thalamocortical LTP was largely absent but could be restored through the cortical application of CCK. The authors conclude that CCK contributes to thalamocortical plasticity and may enhance thalamocortical plasticity in aged subjects.

      While the study presents compelling evidence, I would like to offer several suggestions for the authors' consideration:

      (1) Thalamocortical LTP and NMDA-Dependence: It is well established that thalamocortical LTP is NMDA receptor-dependent, and blocking cortical NMDA receptors can abolish LTP. This raises the question of why thalamocortical LTP is eliminated when thalamic CCK is knocked down or when cortical CCK receptors are blocked. If I correctly understand the authors' hypothesis, namely that CCK promotes LTP through CCKR-intracellular Ca2+-AMPAR, this pathway should not directly interfere with the NMDA-dependent mechanism. A clearer explanation of this interaction would be beneficial.

      (2) Complexity of the Thalamocortical System: The thalamocortical system is intricate, with different cortical and thalamic subdivisions serving distinct functions. In this study, it is not fully clear which subdivisions were targeted for stimulation and recording, which could significantly influence the interpretation of the findings. Clarifying this aspect would enhance the study's robustness.

      (3) Statistical Variability:<br /> Biological data, including field excitatory postsynaptic potentials (fEPSPs) and LTP, often exhibit significant variability between samples, sometimes resulting in a standard deviation that exceeds 50% of the mean value. The reported standard deviation of LTP in this study, however, appears unusually small, particularly given the relatively limited sample size. Further discussion of this observation might be warranted.

      (4) EYFP Expression and Virus Targeting:<br /> The authors indicate that AAV9-EFIa-ChETA-EYFP was injected into the medial geniculate body (MGB) and subsequently expressed in both the MGB and cortex. If I understand correctly, the authors assume that cortical expression represents thalamocortical terminals rather than cortical neurons. However, co-expression of CCK receptors does not necessarily imply that the virus selectively infected thalamocortical terminals. The physiological data regarding cortical activation of thalamocortical terminals could be questioned if the cortical expression represents cortical neurons or both cortical neurons and thalamocortical terminals.

      (5) Consideration of Previous Literature:<br /> A number of studies have thoroughly characterized auditory thalamocortical LTP during early development and adulthood. It may be beneficial for the authors to integrate insights from this body of work, as reliance on data from the somatosensory thalamocortical system might not fully capture the nuances of the auditory pathway. A more comprehensive discussion of the relevant literature could enhance the study's context and impact.

      (6) Therapeutic Implications:<br /> While the authors suggest potential therapeutic applications of their findings, it may be somewhat premature to draw such conclusions based on the current evidence. Although speculative discussion is not harmful, it may not significantly add to the study's conclusions at this stage.

    3. Reviewer #2 (Public review):

      Summary:

      This work used multiple approaches to show that CCK is critical for long-term potentiation (LTP) in the auditory thalamocortical pathway. They also showed that the CCK mediation of LTP is age-dependent and supports frequency discrimination. This work is important because it opens up a new avenue of investigation of the roles of neuropeptides in sensory plasticity.

      Strengths:

      The main strength is the multiple approaches used to comprehensively examine the role of CCK in auditory thalamocortical LTP. Thus, the authors do provide a compelling set of data that CCK mediates thalamocortical LTP in an age-dependent manner.

      Weaknesses:

      The behavioral assessment is relatively limited but may be fleshed out in future work.

    4. Reviewer #3 (Public review):

      Summary:

      Cholecystokinin (CCK) is highly expressed in auditory thalamocortical (MGB) neurons, and CCK has been found to shape cortical plasticity dynamics. To understand how CCK shapes synaptic plasticity in the auditory thalamocortical pathway, the authors assessed the role of CCK signaling across multiple mechanisms of LTP induction in the auditory thalamocortical (MGB - layer IV auditory cortex) circuit in mice. In these physiology experiments, which combine multiple mechanisms of LTP induction with rigorous manipulation of CCK and CCK-dependent signaling, they establish that auditory thalamocortical LTP depends on the co-release of CCK from auditory thalamic neurons. By carefully assessing the development of this plasticity over time alongside CCK expression, they go on to identify a window, spanning early and middle adulthood, during which CCK is produced in auditory thalamocortical neurons, establishing a window for plasticity from 3 weeks to 1.5 years in mice, with limited LTP occurring outside of this window. The authors go on to show that CCK signaling and its effect on LTP in the auditory cortex is also capable of modifying frequency discrimination accuracy in an auditory PPI task. In evaluating the impact of CCK on PPI task performance, it also seems that in mice <1.5 years old, CCK-dependent effects on cortical plasticity are almost saturated. While exogenous CCK can modestly improve discrimination of only very similar tones, exogenous focal delivery of CCK in older mice can significantly improve learning in a PPI task to bring their discrimination ability in line with that of young adult mice.

      Strengths:

      (1) The clarity of the results along with the rigorous multi-angled approach provide significant support for the claim that CCK is essential for auditory thalamocortical synaptic LTP. This approach uses a combination of electrical, acoustic, and optogenetic pathway stimulation alongside conditional expression approaches, germline knockout, viral RNA downregulation, and pharmacological blockade. Through the combination of these experimental configurations, the authors demonstrate that high-frequency stimulation-induced LTP is reliant on co-release of CCK from glutamatergic MGB terminals projecting to the auditory cortex.

      (2) The careful analysis of the CCK, CCKB receptor, and LTP expression is also a strength that puts the finding into the context of mechanistic causes and potential therapies for age-dependent sensory/auditory processing changes. Similarly, not only do these data identify a fundamental biological mechanism, but they also provide support for the idea that exogenous asynchronous stimulation of the CCKBR is capable of restoring an age-dependent loss in plasticity.

      (3) Although experiments to simultaneously relate LTP and behavioral change or identify a causal relationship between LTP and frequency discrimination were not performed, there is still convincing evidence that CCK signaling in the auditory cortex (known to determine synaptic LTP) is important for auditory processing/frequency discrimination. These experiments are key for establishing the relevance of this mechanism.

      Weaknesses:

      (1) Given the magnitude of the evoked responses, one expects that pyramidal neurons in layer IV are primarily those that undergo CCK-dependent plasticity, but the degree to which PV-interneurons and pyramidal neurons participate in this process differently is unclear.

      (2) While these data support an important role for CCK in synaptic LTP in the auditory thalamocortical pathway, perhaps temporal processing of acoustic stimuli is as or more important than frequency discrimination. Given the enhanced responsivity of the system, it is unclear whether this mechanism would improve or reduce the fidelity of temporal processing in this circuit. Understanding this dynamic may also require consideration of cell type as raised in weakness #1.

      (3) In Figure 1, an example of increased spontaneous and evoked firing activity of single neurons after HFS is provided. Yet it is surprising that the group data are analyzed only for the fEPSP. It seems that single-neuron data would also be useful at this point to provide insight into how CCK and HFS affect temporal processing and spontaneous activity/excitability, especially given the example in 1F.

      (4) The authors mention that CCK mRNA was absent in CCK-KO mice, but the data are not provided.

      (5) The circuitry that determines PPI requires multiple brain areas, including the auditory cortex. Given the complicated dynamics of this process, it may be helpful to consider what, if anything, is known specifically about how layer IV synaptic plasticity in the auditory cortex may shape this behavior.

    1. eLife Assessment

      This important study highlights the key role of the gut-liver axis mediated by LPS in causing hepatic steatosis. The authors provide solid evidence, in vivo, in vitro, and in silico, for the role of acyloxyacyl hydrolase in mediating this effect using KO mice subjected to MASLD-inducing diets. The findings are significant for the liver research community and others interested in the gut-liver axis.

    2. Reviewer #1 (Public review):

      Lu et al. propose a direct role of LPS in inducing hepatic fat accumulation and suggest that the metabolism of LPS can therefore mitigate fatty liver injury. Using acyloxyacyl hydrolase whole-body KO mice, they demonstrate that acyloxyacyl hydrolase deletion results in higher hepatic fat accumulation over 8 months of a high-glucose/high-fructose diet. Previous literature has found that hepatocyte KO of TLR4 (a main receptor for LPS) reduced fatty liver in the MAFLD model, and this paper complements that work by showing that degradation/metabolism of LPS can also reduce fatty liver. This result suggests a very interesting mechanism, and the translational implications of utilizing acyloxyacyl hydrolase to decrease LPS exposure are intriguing.

      The strengths of the present study include that it proposes a simple LPS-based mechanism that is of interest in many diseases. The phenotype shown in the study is strong. The mechanism proposed by the findings is generally well supported.

      There are also several shortcomings in the findings of this study. Because a whole-body Aoah KO was used, the cellular source of AOAH in MAFLD is unclear. Although the authors used published single-cell RNA-seq data and flow-isolated liver cells, physiologically LPS degradation could occur in the blood or the liver. The authors linked LPS to hepatocyte fatty acid oxidation via SREBP1. The mechanism is not explored in great depth. Does this signaling occur through TLR4? In this model, LPS could activate macrophages and mediate the worsening of hepatocyte fatty liver injury via a paracrine effect instead of directly signaling to hepatocytes, thus it is not clear that this is a strictly hepatocyte LPS effect. It would also be very interesting to see if the administration of the AOAH enzyme orally could mitigate MAFLD injury. Overall, this work adds to the current understanding of the gut-liver axis and development of MAFLD and will be of interest to many readers.

    3. Reviewer #2 (Public review):

      The authors of this article investigated the impact of the host enzyme AOAH on the progression of MASLD in mice. To achieve this, they utilized whole-body Aoah-/- mice. The authors demonstrated that AOAH reduced LPS-induced lipid accumulation in the liver, probably by decreasing the expression and activation of SREBP1. In addition, AOAH reduced hepatic inflammation and minimized tissue damage.

      However, this paper is descriptive without a clear mechanistic study. Another major limitation is the use of whole-body KO mice, so the cellular source of the enzyme remains undefined. Moreover, since LPS-mediated SREBP1 regulation or LPS-mediated MASLD progression is already documented, the role of AOAH in SREBP1-dependent lipid accumulation and MASLD progression is largely expected.

      Specific comments:

      (1) The overall human relevance of the current study remains unclear.

      (2) Is AOAH secreted from macrophages or other immune cells? Are there any other functions of AOAH within the cells?

      (3) Due to using whole-body KO mice, the role of AOAH in specific cell types was unclear in this study, which is one of the major limitations of this study. The authors should at least conduct in vitro experiments using a co-culture system of hepatocytes and Kupffer cells (or other immune cells) isolated from WT or Aoah-/- mice.

      (4) It is well known that intestinal tight junction permeability is increased by LPS or inflammatory cytokines. However, in Figure 3E, intestinal permeability is comparable between the groups in both diet groups. The authors should discuss this result further. In addition, intestinal junctional protein should be determined by Western blot and IHC (or IF) to further confirm this finding.

      (5) In Figure 6, LPS i.g. Aoah-/- group is missing. This group should be included to better interpret the results.

      (6) The term NAFLD has been suggested to be changed to MASLD as the novel nomenclature according to the guidelines of AASLD and EASL.

    1. eLife Assessment

      This important study provides solid in-vivo evidence that CCR4 regulates the early inflammatory response during atherosclerotic plaque formation. The authors propose that altered T-cell response plays a role in this process, shedding light on mechanisms that may be of interest to medical biologists, biochemists, cell biologists, and immunologists. Further in vivo validation, mechanistic studies, and discussion of results in vitro suggested would be helpful to cement the significance and implications of these findings.

    2. Reviewer #1 (Public review):

      Summary:

      The article provides valuable information on the role of CCR4 in an inflammatory condition, namely, the atherosclerotic plaque. The data demonstrate that, in the absence of CCR4, Th1 cells infiltrated the plaque and Tregs lost their function. The data are clear and well presented. Most importantly, the data on CCR4-specific deficiency in regulatory T cells are particularly impressive.

      Strengths:

      The data are clear, the experiments are well performed, and the focus on the plaque in comparison with peripheral organs is interesting. The disease is relevant, and the data could be used to understand the risk for patients receiving immunomodulators.

      Weaknesses:

      The mechanism, beyond migration, remains unknown.

    3. Reviewer #2 (Public review):

      Summary:

      Tanaka et al. investigated the role of CCR4 in early atherosclerosis, focusing on the immune modulation elicited by this chemokine receptor under hypercholesterolemia. The study found that Ccr4 deficiency led to qualitative changes in atherosclerotic plaques, characterized by an increased inflammatory phenotype. The authors further analyzed the CD4 T cell immune response in para-aortic lymph nodes and atherosclerotic aorta, showing an increase mainly in Th1 cells and the Th1/Treg ratio in Ccr4-/-Apoe-/- mice compared to Apoe-/- mice. They then focused on Tregs, demonstrating that Ccr4 deficiency impaired their immunosuppressive function in in-vitro assays and elegantly showed that Ccr4-deficient Tregs had, as expected, impaired migration to the atherosclerotic aorta. Adoptive cell transfer of Ccr4-/- Tregs to Apoe-/- mice mimicked early atherosclerosis development in Ccr4-/-Apoe-/- mice. Therefore, this work shows that CCR4 plays an important role in early atherosclerosis but not in advanced stages.

      Strengths:

      Several in vivo and in vitro approaches were used to address the role of CCR4 in early atherosclerosis. Particularly, through the adoptive cell transfer of CCR4+ or CCR4- Tregs, the authors aimed to directly demonstrate the role of CCR4 in Tregs' protection against early atherosclerosis.

      Weaknesses:

      The isolation of Tregs was inadequately controlled; they were isolated based solely on CD4 and CD25 expression. CD25 is also expressed by activated effector T cells, meaning the analyzed cells could be a pool of mainly Tregs but also include effector T cells.

      The study primarily focused on Th1 and Tregs without thoroughly investigating other CD4 T cell subsets. Th17 cells are known to play an important role in atherosclerosis; non-pathogenic Th17 cells express CCR4, while pathogenic Th17 cells do not. Considering that Figure 3 shows an increased frequency of IL17-expressing CD4 T cells compared to Apoe-/- mice, and given the imprecise Treg isolation, differences in non-pathogenic Th17 cells could be contributing to the observed effects.

      Furthermore, the clinical relevance of these findings is not discussed. As an initial approach, the authors could analyze public datasets to determine if certain Ccr4 single nucleotide polymorphisms correlate with a higher incidence of atherosclerosis.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, Tanaka and colleagues address the role played by the C-C chemokine receptor 4 (CCR4) in the development of early atherosclerotic plaques using ApoE-deficient mice fed with a standard chow diet as a model. Since CCR4 is expressed in several T CD4+ lymphocyte subsets, the authors examined the consequences of CCR4 deficiency on the differentiation profile and traffic of T CD4+ lymphocytes. By histological analysis of aortic lesions, they demonstrated that the absence of CCR4 promoted the development of early atherosclerosis, characterized by an inflammatory reaction with increased levels of macrophages and T CD4+ inflammatory lymphocytes and decreased collagen content. Using flow cytometry together with mRNA expression analysis for identifying T CD4+ cell subsets, the authors found that the accelerated aortic inflammation induced by CCR4 deficiency correlated with higher proliferation of T CD4+ cells in lymphoid tissues, favouring the expansion of the pro-inflammatory effector Th1 cell subset, typically found in atherosclerotic lesions. Interestingly, the increased T CD4+ cell response occurred despite the expansion of T CD4+ Foxp3+ regulatory cells (Treg), which were in higher numbers in the lymphoid tissues of CCR4-deficient mice, suggesting the absence of CCR4 interfered with the regulatory actions of Treg cells. Using in vitro and/or in vivo approaches, the authors found evidence of a CCR4 requirement for Treg suppressive activity and migratory capacity to inflamed aortic areas, helping to explain why CCR4 deficiency induced an augmented Th1/Treg ratio in the aortic lesions. These findings might not be surprising considering the demonstrated involvement of CCR4 in driving Treg migration to inflamed tissues in immune-related pathological models and Treg-dendritic cell contact for imprinting suppressive signals. However, in previous studies using a murine model of advanced atherosclerosis, neither hematopoietic nor systemic CCR4 deficiency altered the development of the aortic lesions. The authors included a thoughtful discussion about hypothetical mechanisms explaining these contrasting results, highlighting putative differences in the role played by the CCL17/CCL22-CCR4 axis along the stages of atherosclerosis development in this murine model.

      Major strengths and weaknesses:

      The main effects of CCR4 deficiency on early atherosclerosis development and Treg functional loss are valuable and supported by the collected data. In vivo studies comparing Treg tissue accumulation or atherosclerotic lesions in Apoe-/- mice that received Tregs derived from Apoe-/- or Apoe-/-Ccr4-/- mice strengthen the results. However, an incomplete description of methods (particularly flow cytometry) and data analysis weakens some conclusions of this study. Readers should note some inconsistencies in the T CD4+ response analysis in different tissues. In aortic lesions, but not in lymphoid tissues (peripheral, para-aortic, and spleen), the ratio Th1/Treg was used for evaluating the effect of CCR4 deficiency on the profile of Th cell subsets. In lymphoid tissues, increments in the frequency of both effector Th1 and Treg were observed in CCR4-deficient Apoe-/- mice compared to CCR4-sufficient Apoe-/- mice. Therefore, it is not convincing that CCR4-deficiency shifts Th1 cell/Treg balance toward Th1 cell responses in all lymphoid tissues; this claim needs to be revised by the authors. The Treg dysfunction caused by CCR4 deficiency enhanced T CD4+ activation and might have amplified, rather than shifted, the typical biased Th1-mediated inflammatory response observed in the lymphoid tissues of hypercholesterolemic mice. A different scenario emerged in aortic lesions, where recruitment of effector Th1 cells, but not of the additional effector T CD4+ cell subsets expanded in lymphoid tissues, led to a higher Th1/Treg balance. Also, effector Th17 cells seem to predominate among effector T CD45+CD3+CD4+ cells in the aorta of Apoe-/- mice, and the Th1/Th17 balance appears to have increased as a consequence of CCR4 deficiency as well. Modulation of Th1/Th17 balance might be responsible for changes in the type and functional properties of recruited inflammatory cells in the aorta.

      Study limitations:

      This investigation has some limitations. Current tools for single-cell characterization have revealed the phenotypic heterogeneity and dynamics of aortic leukocytes, including T cells, which are among the principal aortic leukocytes found in mouse and human atherosclerotic lesions (doi:10.1161/CIRCRESAHA.117.312513). The flow cytometry analysis applied in this study cannot distinguish the generation of particular phenotypes within T CD4+ subsets, including putative phenotypes of non-suppressive T cells expressing low levels of Foxp3, as appears to occur in other chronic inflammatory disorders (doi: 10.1038/nm.3432; doi: 10.1172/JCI79014). Limitations due to the use of a complete CCR4 knockout mouse and putative differences in CCR4-mediated mechanisms along atherosclerosis stages and in human atherosclerosis were commented on by the authors in the discussion.

      Global impact:

      This work opens the way for a deeper analysis of the contribution of CCR4 and its ligands to the activation and differentiation of T CD4+ lymphocytes during atherosclerosis development, with these lymphocytes being fundamental players in the generation of pro-atherogenic and anti-atherogenic immune responses. Differences in the mechanisms mediated by the CCL17/CCL22-CCR4 axis among early and advanced atherosclerosis highlight the complex landscape to examine and validate in human samples and the need to achieve a deep knowledge for identifying genuine and safe targets capable of promoting protective anti-atherogenic immune responses.

    1. eLife Assessment

      Wittkamp et al. investigated the spatiotemporal dynamics of expectation of pain using an original fMRI-EEG approach. The methods are solid and the evidence for a substantially different neural representation between the anticipatory and the actual pain period is convincing. These important findings are discussed within a general framework that encompasses their research questions, hypotheses, and analysis of results. Although the choice of conditions and their influence on the results might accept different interpretations, the manuscript is strong and contributes beneficial insights to the field.

    2. Reviewer #1 (Public review):

      Summary:

      In this important paper, the authors investigate the temporal dynamics of expectation of pain using a combined fMRI-EEG approach. More specifically, by modifying the expectations of higher or lower pain on a trial-to-trial basis, they report that expectations largely share the same set of activations before the administration of the painful stimulus and that the coding of the valence of the stimulus is observed only after the nociceptive input has been presented. fMRI-informed EEG analysis suggested that the temporal sequence of information processing involved the dorsolateral prefrontal cortex (DLPFC), the anterior insula, and the anterior cingulate cortex. The strength of evidence is convincing and the methods are solid, but a few alternative interpretations about the findings related to the control group, as well as a more in-depth discussion on the correlations between the BOLD and EEG signals, would strengthen the manuscript.

      Strengths:

      In line with open science principles, the article presents the data and the results in a complete and transparent fashion.
      From a theoretical standpoint, the authors make a step forward in our understanding of how expectations modulate pain by introducing a combination of spatial and temporal investigation. It is becoming increasingly clear that our appraisal of the world is dynamic, guided by previous experiences and mapped on a combination of what we expect and what we get. New research methods, questions and analyses are needed to capture this evolving process.

      Weaknesses:

      The authors have addressed my concerns about the control condition and made some adjustments, namely acknowledging that participants cannot be "expectations" free and investigating whether scores in the control condition are simply due to a "regression to the mean".

      General considerations and reflections:

      Inducing expectations in the desired direction is not a straightforward task, and results might depend on the exact experimental conditions and the comparison group. In this sense, the authors' choice of having 3 groups of positive, negative and "neutral" expectations is to be praised. On the other hand, control groups also form their own expectations, and this can constitute a confounder in every experiment using expectation manipulation, if not appropriately investigated. The authors have addressed this element in their revised submission.

      In addition, although fMRI is still (probably) the best available tool we have to understand the spatial representation of cortical processing, limitations about not only the temporal but even the spatial resolution should be acknowledged. This has been done. Given the anatomical and physiological complexity of the cortical connections, as we know from the animal world, it is still quite possible that subcircuits are also activated for positive and negative expectations, but cannot be observed due to the limitations of our techniques. Indeed, on an empirical/evolutionary basis, it would remain unclear why we should have a system that waits for the valence of a stimulus to show differential responses.
      Also, moving in a dimension of network and graph theory, one would not expect single areas to be responsible for distinct processes, but rather that they would integrate information in a shared way, potentially with different feedback and feedforward communications. As such, it becomes more difficult to regard the insula as a center for coding potential pain; it is perhaps more of a node in a system that signals potential dangers for the integrity of the body.
      The rationale for the choice of their EEG band has been outlined.

    3. Reviewer #2 (Public review):

      I appreciate the authors' thorough revision of the manuscript, which has significantly improved its quality. I have no additional comments or requests for further changes.

      However, I remain in slight disagreement regarding the characterization of the neutral condition. My perspective is that it resembles more of a "medium" condition, making it challenging to understand what would be common to "high-medium" and "low-medium" contrasts. I suspect that the neutral condition might represent a state of high uncertainty since participants are informed that the algorithm cannot provide a prediction. From this viewpoint, the observed similarities in effects for both positive and negative expectations may actually reflect differences between certainty and uncertainty rather than the specific expectations themselves.

      Nevertheless, the authors have addressed alternative interpretations in their discussion section, and I have no further requests. The paper is well-executed and demonstrates several strengths: the procedure effectively induced varying levels of expectations with clear impacts on pain ratings. Additionally, the integration of fMRI with EEG is commendable for tracking the transition from anticipatory to pain periods. Overall, the manuscript is strong and contributes valuable insights to the field.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful and overall positive evaluation of our work and the constructive feedback! To address the main concerns, we have:

      – Clarified a major misunderstanding of our instructions: Participants were only informed that they would receive different stimuli of medium intensity and were thus not aware that the stimulation temperature remained constant

      – Implemented a new analysis to evaluate how participants rated their expectation and pain levels in the control condition

      – Added a paragraph in the discussion in which we argue that our paradigm is comparable to previous studies

      Below, we provide responses to each of the reviewers’ comments on our manuscript.

      Reviewer #1 (Public Review):

      Summary:  

      In this important paper, the authors investigate the temporal dynamics of expectation of pain using a combined fMRI-EEG approach. More specifically, by modifying the expectations of higher or lower pain on a trial-to-trial basis, they report that expectations largely share the same set of activations before the administration of the painful stimulus, and that the coding of the valence of the stimulus is observed only after the nociceptive input has been presented. fMRI-informed EEG analysis suggested that the temporal sequence of information processing involved the dorsolateral prefrontal cortex (DLPFC), the anterior insula, and the anterior cingulate cortex. The strength of evidence is convincing, and the methods are solid, but a few alternative interpretations about the findings related to the control group, as well as a more in-depth discussion on the correlations between the BOLD and EEG signals would strengthen the manuscript.

      Thank you for your positive evaluation! In the revised version of the manuscript, we elaborated on the control condition and the BOLD-EEG correlations in more detail.

      Strengths:  

      In line with open science principles, the article presents the data and the results in a complete and transparent fashion. 

      From a theoretical standpoint, the authors make a step forward in our understanding of how expectations modulate pain by introducing a combination of spatial and temporal investigation. It is becoming increasingly clear that our appraisal of the world is dynamic, guided by previous experiences, and mapped on a combination of what we expect and what we get. New research methods, questions, and analyses are needed to capture these evolving processes.  

      Thank you very much for these positive comments!

      Weaknesses:  

      The control condition is not so straightforward. Across the manuscript it is defined as "no expectation", and in the legend of Figure 1 it is mentioned that the third state would be "no prediction". However, it is difficult to conceive that participants would not have any expectations or predictions. Indeed, in the description of the task it is mentioned that participants were instructed that they would receive stimuli during "intermediate sensitive states". The results of the pain scores and expectations might support the idea that the control condition is situated in between the placebo and nocebo conditions. However, since this control condition was not part of the initial conditioning, and participants had no reference to previous stimuli, one might expect that some ratings might have simply "regressed to the mean" for a lack of previous experience. 

      General considerations and reflections:  

      Inducing expectations in the desired direction is not a straightforward task, and results might depend on the exact experimental conditions and the comparison group. In this sense, the authors' choice of having 3 groups of positive, negative, and "neutral" expectations is to be praised. On the other hand, also control groups form their expectations, and this can constitute a confounder in every experiment using expectation manipulation, if not appropriately investigated. 

      Thank you for raising these important concerns! Firstly, as it seems that we did not explain the experimental procedure in a clear fashion, there appeared to be a general misunderstanding regarding our instructions. We want to emphasize that we did not tell participants that the stimulus intensity would always be the same, but that pain stimuli would be different temperatures of medium intensity. Furthermore, our instruction did not necessarily imply that our algorithm detected a state of medium sensitivity, but that the algorithm would not make any prediction, e.g., due to highly fluctuating states of pain sensitivity, or no clear-cut state of high or low pain sensitivity. We changed this in the Methods (ll. 556-560, 601-606, 612-614) and Results (ll. 181-192) sections of the manuscript to clarify these important features of our procedure.

      Then, we absolutely agree that participants explicitly and implicitly form expectations regarding all conditions over time, including the control condition. We carefully considered your feedback and rephrased the control condition, no longer framing it as eliciting “no expectations” but as “neutral expectations” in the revised version of the manuscript. This follows the more common phrasing in the literature and acknowledges that participants indeed build up expectations in the control condition. However, we do still think that we can meaningfully compare the placebo and nocebo condition to the control condition to investigate the neuronal underpinnings of expectation effects. Independently of whether participants build up an expectation of “medium” intensities in the control condition, which caused them to perceive stimuli in line with this expectation, or if they simply perceived the stimuli as they were (of medium intensity) with limited effects of expectations, the crucial difference to the placebo and nocebo conditions is that there was no alteration of perception due to previous experiences or verbal information and no shift of perception from the actual stimulus intensity towards any direction in the control condition. This allowed us to compare the neural basis of a modulation of pain perception in either direction to a condition in which this modulation did not take place. 

      Author response image 1.

      Variability within conditions over time. Relative variability index for expectation (left) and pain ratings (right) per condition and measurement block. 

      Lastly, we want to highlight that our finding of the control condition being rated in between the placebo and nocebo condition is in line with many previous studies that included similar control conditions and advanced our understanding of pain-related expectations (Bingel et al., 2011; Colloca et al., 2010; Shih et al., 2019). We thank the reviewer for the very interesting idea to evaluate the development of ratings in the control condition in more detail and added a new analysis to the manuscript in which we compared how much intra-subject variance was within the ratings of each of the three conditions and how much this variance changed over time. For this aim, we computed the relative variability index (Mestdagh et al., 2018), a measure that quantifies intra-subject variation over multiple ratings, and compared it between the three conditions and the three measurement blocks. We observed differences in variances between conditions for both expectation (F(2,96) = 8.14, p < .001) and pain ratings (F(2,96) = 3.41, p = .037). For both measures, post-hoc tests revealed that there was significantly more variance in the placebo compared to the control condition (both p_holm < .05), but no difference between control and nocebo. The substantial and comparable variation in pain and expectation ratings in all three conditions (or at least between control and nocebo) shows that participants did not always expect and perceive the same intensity within conditions. Variance in expectation ratings decreased from the first block to the other two blocks (F(1.35,64.64) = 5.69, p = .012; both p_holm < .05), which was not the case for pain ratings. Most importantly, there was no interaction effect of block and condition for either expectation (F(2.65,127.06) = 0.40, p = .728) or pain ratings (F(4,192) = 0.48, p = .748), which implies that expectations were similarly dynamically updated in all conditions over the course of the experiment. This speaks against a “regression to the mean” in the control condition and shows that control ratings fluctuated from trial to trial. We included this analysis and a more in-depth discussion of the choice of conditions in the Results (ll. 219-232) and Discussion (ll. 452-486) sections of the revised manuscript.
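      As an illustration of the variability measure discussed above, the following is a minimal sketch and not the authors' analysis code. It assumes that the relative variability index is the observed standard deviation divided by the largest standard deviation attainable for the same mean on a bounded scale, which is the idea behind Mestdagh et al. (2018). The 0-100 rating scale, the function names, and the example ratings are hypothetical.

```python
import math
from statistics import mean, stdev


def max_sd(m, n, lo, hi):
    """Largest sample SD that n ratings bounded in [lo, hi] can have given mean m."""
    if n < 2 or m <= lo or m >= hi:
        return 0.0
    total = m * n
    # Variance is maximal when ratings sit at the scale bounds:
    # k ratings at the upper bound, one leftover rating that fixes the mean,
    # and the remaining ratings at the lower bound.
    k = min(int(math.floor((total - n * lo) / (hi - lo))), n - 1)
    leftover = total - k * hi - (n - k - 1) * lo
    extreme = [hi] * k + [leftover] + [lo] * (n - k - 1)
    return stdev(extreme)


def relative_variability(ratings, lo=0, hi=100):
    """Observed SD relative to the maximum SD possible for this mean (assumed definition)."""
    upper = max_sd(mean(ratings), len(ratings), lo, hi)
    return stdev(ratings) / upper if upper > 0 else float("nan")


# Hypothetical ratings of one participant in one condition and block
example_ratings = [40, 55, 60, 45]
example_rvi = relative_variability(example_ratings)
```

      Dividing by the attainable maximum keeps the index comparable between conditions whose mean ratings differ, because ratings near the ends of a bounded scale cannot vary as much as ratings near its midpoint.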

      In addition, although fMRI is still (probably) the best available tool we have to understand the spatial representation of cortical processing, limitations about not only the temporal but even the spatial resolution should be acknowledged. Given the anatomical and physiological complexity of the cortical connections, as we know from the animal world, it is still well possible that subcircuits are activated also for positive and negative expectations, but cannot be observed due to the limitation of our techniques. Indeed, on an empirical/evolutionary basis it would remain unclear why we should have a system that waits for the valence of a stimulus to show differential responses. 

      We agree that the spatial resolution of fMRI is limited and that our signal is often not able to dissociate different subcircuits. Whether on this basis differential processes occurred cannot be observed in fMRI but is indeed possible. We now include this reasoning in our Discussion (ll. 373-377):

      “Importantly, the spatial resolution of fMRI is limited when it comes to discriminating whether the same pattern of activity is due to identical activation or to activation in different sub-circuits within the same area. Nonetheless, the overlap of areas is an indicator for similar processes involved in a more general preparation process.”

      Also, moving in a dimension of network and graph theory, one would not expect single areas to be responsible for distinct processes, but rather that they would integrate information in a shared way, potentially with different feedback and feedforward communications. As such, it becomes more difficult to assume the insula is a center for coding potential pain, perhaps more of a node in a system that signals potential dangers for the integrity of the body. 

      We appreciate the feedback on our interpretation of our results and agree that the overall network activity most likely determines how a large part of expectations and pain are coded. We therefore adjusted the Discussion, embedding the results in an interpretation considering networks (ll. 427-430, 432-435, 438-442).

      The authors analyze the EEG signal between 0.5 to 128 Hz, finding significant results in the correlation between single-trial BOLD and EEG activity in the higher gamma range (see Figure 6 panel C). It would be interesting to understand the rationale for including such high frequencies in the signal, and the interpretation of the significant correlation in the high gamma range. 

      On a technical level, we adapted our EEG processing pipeline from Hipp et al. (2011), who similarly investigated signals up to 128 Hz. Of note, the spectral smoothing was adjusted to match 3/4 octave, meaning that the frequency resolution at 128 Hz is rather broad and does not only contain oscillations at exactly 128 Hz. Gamma oscillations in general have repeatedly been reported in relation to pain and feedforward signals reflecting noxious information (e.g., Ploner et al., 2017; Strube et al., 2021). Strube et al. (2021) reported the highest effects of pain stimulus intensity and prediction error processing at high gamma frequencies (100 and 98 Hz, respectively). These findings could also serve as a basis to interpret our results in this frequency range: If anticipatory activation in the ACC is linked to high gamma oscillations, which appear to play an important role in feedforward signaling of pain intensity and prediction errors, this could indicate that later processing of intensity in this area is already pre-modulated before the stimulus actually occurs. Of note: although not significant, it looks as if the cluster extends further into pain processing on a descriptive level. We added additional explanation regarding the interpretation of the correlation in the Discussion (ll. 414-425):

      “The link between anticipatory activity in the ACC and EEG oscillatory activity was observed in the high gamma band, which is consistent with findings that demonstrate a connection between increased fMRI BOLD signals and a relative shift from lower to higher frequencies (Kilner et al., 2005). Gamma oscillations have been repeatedly reported in the context of pain and expectations and have been interpreted as reflecting feedforward signals of noxious information (e.g., Ploner et al., 2017; Strube et al., 2021). In combination with our findings, this might imply that high frequency oscillations may not only signal higher actual or perceived pain intensity during pain processing (Nickel et al., 2022; Ploner et al., 2017; Strube et al., 2021; Tu et al., 2016), but might also be instrumental in the transfer of directed expectations from anticipation into pain processing.”
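      To make the remark about 3/4-octave spectral smoothing concrete, the short sketch below computes the band edges around a centre frequency. It assumes the band is symmetric around the centre on a logarithmic frequency axis; the exact definition used in the pipeline adapted from Hipp et al. (2011) may differ, and the function name is hypothetical.

```python
def octave_band(center_hz, width_octaves):
    """Lower and upper edge of a band of the given width (in octaves),
    assumed to be symmetric around center_hz on a log2 frequency axis."""
    half_width = width_octaves / 2
    return center_hz * 2 ** (-half_width), center_hz * 2 ** half_width


# Band edges for the highest analysed frequency with 3/4-octave smoothing
low_hz, high_hz = octave_band(128.0, 0.75)
print(f"{low_hz:.1f} Hz to {high_hz:.1f} Hz")
```

      Under this assumption, the estimate labelled 128 Hz pools activity from just under 100 Hz up to roughly 166 Hz, so it covers the high gamma frequencies reported by Strube et al. (2021).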

      Reviewer #2 (Public Review):  

      I think this is a very promising paper. The combination of EEG and fMRI is unique and original. However, I also have some suggestions that I think could help improve the manuscript. 

      This manuscript reports the findings of an EEG-fMRI study (n = 50) on the effects of expectations on pain. The combination of EEG with fMRI is extremely original and well-suited to study the transition from expectation to perception. However, I think that the current treatment of the data, as well as the way that the manuscript is currently written, does not fully capitalize on the potential of this unique dataset. Several findings are presented but there is currently no clear message coming out of this manuscript. 

      First, one positive point is that the experimental manipulation clearly worked. However, it should be noted that the instructions used are not typical of studies on placebo/nocebo. Participants were not told that the stimulations would be of higher/lower intensity. Rather, they were told that objective intensities were held constant, but that EEG recordings could be used to predict whether they would perceive the stimulus as more or less intense. I think that this is an interesting way to manipulate expectations, but there could have been more justification in the introduction for why the authors have chosen this unusual procedure. 

      Most importantly, we want to emphasize again that participants were not aware that the stimulation temperature was always the same but were informed that they would receive different stimuli of medium intensity. We now clarify this in the revised Results (ll. 190-192) and Methods (ll. 612-614) sections.

      While we agree that our procedure was not typical, we think that the manipulation is comparable to previous studies on pain-related expectations. To our knowledge, previous studies manipulate either expectations regarding a treatment that changes pain perception (treatment expectancy) or expectations regarding stimulus intensities (stimulus expectancy) (see Atlas & Wager, 2014). In our study, participants received a cue that induced expectations in regard to a “treatment”, although in this case the “treatment” came from changes in their own brain activity. This is comparable to studies using TENS devices that are supposedly changing peripheral pain transmission (Skvortsova et al., 2020). Thus, although not typical, our paradigm could be classified as targeting treatment expectancies and allowed us to examine effects on a trial-by-trial level within subjects. We added a paragraph regarding the comparability of our paradigm with previous studies in the Discussion of the revised manuscript (ll. 452-464).

      Also, the introduction mentions that little is known about potential cerebral differences between expectations of high vs. low pain expectations. I think the fear conditioning literature could be cited here. Activations in ACC, SMA, Ins, parahippocampal gyrus, PAG, etc. are often associated with upcoming threat, whereas activations vmPFC/default mode network are associated with safety. 

      We thank you for your suggestion to add literature on fear conditioning. We agree there is some overlap between fear conditioning and expectation effects in humans, but we also believe there are fundamental differences regarding their underlying processes and paradigms. For example, the expectation effects are not driven by classical learning algorithms but act to a large extent as self-fulfilling prophecies (see e.g. Jepma et al., 2018). However, we now acknowledge the similarities between the modalities, e.g., in the recruitment of the insula and the vmPFC, in our Introduction (ll. 132-136).

      The fact that the authors didn't observe a clearer distinction between high and low expectations here could be related to their specific instructions that imply that the stimulus is the same and that it is the subjective perception that is expected to change. In any case, this is a relatively minor issue that is easy to address. 

      We apologize again for the lack of clarity in our instructions: Participants were unaware that they would receive the exact same stimulus. The clear effects of the different conditions on expectation and pain ratings also challenge the notion that participants always expected the same level of stimulation and/or perception. Additionally, if participants were indeed expecting a consistent level of intensity in all conditions, one would also assume to see the same anticipatory activation in the control condition as in the placebo and nocebo conditions, which is not the case. Thus, we respectfully disagree that the common effects might be explained by our instructions but would argue that they indeed reflect common (anticipatory) processes of positive and negative expectations.

      Towards the end of the introduction, the authors present the aims of the study in mainly exploratory terms: 

      (1) What are the differences between anticipation and perception? 

      (2) What regions display a difference between high and low expectations (high > low or low < high) vs. an effect of expectation regardless of the direction (high and low different than neutral)? 

      I think these are good questions, but the authors should provide more justification, or framework, for these questions. More specifically, what will they be able to conclude based on their observations? 

      For instance (note that this is just an example to illustrate my point. I encourage the authors to come up with their own framework/predictions) : 

      (1) Possibility #1: A certain region encodes expectations in a directed fashion (high > low) and that same region also responds to perception in the same direction (high > low). This region would therefore modulate pain by assimilating perception towards expectations. 

      (2) Possibility #2: Different regions are involved in expectation and perception. Perhaps this could mean that certain regions influence pain processing through descending facilitation, for instance...

      Thank you for pointing out that our hypotheses were not crafted carefully enough. We tried to give better explanations for the possible interpretations of our hypotheses. Additionally, we interpreted our results on the background of a broader framework for placebo and nocebo effects (predictive coding) to derive possible functions of the described brain areas. We embedded this in our Introduction (ll. 74-86, 158-175) and Discussion (ll. 384-388), interpreting the anticipatory activity and the activity during pain processing in the context of expectation formation as described in Büchel et al. (2014).

      Interpretation derived from our framework (ll. 384-388):

      e.g.: “Following the framework of predictive coding, our results would suggest that the DPMS is the network responsible for integrating ascending signals with descending signals in the pain domain and that this process is similar for positive and negative valences during anticipation of pain but differentiates during pain processing.”

      Regarding analyses, I think that examining the transition from expectations to perception is a strong angle of the manuscript given the EEG-fMRI nature of the study. However, I feel that more could have been done here. One problem is that the sequence of analyses starts by identifying an fMRI signal of interest and then attempts to find its EEG correlates. The problem is that the low temporal resolution of fMRI makes it difficult to differentiate expectation from perception, which doesn't make this analysis a good starting point in my opinion. Why not start by identifying an EEG signal that differentiates perception vs expectation, and then look for its fMRI correlates?

      We appreciate your feedback on the transition from expectations to perceptions and also think that additional questions could be answered with our data set. However, based on the literature we had specific hypotheses regarding specific brain areas, and we therefore decided to start from the fMRI data with the superior spatial resolution and EEG was used to focus on the temporal dynamics within the areas important for anticipatory processes. We share the view that many different approaches in analyzing our data are possible. On the other hand, identifying relevant areas based on EEG characteristics inherits even more uncertainty due to the spatial filtering of the EEG signal. For the research question of this study a more accurate evaluation of the involved areas and the related representation was more important. We therefore decided to only implement the procedure already present in the manuscript. 

      Finally, I found the hypotheses on "valenced" vs. "absolute" effects a little bit more difficult to follow. This is because "neutral" is not really neutral: it falls in between low and high. If I follow correctly, participants know that the temperature is always the same. Therefore, if they are told that the machine cannot predict whether their perception is going to be low or high, then it must be because it is likely to be in between. Ratings of expectation and pain ratings confirm that. The neutral condition is not "devoid" of expectations as the authors suggest.

      Therefore, it would make sense to look at regions with the following pattern low > neutral > high, or vice-versa, low < neutral < high. Low & high being different than neutral is more difficult to interpret. I don't think that you can say that it reflects "absolute" expectations because neutral is also the expectation of a medium temperature. Perhaps it reflects "certainty/uncertainty" or something like that, but it is not clear that it reflects "expectations". 

      Thank you for your valuable feedback! We considered your concerns about the interpretation of our results and completely agree that the control condition cannot be interpreted as void of expectations (ll. 119-123). We therefore evaluated the control condition in more detail in a separate analysis (ll. 219-232) and integrated a new assessment of the conditions into the Discussion (ll. 465-486). We changed the phrasing of our control condition to “neutral expectations”, as we agree that the control condition is not void of expectations and this phrasing is more in line with other studies (e.g. Colloca et al., 2010; Freeman et al., 2015; Schmid et al., 2015). We would argue that the neutral expectations can still be meaningfully compared to positive and negative expectations because only the latter shift expectations and perception in one direction. Thus, we changed our wording throughout the manuscript to acknowledge that we indeed did not test for general effects of expectations vs. no expectations, but for effects of directed expectations. Please also see our reasoning regarding the control condition in response to Reviewer 1, in which we addressed the interpretation of the control condition. We therefore still believe that the contrasts that we calculated between conditions are valid. The proposed new contrast largely overlaps with our differential contrast low>high and vice versa already reported in the manuscript (for additional results also see Supplements).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Figure 6, panel C. The figure mentions Anterior Cingulate Cortex R, whereas the legend mentions left ACC. Please check. 

      Thanks for catching this, we changed the figure legend accordingly.

      Reviewer #2 (Recommendations For The Authors):  

      - I don't think that activity during the rating of expectations is easily interpretable. I think I would recommend not reporting it. 

      The majority of participants completed the expectation rating relatively quickly (M = 2.17 s, SD = 0.35 s), which resulted in the overlap between the DLPFC EEG cluster and the expectation rating encompassing only a limited portion of the cluster (~ 1 s). We agree that this activity still is more difficult to interpret, yet we have decided to report it for reasons of completeness.

      - The effects on SIIPS are interesting. I think that it is fine to present them as a "validation" of what was observed with pain ratings, but it also seems to give a direction to the analyses that the authors don't end up following. For instance, why not try other "signatures" like the NPS or signatures of pain anticipation? Also, why not try to look at EEG correlates of SIIPS? I don't think that the authors "need" to do any of that, but I just wanted to let them know that SIIPS results may stir that kind of curiosity in the readers.  

      While this would be indeed very interesting, these additional analyses are not directly related to our current research question. We fear that too many analyses could be confusing for the readers. Nonetheless, we are grateful for your suggestion and will implement additional brain signatures in future studies. 

      - The shock was calibrated to be 60%. Why not have high (70%) and low (30%) conditions at equal distances from neutral, like 80% and 40% for instance? The current design makes it hard to distinguish high from control. Perhaps the "common" effects of high + low are driven by a deactivation for low (30%)?  

      We appreciate your feedback! We adjusted the temperature during the test phase to counteract habituation typically happening with heat stimuli. We believe that this was a good measure as participants rated the control condition at roughly VAS 50 (M = 51.40), which was our target temperature and would then be equidistant to the VAS 70 and VAS 30 during conditioning, when no habituation should have taken place yet. We further tested whether participants rated placebo and nocebo trials at equal distances from the control condition and found no bias for either of the conditions. To do this, we computed the individual placebo effect (control minus placebo) and nocebo effect (nocebo minus control) for each participant during the test phase and statistically compared whether they differed in terms of magnitude. There was no significant difference between placebo and nocebo effects for either expectation (placebo effect M = 14.25 vs. nocebo effect M = 17.22, t(49) = 1.92, p = .061) or pain ratings (placebo effect M = 6.52 vs. nocebo effect M = 5.40, t(49) = -1.11, p = .274). This suggests that our expectation manipulation resulted in comparable shifts in expectation and pain ratings away from the control condition for both the placebo and nocebo condition and thus argues against any bias of the conditioning temperatures. Please also note that the analysis of the common effects was masked for differences between the high and low conditions; therefore, the effects cannot be driven by one condition by itself.
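      The comparison of effect magnitudes described above can be sketched as follows. This is an illustrative example and not the authors' code: the per-participant mean ratings are invented, the function names are hypothetical, and the p value (which would come from a t distribution with n - 1 degrees of freedom) is not computed here.

```python
import math
from statistics import mean, stdev


def effect_magnitudes(placebo, control, nocebo):
    """Per-participant placebo effect (control minus placebo)
    and nocebo effect (nocebo minus control)."""
    placebo_effect = [c - p for c, p in zip(control, placebo)]
    nocebo_effect = [n - c for n, c in zip(nocebo, control)]
    return placebo_effect, nocebo_effect


def paired_t(x, y):
    """Paired-samples t statistic for x minus y, with its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_value, n - 1


# Hypothetical mean ratings of five participants per condition
placebo = [40, 45, 38, 50, 42]
control = [52, 50, 49, 55, 51]
nocebo = [60, 58, 55, 66, 57]

placebo_effect, nocebo_effect = effect_magnitudes(placebo, control, nocebo)
t_value, df = paired_t(nocebo_effect, placebo_effect)
```

      A t statistic near zero indicates that the two conditions shifted ratings away from control by similar amounts, which is the pattern the response reports for both expectation and pain ratings.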

      - If I understand correctly, all fMRI contrasts were thresholded with FWE. This is fine, but very strict. The authors could have opted for FDR. Maybe I missed something here....  

      While it is true that FDR is the more liberal approach, it is not valid for spatially correlated fMRI data and is no longer available in SPM for the correction of multiple comparisons. The newly implemented topological peak-based FDR correction is comparably sensitive to the FWE correction (see Chumbley et al.). We opted for the slightly more conservative approach in our preregistration (p_FWE < .05); therefore, a change of the correction is not possible.

      Altogether, I think that this is a great study. The combination of EEG and fMRI is truly unique and affords many opportunities to examine the transition from expectations to perception. The experimental manipulation of expectations seems to have worked well, and there seem to be very promising results. However, I think that more could have been done. At least, I would recommend trying to give more of a theoretical framework to help interpret the results.  

      We are very grateful for your positive feedback. We took your suggestion seriously and tried to implement a more general framework from the literature (see Büchel et al., 2014) to provide a better explanation for our results.

      References

      Atlas, L. Y., & Wager, T. D. (2014). A meta-analysis of brain mechanisms of placebo analgesia: Consistent findings and unanswered questions. Handbook of Experimental Pharmacology, 225, 37–69. https://doi.org/10.1007/978-3-662-44519-8_3

      Bingel, U., Wanigasekera, V., Wiech, K., Ni Mhuircheartaigh, R., Lee, M. C., Ploner, M., & Tracey, I. (2011). The effect of treatment expectation on drug efficacy: Imaging the analgesic benefit of the opioid remifentanil. Science Translational Medicine, 3(70), 70ra14. https://doi.org/10.1126/scitranslmed.3001244

      Büchel, C., Geuter, S., Sprenger, C., & Eippert, F. (2014). Placebo analgesia: A predictive coding perspective. Neuron, 81(6), 1223–1239. https://doi.org/10.1016/j.neuron.2014.02.042

      Colloca, L., Petrovic, P., Wager, T. D., Ingvar, M., & Benedetti, F. (2010). How the number of learning trials affects placebo and nocebo responses. Pain, 151(2), 430–439. https://doi.org/10.1016/j.pain.2010.08.007

      Freeman, S., Yu, R., Egorova, N., Chen, X., Kirsch, I., Claggett, B., Kaptchuk, T. J., Gollub, R. L., & Kong, J. (2015). Distinct neural representations of placebo and nocebo effects. NeuroImage, 112, 197–207. https://doi.org/10.1016/j.neuroimage.2015.03.015

      Hipp, J. F., Engel, A. K., & Siegel, M. (2011). Oscillatory synchronization in large-scale cortical networks predicts perception. Neuron, 69(2), 387–396. https://doi.org/10.1016/j.neuron.2010.12.027

      Jepma, M., Koban, L., van Doorn, J., Jones, M., & Wager, T. D. (2018). Behavioural and neural evidence for self-reinforcing expectancy effects on pain. Nature Human Behaviour, 2(11), 838–855. https://doi.org/10.1038/s41562-018-0455-8

      Kilner, J. M., Mattout, J., Henson, R., & Friston, K. J. (2005). Hemodynamic correlates of EEG: A heuristic. NeuroImage, 28(1), 280–286. https://doi.org/10.1016/j.neuroimage.2005.06.008

      Nickel, M. M., Tiemann, L., Hohn, V. D., May, E. S., Gil Ávila, C., Eippert, F., & Ploner, M. (2022). Temporal-spectral signaling of sensory information and expectations in the cerebral processing of pain. Proceedings of the National Academy of Sciences of the United States of America, 119(1). https://doi.org/10.1073/pnas.2116616119

      Ploner, M., Sorg, C., & Gross, J. (2017). Brain Rhythms of Pain. Trends in Cognitive Sciences, 21(2), 100–110. https://doi.org/10.1016/j.tics.2016.12.001

      Schmid, J., Bingel, U., Ritter, C., Benson, S., Schedlowski, M., Gramsch, C., Forsting, M., & Elsenbruch, S. (2015). Neural underpinnings of nocebo hyperalgesia in visceral pain: A fMRI study in healthy volunteers. NeuroImage, 120, 114–122. https://doi.org/10.1016/j.neuroimage.2015.06.060

      Shih, Y.‑W., Tsai, H.‑Y., Lin, F.‑S., Lin, Y.‑H., Chiang, C.‑Y., Lu, Z.‑L., & Tseng, M.‑T. (2019). Effects of Positive and Negative Expectations on Human Pain Perception Engage Separate But Interrelated and Dependently Regulated Cerebral Mechanisms. Journal of Neuroscience, 39(7), 1261–1274. https://doi.org/10.1523/JNEUROSCI.2154-18.2018

      Skvortsova, A., Veldhuijzen, D. S., van Middendorp, H., Colloca, L., & Evers, A. W. M. (2020). Effects of Oxytocin on Placebo and Nocebo Effects in a Pain Conditioning Paradigm: A Randomized Controlled Trial. The Journal of Pain, 21(3-4), 430–439. https://doi.org/10.1016/j.jpain.2019.08.010

      Strube, A., Rose, M., Fazeli, S., & Büchel, C. (2021). The temporal and spectral characteristics of expectations and prediction errors in pain and thermoception. eLife, 10. https://doi.org/10.7554/eLife.62809

      Tu, Y., Zhang, Z., Tan, A., Peng, W., Hung, Y. S., Moayedi, M., Iannetti, G. D., & Hu, L. (2016). Alpha and gamma oscillation amplitudes synergistically predict the perception of forthcoming nociceptive stimuli. Human Brain Mapping, 37(2), 501–514. https://doi.org/10.1002/hbm.23048

    1. eLife Assessment

      This valuable study provides convincing evidence that white matter diffusion imaging of the right superior longitudinal fasciculus might help to develop a predictive biomarker of chronic back pain chronicity. The results are based on a discovery-replication approach with different cohorts, but the sample size is limited. The findings will interest researchers interested in the brain mechanisms of chronic pain and in developing brain-based biomarkers of chronic pain.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Misic et al showed that white matter properties can be used to classify subacute back pain patients that will develop persisting pain.

      Strengths:

      Compared to most previous papers studying associations between white matter properties and chronic pain, the strength of the method is to perform a prediction in unseen data. Another strength of the paper is the use of three different cohorts. This is an interesting paper that provides a valuable contribution to the field.

      We thank the reviewer for emphasizing the strength of our paper and the importance of validation on multiple unseen cohorts.

      Weaknesses:

      The authors imply that their biomarker could outperform traditional questionnaires to predict pain: "While these models are of great value showing that few of these variables (e.g. work factors) might have significant prognostic power on the long-term outcome of back pain and provide easy-to-use brief questionnaires-based tools, (21, 25) parameters often explain no more than 30% of the variance (28-30) and their prognostic accuracy is limited.(31)". I don't think this is correct; questionnaire-based tools can achieve far greater prediction than their model in about half a million individuals from the UK Biobank (Tanguay-Sabourin et al., A prognostic risk score for the development and spread of chronic pain, Nature Medicine 2023).

      We agree with the reviewer that we might have underestimated the prognostic accuracy of questionnaire-based tools, especially the strong predictive accuracy shown by Tanguay-Sabourin et al. (2023). In this revised version, we have changed both the introduction and the discussion to reflect the questionnaire-based prognostic accuracy reported in the seminal work by Tanguay-Sabourin et al.

      In the introduction (page 4, lines 3-18), we now write:

      “Some studies have addressed this question with prognostic models incorporating demographic, pain-related, and psychosocial predictors.1-4 While these models are of great value showing that few of these variables (e.g. work factors) might have significant prognostic power on the long-term outcome of back pain, their prognostic accuracy is limited,5 with parameters often explaining no more than 30% of the variance.6-8. A recent notable study in this regard developed a model based on easy-to-use brief questionnaires to predict the development and spread of chronic pain in a variety of pain conditions capitalizing on a large dataset obtained from the UK-BioBank. 9 This work demonstrated that only few features related to assessment of sleep, neuroticism, mood, stress, and body mass index were enough to predict persistence and spread of pain with an area under the curve of 0.53-0.73. Yet, this study is unique in showing such a predictive value of questionnaire-based tools. Neurobiological measures could therefore complement existing prognostic models based on psychosocial variables to improve overall accuracy and discriminative power. More importantly, neurobiological factors such as brain parameters can provide a mechanistic understanding of chronicity and its central processing.”

      And in the conclusion (page 22, lines 5-9), we write:

      “Integrating findings from studies that used questionnaire-based tools and showed remarkable predictive power9 with neurobiological measures that can offer mechanistic insights into chronic pain development, could enhance predictive power in CBP prognostic modeling.”

      Moreover, the main weakness of this study is the sample size. It remains small despite having 3 cohorts. This is problematic because results are often overfitted in such a small sample size brain imaging study, especially when all the data are available to the authors at the time of training the model (Poldrack et al., Scanning the horizon: towards transparent and reproducible neuroimaging research, Nature Reviews in Neuroscience 2017). Thus, having access to all the data, the authors have a high degree of flexibility in data analysis, as they can retrain their model any number of times until it generalizes across all three cohorts. In this case, the testing set could easily become part of the training making it difficult to assess the real performance, especially for small sample size studies.

      The reviewer raises a very important point about the limited sample size and about the methodology intrinsic to model development and testing. We acknowledge the small sample size in the “Limitations” section of the discussion. In the resubmission, we acknowledge the degree of flexibility that is afforded by having access to all the data at once. However, we also note that our SLF-FA-based model is a simple cut-off approach that does not include any learning or hidden layers and that the data obtained from Open Pain were never part of the “training” set at any point at either the New Haven or the Mannheim site. Regarding our SVC approach, we follow standard procedures for machine learning where we never mix the training and testing sets. The models are trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model. We write in the limitation section of the discussion (page 20, lines 20-21, and page 21, lines 1-6):

      “In addition, at the time of analysis, we had “access” to all the data, which may lead to bias in model training and development.  We believe that the data presented here are nevertheless robust since multisite validated but need replication. Additionally, we followed standard procedures for machine learning where we never mix the training and testing sets. The models were trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model”. 
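
      The separation between discovery and validation data described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy FA values, and the rule of picking the cut-off that maximizes training accuracy are all assumptions made for illustration. The only point it shows is that the cut-off is derived from the training set alone and then applied unchanged to held-out data.

```python
# Illustrative sketch only: a cut-off classifier fitted on a discovery
# (training) set and evaluated once on a separate validation (test) set.
# All names and numbers below are hypothetical, not taken from the study.

def accuracy(values, labels, cutoff):
    """Fraction of cases where (value >= cutoff) matches the label (1 = recovered)."""
    correct = sum((v >= cutoff) == bool(y) for v, y in zip(values, labels))
    return correct / len(labels)


def fit_cutoff(values, labels):
    """Return the observed value that maximizes accuracy on the training data."""
    best_cutoff, best_acc = None, -1.0
    for candidate in sorted(set(values)):
        acc = accuracy(values, labels, candidate)
        if acc > best_acc:  # strict '>' keeps the lowest of any tied cut-offs
            best_cutoff, best_acc = candidate, acc
    return best_cutoff


# Hypothetical toy data: FA values and outcome labels (1 = SBPr, 0 = SBPp).
train_fa = [0.40, 0.42, 0.45, 0.50, 0.52, 0.55]
train_y = [0, 0, 0, 1, 1, 1]
test_fa = [0.41, 0.47, 0.51, 0.56]
test_y = [0, 1, 1, 1]

cutoff = fit_cutoff(train_fa, train_y)        # uses training data only
test_acc = accuracy(test_fa, test_y, cutoff)  # test data touched exactly once
print(cutoff, test_acc)
```

      Because the test set enters only in the final call, repeated re-fitting on the training data cannot leak information from the validation cohort in this sketch; the reviewer's concern arises when that final step is repeated after inspecting its result.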

      Finally, as discussed by Spisak et al.,10 the key determinant of the required sample size in predictive modeling is the “true effect size of the brain-phenotype relationship”, which we think is the determinant of the replication we observe in this study. As such the effect size in the New Haven and Mannheim data is Cohen’s d >1.
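
      For readers unfamiliar with the effect-size measure cited here, the following is a minimal sketch of Cohen's d for two independent groups using the pooled standard deviation. The response does not state which variant of d was computed, so the pooled-SD form is an assumption, and the function name and example numbers are hypothetical.

```python
# Illustrative sketch: Cohen's d with a pooled standard deviation.
# The pooled-SD variant is an assumption; the source does not specify one.
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b)."""
    na, nb = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical numbers chosen so the arithmetic is easy to follow:
# means 4 and 3, both sample variances 4, pooled SD 2.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d)
```

      A value of d above 1, as reported for the New Haven and Mannheim data, means the group means differ by more than one pooled standard deviation.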

      Even if the performance was properly assessed, their models show AUCs between 0.65-0.70, which is usually considered as poor, and most likely without potential clinical use. Despite this, their conclusion was: "This biomarker is easy to obtain (~10 min of scanning time) and opens the door for translation into clinical practice." One may ask who is really willing to use an MRI signature with a relatively poor performance that can be outperformed by self-report questionnaires?

      The reviewer is correct: the model performance is fair, which limits its usefulness for clinical translation. We wanted to emphasize that diffusion images can be obtained in a short period of time and, hence, as the predictive accuracy of such models improves, clinical translation becomes closer to reality. In addition, our findings are based on older diffusion data and limited sample sizes coming from different sites and different acquisition sequences. This by itself would limit the accuracy, especially since the evidence shows that sample size also affects model performance (i.e. testing AUC)10. In the revision, we re-worded the sentence mentioned by the reviewer to reflect the points discussed here. This also motivates us to collect a more homogeneous and larger sample. In the limitations section of the discussion, we now write (page 21, lines 6-9):

      “Even though our model performance is fair, which currently limits its usefulness for clinical translation, we believe that future models would further improve accuracy by using larger homogenous sample sizes and uniform acquisition sequences.”

      Overall, these criticisms are more about the wording sometimes used and the inference they made. I think the strength of the evidence is incomplete to support the main claims of the paper.

      Despite these limitations, I still think this is a very relevant contribution to the field. Showing predictive performance through cross-validation and testing in multiple cohorts is not an easy task and this is a strong effort by the team. I strongly believe this approach is the right one and I believe the authors did a good job.

      We thank the reviewer for acknowledging that our effort and approach were useful.

      Minor points:

      Methods:

      I get the voxel-wise analysis, but I don't understand the methods for the structural connectivity analysis between the 88 ROIs. Have the authors run tractography or have they used a predetermined streamlined form of 'population-based connectome'? They report that models of AUC above 0.75 were considered and tested in the Chicago dataset, but we have no information about what the model actually learned (although this can be tricky for decision tree algorithms). 

      We apologize for the lack of clarity; we did run tractography and we did not use a pre-determined streamlined form of the connectome.

      Finding which connections are important for the classification of SBPr and SBPp is difficult because of our choices during data preprocessing and SVC model development: (1) preprocessing steps which included TNPCA for dimensionality reduction, and regressing out the confounders (i.e., age, sex, and head motion); (2) the harmonization for effects of sites; and (3) the Support Vector Classifier which is a hard classification model11.

      In the methods section (page 30, lines 21-23) we added: “Of note, such models cannot tell us the features that are important in classifying the groups.  Hence, our model is considered a black-box predictive model like neural networks.”

      Minor:

      What results are shown in Figure 7? It looks more descriptive than the actual results.

      The reviewer is correct; Figure 7 and Supplementary Figure 4 both illustrated the shape of the SLF only qualitatively. We have now changed both figures in response to this point and a point raised by reviewer 3. We now show a 3D depiction of different sub-components of the right SLF (Figure 7) and left SLF (now Supplementary Figure 11 instead of Supplementary Figure 4) with a quantitative estimation of the FA content of the tracts, and the number of tracts per component. The results reinforce the TBSS analysis in showing asymmetry in the differences between left and right SLF between the groups (i.e. SBPp and SBPr) in both FA values and number of tracts per bundle.

      Reviewer #2 (Public Review):

      The present study aims to investigate brain white matter predictors of back pain chronicity. To this end, a discovery cohort of 28 patients with subacute back pain (SBP) was studied using white matter diffusion imaging. The cohort was investigated at baseline and one-year follow-up when 16 patients had recovered (SBPr) and 12 had persistent back pain (SBPp). A comparison of baseline scans revealed that SBPr patients had higher fractional anisotropy (FA) values in the right superior longitudinal fasciculus (SLF) than SBPp patients and that FA values predicted changes in pain severity. Moreover, the FA values of SBPr patients were larger than those of healthy participants, suggesting a role of FA of the SLF in resilience to chronic pain. These findings were replicated in two other independent datasets. The authors conclude that the right SLF might be a robust predictive biomarker of CBP development with the potential for clinical translation.

      Developing predictive biomarkers for pain chronicity is an interesting, timely, and potentially clinically relevant topic. The paradigm and the analysis are sound, the results are convincing, and the interpretation is adequate. A particular strength of the study is the discovery-replication approach with replications of the findings in two independent datasets.

      We thank reviewer 2 for pointing to the strength of our study.

      The following revisions might help to improve the manuscript further.

      - Definition of recovery. In the New Haven and Chicago datasets, SBPr and SBPp patients are distinguished by reductions of >30% in pain intensity. In contrast, in the Mannheim dataset, both groups are distinguished by reductions of >20%. This should be harmonized. Moreover, as there is no established definition of recovery (reference 79 does not provide a clear criterion), it would be interesting to know whether the results hold for different definitions of recovery. Control analyses for different thresholds could strengthen the robustness of the findings.

      The reviewer raises an important point regarding the definition of recovery. To address the reviewer’s concern, we have added a supplementary figure (Fig. S6) showing the results in the Mannheim data set if a 30% reduction is used as a recovery criterion, and in the manuscript (page 11, lines 1-2) we write: “Supplementary Figure S6 shows the results in the Mannheim data set if a 30% reduction is used as a recovery criterion in this dataset (AUC= 0.53)”.

      We would like to emphasize here several points that support the use of different recovery thresholds between New Haven and Mannheim.  The New Haven primary pain ratings relied on visual analogue scale (VAS) while the Mannheim data relied on the German version of the West-Haven-Yale Multidimensional Pain Inventory. In addition, the Mannheim data were pre-registered with a definition of recovery at 20% and are part of a larger sub-acute to chronic pain study with prior publications from this cohort using the 20% cut-off12. Finally, a more recent consensus publication13 from IMMPACT indicates that a change of at least 30% is needed for a moderate improvement in pain on the 0-10 Numerical Rating Scale but that this percentage depends on baseline pain levels.
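
      How the choice of threshold changes group membership can be shown with a small sketch. It assumes that recovery is defined as a percentage reduction in pain intensity from baseline to follow-up that exceeds the threshold, as the reviewer's summary describes; the function names and the example ratings are hypothetical.

```python
# Illustrative sketch: labelling a patient as recovered (SBPr) or persistent
# (SBPp) under different recovery thresholds. Names and ratings are hypothetical.

def percent_reduction(baseline, follow_up):
    """Percentage drop in pain intensity relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline pain must be positive to compute a percentage")
    return 100.0 * (baseline - follow_up) / baseline


def classify(baseline, follow_up, threshold_pct):
    """Return 'SBPr' if the reduction exceeds the threshold, else 'SBPp'."""
    return "SBPr" if percent_reduction(baseline, follow_up) > threshold_pct else "SBPp"


# A hypothetical patient whose pain drops from 6.0 to 4.5 (a 25% reduction)
# is 'recovered' under a 20% criterion but 'persistent' under a 30% criterion.
label_20 = classify(6.0, 4.5, threshold_pct=20)
label_30 = classify(6.0, 4.5, threshold_pct=30)
print(label_20, label_30)
```

      Patients whose reduction falls between the two thresholds are the ones who switch groups, which is why the classification result for a cohort can differ between the 20% and 30% definitions.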

      - Analysis of the Chicago dataset. The manuscript includes results on FA values and their association with pain severity for the New Haven and Mannheim datasets but not for the Chicago dataset. It would be straightforward to show figures like Figures 1 - 4 for the Chicago dataset, as well.

      We welcome the reviewer’s suggestion; we added these analyses to the results section of the resubmitted manuscript (page 11, lines 13-16): “The correlation between FA values in the right SLF and pain severity in the Chicago data set showed marginal significance (p = 0.055) at visit 1 (Fig. S8A) and higher FA values were significantly associated with a greater reduction in pain at visit 2 (p = 0.035) (Fig. S8B).”

      - Data sharing. The discovery-replication approach of the present study distinguishes the present from previous approaches. This approach enhances the belief in the robustness of the findings. This belief would be further enhanced by making the data openly available. It would be extremely valuable for the community if other researchers could reproduce and replicate the findings without restrictions. It is not clear why the fact that the studies are ongoing prevents the unrestricted sharing of the data used in the present study.

      We greatly appreciate the reviewer's suggestion to share our data sets, as we strongly support the Open Science initiative. The Chicago data set is already publicly available. The New Haven data set will be shared on the Open Pain repository, and the Mannheim data set will be uploaded to heiDATA or heiARCHIVE at Heidelberg University in the near future. We cannot share the data immediately because this project is part of the Heidelberg pain consortium, “SFB 1158: From nociception to chronic pain: Structure-function properties of neural pathways and their reorganization.” Within this consortium, all data must be shared following a harmonized structure across projects, and no study will be published openly until all projects have completed initial analysis and quality control.

      Reviewer #3 (Public Review):

      Summary:

      Authors suggest a new biomarker of chronic back pain with the option to predict the result of treatment. The authors found a significant difference in a fractional anisotropy measure in superior longitudinal fasciculus for recovered patients with chronic back pain.

      Strengths:

      The results were reproduced in three different groups at different studies/sites.

      Weaknesses:

      - The number of participants is still low.

      The reviewer raises a very important point of limited sample size. As discussed in our replies to reviewer number 1:

      We acknowledge the small sample size in the “Limitations” section of the discussion. In the resubmission, we acknowledge the degree of flexibility that is afforded by having access to all the data at once. However, we also note that our SLF-FA-based model is a simple cut-off approach that does not include any learning or hidden layers and that the data obtained from Open Pain were never part of the “training” set at any point at either the New Haven or the Mannheim site. Regarding our SVC approach, we follow standard procedures for machine learning where we never mix the training and testing sets. The models are trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model. We write in the limitation section of the discussion (page 20, lines 20-21, and page 21, lines 1-6):

      “In addition, at the time of analysis, we had “access” to all the data, which may lead to bias in model training and development.  We believe that the data presented here are nevertheless robust since multisite validated but need replication. Additionally, we followed standard procedures for machine learning where we never mix the training and testing sets. The models were trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model”. 

      Finally, as discussed by Spisak et al.,10 the key determinant of the required sample size in predictive modeling is the “true effect size of the brain-phenotype relationship”, which we think is the determinant of the replication we observe in this study. As such the effect size in the New Haven and Mannheim data is Cohen’s d >1.

      - An explanation of microstructure changes was not given.

      The reviewer points to an important gap in our discussion.  While we cannot do a direct study of actual tissue microstructure, we explored further the changes observed in the SLF by calculating diffusivity measures. We have now performed the analysis of mean, axial, and radial diffusivity. 

      In the results section we added (page 7, lines 12-19): “We also examined mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD) extracted from the right SLF shown in Fig.1 to further understand which diffusion component is different between the groups. The right SLF MD is significantly increased (p < 0.05) in the SBPr compared to SBPp patients (Fig. S3), while the right SLF RD is significantly decreased (p < 0.05) in the SBPr compared to SBPp patients in the New Haven data (Fig. S4). Axial diffusivity extracted from the RSLF mask did not show significant difference between SBPr and SBPp (p = 0.28) (Fig. S5).”

      In the discussion, we write (page 15, lines 10-20):

      “Within the significant cluster in the discovery data set, MD was significantly increased, while RD in the right SLF was significantly decreased in SBPr compared to SBPp patients. Higher RD values, indicative of demyelination, were previously observed in chronic musculoskeletal patients across several bundles, including the superior longitudinal fasciculus14.  Similarly, Mansour et al. found higher RD in SBPp compared to SBPr in the predictive FA cluster. While they noted decreased AD and increased MD in SBPp, suggestive of both demyelination and altered axonal tracts,15 our results show increased MD and RD in SBPr with no AD differences between SBPp and SBPr, pointing to white matter changes primarily due to myelin disruption rather than axonal loss, or more complex processes. Further studies on tissue microstructure in chronic pain development are needed to elucidate these processes.”

      - Some technical drawbacks are presented.

      We are uncertain if the reviewer is suggesting that we have acknowledged certain technical drawbacks and expects further elaboration on our part. We kindly request that the reviewer specify what particular issues need to be addressed so that we can respond appropriately.

      Recommendations For The Authors:

      We thank the reviewers for their constructive feedback, which has significantly improved our manuscript. We have done our best to answer the criticisms that they raised point-by-point.

      Reviewer #2 (Recommendations For The Authors):

      The discovery-replication approach of the current study justifies the use of the terminus 'robust.' In contrast, previous studies on predictive biomarkers using functional and structural brain imaging did not pursue similar approaches and have not been replicated. Still, the respective biomarkers are repeatedly referred to as 'robust.' Throughout the manuscript, it would, therefore, be more appropriate to remove the label 'robust' from those studies.

      We thank the reviewer for this valuable suggestion. We removed the label 'robust' throughout the manuscript when referring to the previous studies which didn’t follow the same approach and have not yet been replicated.

      Reviewer #3 (Recommendations For The Authors):

      This is, indeed, quite a well-written manuscript with very interesting findings and patient group. There are a few comments that enfeeble the findings.

      (1) It is a bit frustrating to read at the beginning how important chronic back pain is and the number of patients in the used studies. At least the number of healthy subjects could be higher.

      The reviewer raises an important point regarding the number of pain-free healthy controls (HC) in our samples. We first note that our primary statistical analysis focused on comparing recovered and persistent patients at baseline and validating these findings across sites without directly comparing them to HCs. Nevertheless, the data from New Haven included 28 HCs at baseline, and the data from Mannheim included 24 HCs. Although these sample sizes are not large, they have enabled us to clearly establish that the recovered SBPr patients generally have larger FA values in the right superior longitudinal fasciculus compared to the HCs, a finding consistent across sites (see Figs. 1 and 3). This suggests that the general pain-free population includes individuals with both low and high-risk potential for chronic pain. It also offers one explanation for the reported lack of differences or inconsistent differences between chronic low-back pain patients and HCs in the literature, as these differences likely depend on the (unknown) proportion of high- and low-risk individuals in the control groups. Therefore, if the high-risk group is more represented by chance in the HC group, comparisons between HCs and chronic pain patients are unlikely to yield statistically significant results. Thus, while we agree with the reviewer that the sample sizes of our HCs are limited, this limitation does not undermine the validity of our findings.

      (2) Pain reaction in the brain is in general a quite popular topic and could be connected to the findings or mentioned in the introduction.

      We thank the reviewer for this suggestion. We have now added a summary of brain responses to pain in general. In the introduction, we now write (page 4, lines 19-22 and page 5, lines 1-5):

      “Neuroimaging research on chronic pain has uncovered a shift in brain responses to pain when acute and chronic pain are compared. The thalamus, primary somatosensory, motor areas, insula, and mid-cingulate cortex most often respond to acute pain and can predict the perception of acute pain16-19. Conversely, limbic brain areas are more frequently engaged when patients report the intensity of their clinical pain20, 21. Consistent findings have demonstrated that increased prefrontal-limbic functional connectivity during episodes of heightened subacute ongoing back pain or during a reward learning task is a significant predictor of CBP.12, 22. Furthermore, low somatosensory cortex excitability in the acute stage of low back pain was identified as a predictor of CBP chronicity.23”

      (3) It is clearly observed structural asymmetry in the brain, why not elaborate this finding further? Would SLF be a hub in connectivity analysis? Would FA changes have along tract features? etc etc etc

      The reviewer raises an important point. Our data give grounds to suggest that there is an asymmetry in the role of the SLF in resilience to chronic pain. We discuss this at length in the Discussion section. In addition, we have extended our data analysis using our Population Based Structural Connectome pipeline on the New Haven dataset. Following that approach, we studied the number of fiber tracts making up different parts of the SLF on the right and left sides. We have also extracted FA values along fiber tracts and compared the average across groups. Our new analyses are presented in our modified Figure 7 and Fig. S11. These results support the asymmetry hypothesis. The SLF could be a hub of structural connectivity. Please note, however, that given the discovery-and-validation nature of our design, the study of structural connectivity of the SLF is beyond the scope of this paper because tract-based connectivity is very sensitive to data collection parameters and is less accurate with single-shell DWI acquisition. Therefore, we will pursue the study of connectivity of the SLF in the future with well-powered and more harmonized data.

      (4) Only FA is mentioned; did the authors work with MD, RD, and AD metrics?

      We thank the reviewer for this suggestion that helps in providing a clearer picture of the differences in the right SLF between SBPr and SBPp. We have now extracted MD, AD, and RD for the predictive mask we discovered in Figure 1 and plotted the values comparing SBPr to SBPp patients in Fig. S3, Fig. S4., and Fig. S5 across all sites using one comprehensive harmonized analysis. We have added in the discussion “Within the significant cluster in the discovery data set, MD was significantly increased, while RD in the right SLF was significantly decreased in SBPr compared to SBPp patients. Higher RD values, indicative of demyelination, were previously observed in chronic musculoskeletal patients across several bundles, including the superior longitudinal fasciculus14.  Similarly, Mansour et al. found higher RD in SBPp compared to SBPr in the predictive FA cluster. While they noted decreased AD and increased MD in SBPp, suggestive of both demyelination and altered axonal tracts15, our results show increased MD and RD in SBPr with no AD differences between SBPp and SBPr, pointing to white matter changes primarily due to myelin disruption rather than axonal loss, or more complex processes. Further studies on tissue microstructure in chronic pain development are needed to elucidate these processes.”

      (5) There are many speculations in the Discussion, however, some of them are not supported by the results.

      We agree with the reviewer and thank them for pointing this out. We have now revised the wording at several points in the discussion where speculations were not supported by the data. For example, instead of writing (page 16, lines 7-9): “Together the literature on the right SLF role in higher cognitive functions suggests, therefore, that resilience to chronic pain is a top-down phenomenon related to visuospatial and body awareness.”, we now write: “Together the literature on the right SLF role in higher cognitive functions suggests, therefore, that resilience to chronic pain might be related to a top-down phenomenon involving visuospatial and body awareness.”

      (6) A method section was written quite roughly. In order to obtain all the details for a potential replication one needs to jump over the text.

      The reviewer is correct; our methods section lacked sufficiently detailed descriptions. We have therefore described our methodology more extensively. Under “Estimation of structural connectivity”, we now write (page 28, lines 20-21 and page 29, lines 1-19):

      “Structural connectivity was estimated from the diffusion tensor data using a population-based structural connectome (PSC) detailed in a previous publication.24 PSC can utilize the geometric information of streamlines, including shape, size, and location for a better parcellation-based connectome analysis. It, therefore, preserves the geometric information, which is crucial for quantifying brain connectivity and understanding variation across subjects. We have previously shown that the PSC pipeline is robust and reproducible across large data sets.24 PSC output uses the Desikan-Killiany atlas (DKA) 25 of cortical and sub-cortical regions of interest (ROI). The DKA parcellation comprises 68 cortical surface regions (34 nodes per hemisphere) and 19 subcortical regions. The complete list of ROIs is provided in the supplementary materials’ Table S6.  PSC leverages a reproducible probabilistic tractography algorithm 26 to create whole-brain tractography data, integrating anatomical details from high-resolution T1 images to minimize bias in the tractography. We utilized DKA 25 to define the ROIs corresponding to the nodes in the structural connectome. For each pair of ROIs, we extracted the streamlines connecting them by following these steps: 1) dilating each gray matter ROI to include a small portion of white matter regions, 2) segmenting streamlines connecting multiple ROIs to extract the correct and complete pathway, and 3) removing apparent outlier streamlines. Due to its widespread use in brain imaging studies27, 28, we examined the mean fractional anisotropy (FA) value along streamlines and the count of streamlines in this work. The output we used includes fiber count, fiber length, and fiber volume shared between the ROIs in addition to measures of fractional anisotropy and mean diffusivity.”

      (7) Why not join all the data with harmonisation in order to reproduce the results (TBSS)

      We have followed the reviewer’s suggestion; we used neuroCombat harmonization after pooling all the diffusion weighted data into one TBSS analysis. Our results remain the same after harmonization. 

      In the Supplementary Information we added a paragraph explaining the method for harmonization; we write (SI, page 3, lines 25-34):

      “Harmonization of DTI data using neuroCombat. Because the 3 data sets originated from different sites using different MR data acquisition parameters and slightly different recruitment criteria, we applied neuroCombat 29  to correct for site effects and then repeated the TBSS analysis shown in Figure 1 and the validation analyses shown in Figures 5 and 6. First, the FA maps derived using the FDT toolbox were pooled into one TBSS analysis where registration to a standard template FA template (FMRIB58_FA_1mm.nii.gz part of FSL) was performed.  Next, neuroCombat was applied to the FA maps as implemented in Python with batch (i.e., site) effect modeled with a vector containing 1 for New Haven, 2 for Chicago, and 3 for Mannheim originating maps, respectively. The harmonized maps were then skeletonized to allow for TBSS.”

      And in the results section, we write (page 12, lines 2-21):

      “Validation after harmonization

      Because the DTI data sets originated from 3 sites with different MR acquisition parameters, we repeated our TBSS and validation analyses after correcting for variability arising from site differences using DTI data harmonization as implemented in neuroCombat. 29 The method of harmonization is described in detail in the Supplementary Methods. The whole brain unpaired t-test depicted in Figure 1 was repeated after neuroCombat and yielded very similar results (Fig. S9A) showing significantly increased FA in the SBPr compared to SBPp patients in the right superior longitudinal fasciculus (MNI-coordinates of peak voxel: x = 40; y = - 42; z = 18 mm; t(max) = 2.52; p < 0.05, corrected against 10,000 permutations).  We again tested the accuracy of local diffusion properties (FA) of the right SLF extracted from the mask of voxels passing threshold in the New Haven data (Fig.S9A) in classifying the Mannheim and the Chicago patients, respectively, into persistent and recovered. FA values corrected for age, gender, and head displacement accurately classified SBPr  and SBPp patients from the Mannheim data set with an AUC = 0.67 (p = 0.023, tested against 10,000 random permutations, Fig. S9B and S7D), and patients from the Chicago data set with an AUC = 0.69 (p = 0.0068) (Fig. S9C and S7E) at baseline, and an AUC = 0.67 (p = 0.0098)  (Fig. S9D and S7F) patients at follow-up,  confirming the predictive cluster from the right SLF across sites. The application of neuroCombat significantly changes the FA values as shown in Fig.S10 but does not change the results between groups.”
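
      The AUC values and permutation p-values quoted above can be read with the help of the following minimal sketch. It is not the authors' analysis code: the function names, the one-sided test, the add-one correction, and the toy data are assumptions for illustration. The AUC is computed as the probability that a randomly chosen recovered patient has a higher score than a randomly chosen persistent patient, and the p-value is the share of label shufflings that reach at least the observed AUC.

```python
# Illustrative sketch: AUC via pairwise comparison and a label-permutation test.
# Function names, the one-sided test, and the toy data are assumptions.
import random


def auc(scores, labels):
    """P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=10000, seed=0):
    """One-sided p-value: how often shuffled labels give AUC >= observed."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between score and label
        if auc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical FA-like scores; 1 = recovered (SBPr), 0 = persistent (SBPp).
toy_auc = auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
separated_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
separated_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
p_separated = permutation_p_value(separated_scores, separated_labels, n_perm=2000)
print(toy_auc, p_separated)
```

      An AUC of 0.5 corresponds to chance-level discrimination, so values of 0.67 to 0.69 indicate modest separation even when the permutation test shows they are unlikely to arise by chance.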

      Minor comments

      (1) In the case of New Haven data, one used MB 4 and GRAPPA 2, these two factors accelerate the imaging 8 times and often lead to quite a poor quality. Any kind of QA?

      We thank the reviewer for identifying this error. GRAPPA 2 was in fact used for our T1-MPRAGE image acquisition but not during the diffusion data acquisition. The diffusion data were acquired with a multi-band acceleration factor of 4.  We have now corrected this mistake.

      (2) Why not include MPRAGE data into the analysis, in particular, for predictions?

      We thank the reviewer for the suggestion. The collaboration on this paper was set around diffusion data. In addition, MPRAGE data from New Haven related to prediction is already published (10.1073/pnas.1918682117) and MPRAGE data of the Mannheim data set is a part of the larger project and will be published elsewhere.

      (3) In preprocessing, the authors wrote: "Eddy current corrects for image distortions due to susceptibility-induced distortions and eddy currents in the gradient coil". However, they did not mention that they acquired phase-opposite b0 data. It means eddy_openmp works likely only as an alignment tool, but not susceptibility corrector.

      We thank the reviewer for bringing this to our attention. We indeed did not collect b0 data in the phase-opposite direction. eddy_openmp can still be used to correct for eddy current distortions and to perform motion correction, but the absence of phase-opposite b0 data may limit its ability to fully address susceptibility artifacts. This is now noted in the Supplementary Methods under the Preprocessing section (SI, page 3, lines 16-18): “We do note, however, that as we did not acquire data in the phase-opposite direction, the susceptibility-induced distortions may not be fully corrected.”

      (4) Version of FSL?

      We thank the reviewer for raising this point. We have now added the FSL version under the Supplementary Methods (SI, page 3, lines 10-11): “Preprocessing of all data sets was performed employing the same procedures and the FMRIB diffusion toolbox (FDT) running on FSL version 6.0.”

      (5) Some short sketches about the connectivity analysis could be useful, at least in SI.

      We are grateful for this suggestion, which improves our work. We have added sketches of the connectivity analysis; please see Figure 7 and Supplementary Figure 11.

      (6) Machine learning: functions, language, version?

      We thank the reviewer for raising these points, which we hope to have addressed in the Methods section of our resubmission by adding a detailed description of the structural connectivity analysis. We added: “The DKA parcellation comprises 68 cortical surface regions (34 nodes per hemisphere) and 19 subcortical regions. The complete list of ROIs is provided in Table S7 of the supplementary materials. PSC leverages a reproducible probabilistic tractography algorithm (26) to create whole-brain tractography data, integrating anatomical details from high-resolution T1 images to minimize bias in the tractography. We utilized DKA (25) to define the ROIs corresponding to the nodes in the structural connectome. For each pair of ROIs, we extracted the streamlines connecting them by following these steps: 1) dilating each gray matter ROI to include a small portion of white matter regions, 2) segmenting streamlines connecting multiple ROIs to extract the correct and complete pathway, and 3) removing apparent outlier streamlines. Due to its widespread use in brain imaging studies (27, 28), we examined the mean fractional anisotropy (FA) value along streamlines and the count of streamlines in this work. The output we used includes fiber count, fiber length, and fiber volume shared between the ROIs in addition to measures of fractional anisotropy and mean diffusivity.”

      The script is described and provided at: https://github.com/MISICMINA/DTI-Study-Resilience-to-CBP.git.

      (7) Ethical approval?

      The New Haven data is part of a study that was approved by the Yale University Institutional Review Board. This is mentioned under the description of the data, “New Haven (Discovery) data set” (page 23, lines 1-2). Likewise, the Mannheim data is part of a study approved by the Ethics Committee of the Medical Faculty of Mannheim, Heidelberg University, and was conducted in accordance with the declaration of Helsinki in its most recent form. This is also mentioned under “Mannheim data set” (page 26, lines 2-5): “The study was approved by the Ethics Committee of the Medical Faculty of Mannheim, Heidelberg University, and was conducted in accordance with the declaration of Helsinki in its most recent form.”

      (1) Traeger AC, Henschke N, Hubscher M, et al. Estimating the Risk of Chronic Pain: Development and Validation of a Prognostic Model (PICKUP) for Patients with Acute Low Back Pain. PLoS Med 2016;13:e1002019.

      (2) Hill JC, Dunn KM, Lewis M, et al. A primary care back pain screening tool: identifying patient subgroups for initial treatment. Arthritis Rheum 2008;59:632-641.

      (3) Hockings RL, McAuley JH, Maher CG. A systematic review of the predictive ability of the Orebro Musculoskeletal Pain Questionnaire. Spine (Phila Pa 1976) 2008;33:E494-500.

      (4) Chou R, Shekelle P. Will this patient develop persistent disabling low back pain? JAMA 2010;303:1295-1302.

      (5) Silva FG, Costa LO, Hancock MJ, Palomo GA, Costa LC, da Silva T. No prognostic model for people with recent-onset low back pain has yet been demonstrated to be suitable for use in clinical practice: a systematic review. J Physiother 2022;68:99-109.

      (6) Kent PM, Keating JL. Can we predict poor recovery from recent-onset nonspecific low back pain? A systematic review. Man Ther 2008;13:12-28.

      (7) Hruschak V, Cochran G. Psychosocial predictors in the transition from acute to chronic pain: a systematic review. Psychol Health Med 2018;23:1151-1167.

      (8) Hartvigsen J, Hancock MJ, Kongsted A, et al. What low back pain is and why we need to pay attention. Lancet 2018;391:2356-2367.

      (9) Tanguay-Sabourin C, Fillingim M, Guglietti GV, et al. A prognostic risk score for development and spread of chronic pain. Nat Med 2023;29:1821-1831.

      (10) Spisak T, Bingel U, Wager TD. Multivariate BWAS can be replicable with moderate sample sizes. Nature 2023;615:E4-E7.

      (11) Liu Y, Zhang HH, Wu Y. Hard or Soft Classification? Large-margin Unified Machines. J Am Stat Assoc 2011;106:166-177.

      (12) Loffler M, Levine SM, Usai K, et al. Corticostriatal circuits in the transition to chronic back pain: The predictive role of reward learning. Cell Rep Med 2022;3:100677.

      (13) Smith SM, Dworkin RH, Turk DC, et al. Interpretation of chronic pain clinical trial outcomes: IMMPACT recommended considerations. Pain 2020;161:2446-2461.

      (14) Lieberman G, Shpaner M, Watts R, et al. White Matter Involvement in Chronic Musculoskeletal Pain. The Journal of Pain 2014;15:1110-1119.

      (15) Mansour AR, Baliki MN, Huang L, et al. Brain white matter structural properties predict transition to chronic pain. Pain 2013;154:2160-2168.

      (16) Wager TD, Atlas LY, Lindquist MA, Roy M, Woo CW, Kross E. An fMRI-based neurologic signature of physical pain. N Engl J Med 2013;368:1388-1397.

      (17) Lee JJ, Kim HJ, Ceko M, et al. A neuroimaging biomarker for sustained experimental and clinical pain. Nat Med 2021;27:174-182.

      (18) Becker S, Navratilova E, Nees F, Van Damme S. Emotional and Motivational Pain Processing: Current State of Knowledge and Perspectives in Translational Research. Pain Res Manag 2018;2018:5457870.

      (19) Spisak T, Kincses B, Schlitt F, et al. Pain-free resting-state functional brain connectivity predicts individual pain sensitivity. Nat Commun 2020;11:187.

      (20) Baliki MN, Apkarian AV. Nociception, Pain, Negative Moods, and Behavior Selection. Neuron 2015;87:474-491.

      (21) Elman I, Borsook D. Common Brain Mechanisms of Chronic Pain and Addiction. Neuron 2016;89:11-36.

      (22) Baliki MN, Petre B, Torbey S, et al. Corticostriatal functional connectivity predicts transition to chronic back pain. Nat Neurosci 2012;15:1117-1119.

      (23) Jenkins LC, Chang WJ, Buscemi V, et al. Do sensorimotor cortex activity, an individual's capacity for neuroplasticity, and psychological features during an episode of acute low back pain predict outcome at 6 months: a protocol for an Australian, multisite prospective, longitudinal cohort study. BMJ Open 2019;9:e029027.

      (24) Zhang Z, Descoteaux M, Zhang J, et al. Mapping population-based structural connectomes. Neuroimage 2018;172:130-145.

      (25) Desikan RS, Segonne F, Fischl B, et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 2006;31:968-980.

      (26) Maier-Hein KH, Neher PF, Houde J-C, et al. The challenge of mapping the human connectome based on diffusion tractography. Nature Communications 2017;8:1349.

      (27) Chiang MC, McMahon KL, de Zubicaray GI, et al. Genetics of white matter development: a DTI study of 705 twins and their siblings aged 12 to 29. Neuroimage 2011;54:2308-2317.

      (28) Zhao B, Li T, Yang Y, et al. Common genetic variation influencing human white matter microstructure. Science 2021;372.

      (29) Fortin JP, Parker D, Tunc B, et al. Harmonization of multi-site diffusion tensor imaging data. Neuroimage 2017;161:149-170.

    3. Reviewer #1 (Public review):

      Summary:

      In this paper, Misic et al showed that white matter properties can be used to classify subacute back pain patients that will develop persisting pain.

      Strengths:

      Compared to most previous papers studying associations between white matter properties and chronic pain, the strength of the method is to perform a prediction in unseen data. Another strength of the paper is the use of three different cohorts. This is an interesting paper that provides a valuable contribution to the field.

      Weaknesses:

      The main weakness of this study is the sample size. It remains small despite having 3 cohorts. This is problematic because results are often overfitted in such small-sample brain imaging studies, especially when all the data are available to the authors at the time of training the model (Poldrack et al., Scanning the horizon: towards transparent and reproducible neuroimaging research, Nature Reviews Neuroscience 2017). Thus, having access to all the data, the authors have a high degree of flexibility in data analysis, as they can retrain their model any number of times until it generalizes across all three cohorts. In this case, the testing set could easily become part of the training set, making it difficult to assess the real performance, especially for small sample size studies.

      Even if the performance was properly assessed, their models show AUCs between 0.65 and 0.70, which is usually considered poor and most likely without potential clinical use. Despite this, their conclusion was: "This biomarker is easy to obtain (~10 min of scanning time) and opens the door for translation into clinical practice." One may ask who is really willing to use an MRI signature with a relatively poor performance that can be outperformed by self-report questionnaires?

      Overall, these criticisms are more about the wording sometimes used and the inferences made. I still think this is a very relevant contribution to the field. Showing predictive performance through cross validation and testing in multiple cohorts is not an easy task and this is a strong effort by the team. I strongly believe this approach is the right one and I believe the authors did a good job.

    4. Reviewer #2 (Public review):

      The present study aims to investigate brain white matter predictors of back pain chronicity. To this end, a discovery cohort of 28 patients with subacute back pain (SBP) was studied using white matter diffusion imaging. The cohort was investigated at baseline and one-year follow-up when 16 patients had recovered (SBPr) and 12 had persistent back pain (SBPp). A comparison of baseline scans revealed that SBPr patients had higher fractional anisotropy values in the right superior longitudinal fasciculus (SLF) than SBPp patients and that FA values predicted changes in pain severity. Moreover, the FA values of SBPr patients were larger than those of healthy participants, suggesting a role of FA of the SLF in resilience to chronic pain. These findings were replicated in two other independent datasets. The authors conclude that the right SLF might be a robust predictive biomarker of CBP development with the potential for clinical translation.

      Developing predictive biomarkers for pain chronicity is an interesting, timely, and potentially clinically relevant topic. The paradigm and the analysis are sound, the results are convincing, and the interpretation is adequate. A particular strength of the study is the discovery-replication approach with replications of the findings in two independent datasets.

    5. Reviewer #3 (Public review):

      Summary:

      The authors suggest a new biomarker of chronic back pain with an option to predict a result of treatment.

      Strengths:

      The results were reproduced in three studies.

      Weaknesses:

      The number of participants is still low, an explanation of microstructure changes was not given, and some technical drawbacks are present.

    1. eLife Assessment

      By combining psychophysics and computational modelling based on the Theory of Visual Attention, this study examines the mechanisms underlying self-prioritization by revealing the influence of self-associations on early attentional selection. While the findings are important, the experimental evidence is incomplete. The relationship between consciousness (awareness) and attention, the potential contamination by arousal, the inconsistent and unexpected results, and the distinguishing between social and perceptual tasks need to be addressed or improved. The work will be of interest to researchers in psychology, cognitive science, and neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      The authors intended to investigate the earliest mechanisms enabling self-prioritization, especially in the attention. Combining a temporal order judgement task with computational modelling based on the Theory of Visual Attention (TVA), the authors suggested that the shapes associated with the self can fundamentally alter the attentional selection of sensory information into awareness. This self-prioritization in attentional selection occurs automatically at early perceptual stages. Furthermore, the processing benefits obtained from attentional selection via self-relatedness and physical salience were separated from each other.

      Strengths:

      The manuscript is written in a way that is easy to follow. The methods of the paper are very clear and appropriate.

      Weaknesses:

      There are two main concerns:

      (1) The authors had a too strong pre-hypothesis that self-prioritization was associated with attention. They used the prior entry to consciousness (awareness) as an index of attention, which is not appropriate. There may be other processing that makes the stimulus prior to entry to consciousness (e.g. high arousal, high sensitivity), but not attention. The self-related/associated stimulus may be involved in such processing but not attention to make the stimulus easily caught. Perhaps the authors could include other methods such as EEG or MEG to answer this question.

      (2) The authors suggested that there are two independent attention processes. I suspect that the brain needs two attention systems. Is there a probability that the social and perceptual (physical properties of the stimulus) salience fired the same attention processing through different processing?

    3. Reviewer #2 (Public review):

      Summary:

      The main aim of this research was to explore whether and how self-associations (as opposed to other associations) bias early attentional selection, and whether this can explain well-known self-prioritization phenomena, such as the self-advantage in perceptual matching tasks. The authors adopted the Visual Attention Theory (VAT) by estimating VAT parameters using a hierarchical Bayesian model from the field of attention and applied it to investigate the mechanisms underlying self-prioritization. They also discussed the constraints on the self-prioritization effect in attentional selection. The key conclusions reported were:

      (1) Self-association enhances both attentional weights and processing capacity

      (2) Self-prioritization in attentional selection occurs automatically but diminishes when active social decoding is required, and

      (3) Social and perceptual salience capture attention through distinct mechanisms.

      Strengths:

      Transferring the Theory of Visual Attention parameters estimated by a hierarchical Bayesian model to investigate self-prioritization in attentional selection was a smart approach. This method provides a valuable tool for accessing the very early stages of self-processing, i.e., attention selection. The authors conclude that self-associations can bias visual attention by enhancing both attentional weights and processing capacity and that this process occurs automatically. These findings offer new insights into self-prioritization from the perspective of the early stage of attentional selection.

      Weaknesses:

      (1) The results are not convincing enough to definitively support their conclusions. This is due to inconsistent findings (e.g., the model selection suggested condition-specific c parameters, but the increase in processing capacity was only slight; the correlations between attentional selection bias and SPE were inconsistent across experiments), unexpected results (e.g., when examining the impact of social association on processing rates, the other-associated stimuli were processed faster after social association, while the self-associated stimuli were processed more slowly), and weak correlations between attentional bias and behavioral SPE, which were reported without any p-value corrections. Additionally, the reasons why the attentional bias of self-association occurs automatically but disappears during active social decoding remain difficult to explain. It is also possible that the self-association with shapes was not strong enough to demonstrate attention bias, rather than the automatic processes as the authors suggest. Although these inconsistencies and unexpected results were discussed, all were post hoc explanations. To convince readers, empirical evidence is needed to support these unexpected findings.

      (2) The generalization of the findings needs further examination. The current results seem to rely heavily on the perceptual matching task. Whether this attentional selection mechanism of self-prioritization can be generalized to other stimuli, such as self-name, self-face, or other domains of self-association advantages, remains to be tested. In other words, more converging evidence is needed.

      (3) The comparison between the "social" and "perceptual" tasks remains debatable, as it is challenging to equate the levels of social salience and perceptual salience. In addition, these two tasks differ not only in terms of social decoding processes but also in other aspects such as task difficulty. Whether the observed differences between the tasks can definitively suggest the specificity of social decoding, as the authors claim, needs further confirmation.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors intended to investigate the earliest mechanisms enabling self-prioritization, especially in the attention. Combining a temporal order judgement task with computational modelling based on the Theory of Visual Attention (TVA), the authors suggested that the shapes associated with the self can fundamentally alter the attentional selection of sensory information into awareness. This self-prioritization in attentional selection occurs automatically at early perceptual stages. Furthermore, the processing benefits obtained from attentional selection via self-relatedness and physical salience were separated from each other.

      Strengths:

      The manuscript is written in a way that is easy to follow. The methods of the paper are very clear and appropriate.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      There are two main concerns:

      (1) The authors had a too strong pre-hypothesis that self-prioritization was associated with attention. They used the prior entry to consciousness (awareness) as an index of attention, which is not appropriate. There may be other processing that makes the stimulus prior to entry to consciousness (e.g. high arousal, high sensitivity), but not attention. The self-related/associated stimulus may be involved in such processing but not attention to make the stimulus easily caught. Perhaps the authors could include other methods such as EEG or MEG to answer this question.

      We found the possibility of other mechanisms to be responsible for “prior entry” interesting too, but believe there are solid grounds for the hypothesis that it is indicative of attention:

      First, prior entry has a long-standing history as an index of attention (e.g., Titchener, 1903; Shore et al., 2001; Yates and Nicholls, 2009; Olivers et al. 2011; see Spence & Parise, 2010, for a review). Of course, other factors (like the ones mentioned) can contribute to encoding speed. However, for the perceptual condition, we systematically varied a stimulus feature that is associated with selective attention (salience, see e.g. Wolfe, 2021) and kept other features that are known to be associated with other factors such as arousal and sensitivity constant across the two variants (e.g. clearly above-threshold visibility) or varied them between participants (e.g. the colours / shapes used).

      Second, in the social salience condition we used a manipulation that has repeatedly been used to establish social salience effects in other paradigms (e.g., Li et al., 2022; Liu & Sui, 2016; Scheller et al., 2024; Sui et al., 2015; see Humphreys & Sui, 2016, for a review). We assume that the reviewer’s comment suggests that changes in arousal or sensitivity may be responsible for social salience effects, specifically. We have several reasons to interpret the social salience effects as an alteration in attentional selection, rather than a result of arousal or sensitivity:

      Arousal and attention are closely linked. However, within the present model, arousal is more likely linked to the availability of processing resources (capacity parameter C). That is, enhanced arousal is typically not stimulus-specific, and is therefore unlikely to affect the *relative* advantage in processing weights/rates of the self-associated (vs other-associated) stimuli. Indeed, a recent study showed that arousal does not modulate the relative division of attentional resources (as modelled by the Theory of Visual Attention; Asgeirsson & Nieuwenhuis, 2017). As such, it is unlikely that arousal can explain the observed results in relative processing changes for the self and other identities.

      Further, there is little reason to assume that presenting a different shape enhances perceptual sensitivity. Firstly, all stimuli were presented well above threshold, which would shrink any effects that were resulting from increases in sensitivity alone. Secondly, shape-associations were counterbalanced across participants, reducing the possibility that specific features, present in the stimulus display, lead to the measurable change in processing rates as a result of enhanced shape-sensitivity.

      Taken together, both the wealth of literature suggesting that prior entry indexes attention and the specific design choices within our study strongly support the notion that the observed changes in processing rates are indicative of changes in attentional selection, rather than other mechanisms (e.g. arousal, sensitivity).

      (2) The authors suggested that there are two independent attention processes. I suspect that the brain needs two attention systems. Is there a probability that the social and perceptual (physical properties of the stimulus) salience fired the same attention processing through different processing?

      We appreciate this thought-provoking comment. We conceptualize attention as a process that can facilitate different levels of representation, rather than as separate systems tuned to specific types of information. Different forms of representation, such as the perceptual shape, or the associated social identity, may be impacted by the same attentional process at different levels of representation. Indeed, our findings suggest that both social and perceptual salience effects may result from the same attentional system, albeit at different levels of representation. This is further supported by the additivity of perceptual and social salience effects and the negative correlation of processing facilitations between perceptually and socially salient cues. These results may reflect a trade-off in how attentional resources are distributed between either perceptually or socially salient stimuli.

      Reviewer #2 (Public review):

      Summary:

      The main aim of this research was to explore whether and how self-associations (as opposed to other associations) bias early attentional selection, and whether this can explain well-known self-prioritization phenomena, such as the self-advantage in perceptual matching tasks. The authors adopted the Visual Attention Theory (VAT) by estimating VAT parameters using a hierarchical Bayesian model from the field of attention and applied it to investigate the mechanisms underlying self-prioritization. They also discussed the constraints on the self-prioritization effect in attentional selection. The key conclusions reported were:

      (1) Self-association enhances both attentional weights and processing capacity

      (2) Self-prioritization in attentional selection occurs automatically but diminishes when active social decoding is required, and

      (3) Social and perceptual salience capture attention through distinct mechanisms.

      Strengths:

      Transferring the Theory of Visual Attention parameters estimated by a hierarchical Bayesian model to investigate self-prioritization in attentional selection was a smart approach. This method provides a valuable tool for accessing the very early stages of self-processing, i.e., attention selection. The authors conclude that self-associations can bias visual attention by enhancing both attentional weights and processing capacity and that this process occurs automatically. These findings offer new insights into self-prioritization from the perspective of the early stage of attentional selection.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      (1) The results are not convincing enough to definitively support their conclusions. This is due to inconsistent findings (e.g., the model selection suggested condition-specific c parameters, but the increase in processing capacity was only slight; the correlations between attentional selection bias and SPE were inconsistent across experiments), unexpected results (e.g., when examining the impact of social association on processing rates, the other-associated stimuli were processed faster after social association, while the self-associated stimuli were processed more slowly), and weak correlations between attentional bias and behavioral SPE, which were reported without any p-value corrections. Additionally, the reasons why the attentional bias of self-association occurs automatically but disappears during active social decoding remain difficult to explain. It is also possible that the self-association with shapes was not strong enough to demonstrate attention bias, rather than the automatic processes as the authors suggest. Although these inconsistencies and unexpected results were discussed, all were post hoc explanations. To convince readers, empirical evidence is needed to support these unexpected findings.

      Thank you for outlining the specific points that raise your concern. We were happy to address these points as follows:

      a. Replications and Consistency: In our study, we consistently observed trends (relative reduction in processing speed of the self-associated stimulus) in the social salience conditions across experiments. While Experiment 2 demonstrated a significant reduction in processing rate towards self-stimuli, there was a notable trend in Experiment 1 as well.

      b. Condition-specific parameters: The condition-specific C parameters, though presenting a small effect size, significantly improved model fit. Inspecting the HDI ranges of our estimated C parameters indicates a high probability (85-89%) that processing capacity increased due to social associations, suggesting that even small changes (~2 Hz) can hold meaningful implications within the context of attentional selection.

      Please also note that the main conclusions about relative salience (self/other, salient/non-salient) are based on the relative processing rates. Processing rates are the product of the processing capacity (condition- but not stimulus dependent) and the attentional weight (condition and stimulus dependent). The latter is crucial to judge the *relative* advantage of the salient stimulus. Hence, the self-/salient stimulus advantage that is reflected in the ‘processing rate difference’ is automatically also reflected in the relative attentional weights attributed to the self/other and salient/non-salient stimuli. As such, the overall results of an automatic relative advantage of self-associated stimuli hold, independently of the change in overall processing capacity.

      c. Correlations: Regarding the correlations the reviewer noted, we wish to clarify that these were exploratory, and not the primary focus of our research. The aim of these exploratory analyses was to gauge the contribution of attentional selection to matching-based SPEs. As SPEs measured via the matching task are typically based on multiple different levels of processing, the contribution of early attentional selection to their overall magnitude was unclear. Without being able to gauge the possible effect sizes, corrected analyses may prevent detecting small but meaningful effects. As such, the effect sizes reported serve future studies to estimate power a priori and conduct well-powered replications of such exploratory effects. Additionally, Bayes factors were provided to give an appreciation of the strength of the evidence, all suggesting at least moderate evidence in favour of a correlation. Lastly, please note that effects that were measured within individuals and task (processing rate increase in social and perceptual decision dimensions in the TOJ task) showed consistent patterns, suggesting that the modulations within tasks were highly predictive of each other, while the modulations between tasks were not as clearly linked. We will add this clarification to the revised manuscript.

      d. Unexpected results: The unexpected results concerning the processing rates of other-associated versus self-associated stimuli certainly warrant further discussion. We believe that the additional processing steps required for social judgments, reflected in enhanced reaction times, may explain the slower processing of self-associated stimuli in that dimension. We agree that not all findings will align with initial hypotheses, and this variability presents avenues for further research. We have added this to the discussion of social salience effects.

      e. Whether association strength can account for the findings: We appreciate the scepticism regarding the strength of self-association with shapes. However, our within-participant design and control matching task indicate that the relative processing advantage for self-associated stimuli holds across conditions. This makes the scenario that “the self-association with shapes was not strong enough to demonstrate attention bias” very unlikely. Firstly, the relative processing advantage of self-associated stimuli in the perceptual decision condition, and the absence of such advantage in the social decision condition, were evidenced in the same participants. Hence, the strength of association between shapes and social identities was the same for both conditions. However, we only find an advantage for the self-associated shape when participants make perceptual (shape) judgements. It is therefore highly unlikely that the “association strength” can account for the difference in the outcomes between the conditions in experiment 1. Also, note that the order in which these conditions were presented was counter-balanced across participants, reducing the possibility that the automatic self-advantage was merely a result of learning or fatigue. Secondly, all participants completed the standard matching task to ascertain that the association between shapes and identities did indeed lead to processing advantages (across different levels).
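      The argument under point b, that processing rates are the product of processing capacity and attentional weight, can be illustrated with a minimal Python sketch. It is an illustration only, not the hierarchical Bayesian model used in the study: the function names are hypothetical, the numbers are invented, and the sketch assumes just two stimuli whose attentional weights are normalised to sum to 1.

```python
def processing_rates(capacity_hz, weight_self):
    """Split an overall capacity C (in Hz) between two stimuli.

    Assumption: the weights of the self- and other-associated stimulus
    are normalised to sum to 1, so the rates are v_self = C * w and
    v_other = C * (1 - w).
    """
    if not 0.0 <= weight_self <= 1.0:
        raise ValueError("weight_self must lie between 0 and 1")
    v_self = capacity_hz * weight_self
    v_other = capacity_hz * (1.0 - weight_self)
    return v_self, v_other


def relative_advantage(v_self, v_other):
    """Share of the total processing rate that goes to the self stimulus."""
    return v_self / (v_self + v_other)


if __name__ == "__main__":
    # Invented values: the same attentional weight under two capacities
    # that differ by 2 Hz.
    for capacity in (60.0, 62.0):
        v_self, v_other = processing_rates(capacity, weight_self=0.55)
        share = relative_advantage(v_self, v_other)
        print(f"C = {capacity:.0f} Hz: v_self = {v_self:.2f}, "
              f"v_other = {v_other:.2f}, self share = {share:.2f}")
```

      Under these assumptions a change in C scales both rates by the same factor and leaves the self share equal to the attentional weight, which is why the relative advantage can be interpreted independently of the capacity estimate.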

      In summary, we believe that the evidence we provide supports the final conclusions. We do, of course, welcome any further empirical evidence that could enhance our understanding of the contribution of different processing levels to the SPE and are committed to exploring these areas in future work.

      (2) The generalization of the findings needs further examination. The current results seem to rely heavily on the perceptual matching task. Whether this attentional selection mechanism of self-prioritization can be generalized to other stimuli, such as self-name, self-face, or other domains of self-association advantages, remains to be tested. In other words, more converging evidence is needed.

      The reviewer indicates that the current findings heavily rely on the perceptual matching task, and it would be more convincing to include other paradigm(s) and different types of stimuli. We are happy to address these points here: first, we specifically used a temporal order paradigm to tap into specific processes, rather than merely relying on the matching task. Attentional selection is, along with other processes, involved in matching, but the TOJ-TVA approach allows tapping into attentional selection specifically.  Second, self-prioritization effects have been replicated across a wide range of stimuli (e.g. faces: Wozniak et al., 2018; names or owned objects: Scheller & Sui, 2022a, or even fully unfamiliar stimuli: Wozniak & Knoblich, 2019) and paradigms (e.g. matching task: Sui et al., 2012; cross-modal cue integration: e.g. Scheller & Sui, 2022b; Scheller et al., 2023; continuous flash suppression: Macrae et al., 2017; temporal order judgment: Constable et al., 2019; Truong et al., 2017). Using neutral geometric shapes, rather than faces and names, addresses a key challenge in self research: mitigating the influence of stimulus familiarity on results. In addition, these newly learned, simple stimuli can be combined with other paradigms, such as the TOJ paradigm in the current study, to investigate the broader impact of self-processing on perception and cognition.

      To the best of our knowledge, this is the first study showing evidence about the mechanisms that are involved in early attentional selection of socially salient stimuli. Future replications and extensions would certainly be useful, as with any experimental paradigm.

      (3) The comparison between the "social" and "perceptual" tasks remains debatable, as it is challenging to equate the levels of social salience and perceptual salience. In addition, these two tasks differ not only in terms of social decoding processes but also in other aspects such as task difficulty. Whether the observed differences between the tasks can definitively suggest the specificity of social decoding, as the authors claim, needs further confirmation.

      Equating the levels of social and perceptual salience is indeed challenging, but it was not an aim of the present study. Instead, the present study, specifically experiment 2, directly compares the mechanisms and effects of social and perceptual salience. By manipulating perceptual salience (relative colour) and social salience (relative shape association) independently and jointly, and quantifying the effects on processing rates, our study allows us to directly delineate the contributions of each of these types of salience. The results suggest additive effects (see also Figure 7). The possibility remains that these effects are additive because of the use of different perceptual features, so it would be helpful for future studies to explore whether similar perceptual features lead to (supra-/sub-) additive effects. In either case, the study design allows a direct comparison of the effects and mechanisms of social and perceptual salience.

      Regarding the social and perceptual decision dimensions, they were not expected to be equated. Indeed, the social decision dimension requires additional retrieval of the associated identity, making it likely more challenging. This additional retrieval is also likely responsible for the slower responses towards the social association compared to the shape itself. However, the motivation to compare the effects of these two decisional dimensions lies in the assumption that the self needs to be task-relevant. Some evidence suggests that task relevance is required to induce self-prioritization effects (e.g., Woźniak & Knoblich, 2022). However, these studies typically used matching tasks and were powered to detect large effects only (e.g. f = 0.4, n = 18). Because removing the contribution of decisional processing levels (which interact with task-relevance) is likely to reduce the SPE, smaller self-prioritization effects that result from earlier processing levels may not be detected with sufficient statistical power. Targeting specific processing levels, especially those with relatively early contributions or small effect sizes, requires larger samples (here: n = 70) to provide sufficient power. Indeed, by contrasting the relative attentional selection effects in the present study, we find that the self does not need to be task-relevant to produce self-prioritization effects. This is in line with recent findings of prior entry of self-faces (Jubile & Kumar, 2021).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors show that SVZ-derived astrocytes respond to a middle cerebral artery occlusion (MCAO) hypoxia lesion by secreting and modulating hyaluronan at the edge of the lesion (penumbra) and that hyaluronan is a chemoattractant to SVZ astrocytes. They use lineage tracing of SVZ cells to determine their origin. They also find that SVZ-derived astrocytes express Thbs-4 but astrocytes at the MCAO-induced scar do not. Also, they demonstrate that decreased HA in the SVZ is correlated with gliogenesis. While much of the paper is descriptive/correlative, they do overexpress Hyaluronan synthase 2 via viral vectors and show this is sufficient to recruit astrocytes to the injury. Interestingly, astrocytes preferred to migrate to the MCAO lesion rather than to the region of overexpressed HAS2.

      Strengths:

      The field has largely ignored the gliogenic response of the SVZ, especially with regard to astrocytic function. These cells and especially newborn cells may provide support for regeneration. Emigrated cells from the SVZ have been shown to be neuroprotective via creating pro-survival environments, but their expression and deposition of beneficial extracellular matrix molecules are poorly understood. Therefore, this study is timely and important. The paper is very well written and the flow of results is logical.

      Weaknesses:

      The main problem is that they do not show that Hyaluronan is necessary for SVZ astrogenesis and/or migration to MCAO lesions. Such loss-of-function studies have been carried out by studies they cite (e.g. Girard et al., 2014 and Benner et al., 2013). Similar approaches seem to be necessary in this work.

      We appreciate the comments by the reviewer. The article is, indeed, largely descriptive since we attempt to describe in detail what happens to newborn astrocytes after MCAO. Still, we have not attempted any modification to the model, such as amelioration of ischemic damage. This is a limitation of the study that we do not hide. However, we use several experimental approaches, such as lineage tracing and hyaluronan modification, to strengthen our conclusions.

      Regarding the weaknesses found by the reviewer, we do not claim that hyaluronan is necessary for SVZ astrogenesis. Indeed, we observe that when the MCAO stimulus (i.e. inflammation) is present, the HMW-HA (AAV-Has2) stimulus is less powerful (we discuss this in lines 330-332). We do claim, and we believe we successfully demonstrate, the reverse situation: that SVZ astrocytes modulate hyaluronan, not at the SVZ but at the site of MCAO, i.e. the scar. However, regarding whether hyaluronan is necessary for SVZ astrogenesis, we only show a correlation between its degradation and the time-course of astrogenesis. We suggest this result as a starting point for a follow-up study. We have included a phrase in the discussion (line 310), stating that further experiments are needed to fully establish a link between hyaluronan and astrogenesis in the SVZ.

      Major points:

      (1) How good of a marker for newborn astrocytes is Thbs4? Did you co-label with B cell markers like EGFr? Is the Thbs4 gene expressed in B cells? Do scRNAseq papers show it is expressed in B cells? Are they B1 or B2 cells?

      We chose Thbs4 as a marker of newborn astrocytes based on published research (Beckervordersandforth et al., 2010; Benner et al., 2013; Llorens-Bobadilla et al., 2015; Codega et al., 2014; Basak et al., 2018; Mizrak et al., 2019; Kjell et al., 2020; Cebrian-Silla et al., 2021). Of those studies, at least three associate Thbs4 with B-type cells based on scRNAseq data (Llorens-Bobadilla et al., 2015; Cebrian-Silla et al., 2021; Basak et al., 2018). We have included a sentence about this, with the associated references, in line 92.

      We co-labeled Thbs4 with EGFR, but in the context of MCAO. We observed an increase in EGFR expression with MCAO, similar to the increase in Thbs4 following ischemia (see Author response image 1). We did not include this figure in the manuscript since we did not have tissue available from all the time points we used (7d, 60d post-ischemia).

      Author response image 1.

      Thbs4 cells, in basal and ischemic conditions, represent only a small fraction of IdU-positive cells (Fig 3F), suggesting that they are mostly quiescent cells, i.e., B1 cells. However, the scRNAseq literature is not consistent about this.

      (2) It is curious that there was no increase in Type C cells after MCAO - do the authors propose a direct NSC-astrocyte differentiation?

      Type C cells are fast-proliferating cells, and our BrdU/IdU experiment (Fig. 3) suggests that Thbs4 cells are slow-proliferating cells. Some authors (Encinas lab, Spain) suggest that when the hippocampus is challenged by a harsh stimulus, such as kainate-induced epilepsy, the NSCs differentiate directly into reactive astrocytes and deplete the DG neurogenic niche (Encinas et al., 2011, Cell Stem Cell; Sierra et al., 2015, Cell Stem Cell). We believe this might be the case in our MCAO model and the SVZ niche, since we observe a decrease in DCX labeling in the olfactory bulb (Fig S5) and an increase in astrocytes in the SVZ, which migrate to the ischemic lesion. We did not want to overcomplicate an already complicated paper by dwelling on direct NSC-astrocyte differentiation or on the reactive status of these newborn astrocytes.

      (3) The paper would be strengthened with orthogonal views of z projections to show colocalization.

      We thank the reviewer for this observation. We have now included orthogonal projections, together with a zoomed-in inset, for the critical colocalization immunofluorescence of CD44 and hyaluronan (hyaluronan internalization) in Fig S6D. Hyaluronan membrane synthesis is already depicted with an orthogonal projection in Fig 6F.

      (4) It is not clear why the dorsal SVZ is analysed and focused on in Figure 4. This region emanates from the developmental pallium (cerebral cortex anlagen). It generates some excitatory neurons early postnatally and is thought to have differential signalling such as Wnt (Raineteau group).

      We decided to analyze the dorsal SVZ in depth after the BrdU experiment (Fig S3), where we observed an increase in BrdU+/Thbs4+ cells mostly in the dorsal area. Hence, the electrodes for electroporation were oriented so as to label the dorsal area. We appreciate the paper by the Raineteau lab, but we assume that this region may play other roles (apart from generating excitatory neurons early postnatally) depending on the developmental stage (our model is in adults) and/or pathological conditions (MCAO).

      (5) Several of the images show the lesion and penumbra as being quite close to the SVZ. Did any of the lesions contact the SVZ? If so, I would strongly recommend excluding them from the analysis as such contact is known to hyperactivate the SVZ.

      We thank the referee for the suggestion to exclude the more severely MCAO-lesioned animals from the analysis. Indeed, MCAO ischemia can, methodologically, generate varying degrees of tissue damage that cannot be easily controlled. Accordingly, we had already excluded, based on TTC staining, the animals with more severe tissue damage that contacted the SVZ.

      (6) The authors switch to a rat in vitro analysis towards the end of the study. This needs to be better justified. How similar are the molecules involved between mouse and rat?

      We chose the rat culture because it is already established in our lab and, in our hands, is much more reproducible than the mouse brain cell culture that we occasionally use (for transgenic animals only) (Benito-Muñoz et al., Glia, 2016; Cavaliere et al., Front Cell Neurosci, 2013). It is true that there could be differences between rat and mouse Thbs4-cell physiology, despite a 96% identity between the rat and mouse Thbs4 protein sequences (BLASTp). In vitro, we only confirm the capacity of astrocytes to internalize hyaluronan, which was a finding that we did not expect in our in vivo experiments. Indeed, these observations, notwithstanding the obvious differences between in vivo and in vitro scenarios, suggest that HA internalization by astrocytes is a cross-species event, at least in rodents. Regarding HA, hyaluronan is similar in all species, since it’s a glycan (this is why there are no antibodies against HA, and one has to rely on binding proteins such as HABP to label it).

      (7) Similar comment for overexpression of naked mole rat HA.

      We chose the naked mole rat hyaluronan synthase (HAS) because it produces HA of very high molecular weight, similar to the HA found accumulated in the glial scar at the lesion border. The naked mole rat HAS used in mice (Gorbunova Lab) is a known tool in the ECM field (Zhang et al., 2023, Nature; Tian et al., 2013, Nature).

      Reviewer 1 (Recommendation to authors):

      (1) Line 22: most of the cells that migrate out of the SVZ are not stem cells but cells further along in the lineage - neuroblasts and glioblasts.

      We thank the reviewer for this clarification. We have modified the abstract accordingly. 

      (2) In Figure 3d the MCAO group staining with GFAP looks suspiciously like ependymal cells which have been shown to be dramatically activated by stroke models.

      The picture does show ependymal cells, which are located next to the ventricle and are indeed very proliferative in stroke. However, these cells do not express Thbs4 (Shah et al., 2018, Cell). In the quantifications from the SVZ of BrdU- and IdU-injected animals (Fig 3e and f), we only take into account Thbs4+ GFAP+ cells, not GFAP+-only cells.

      (3) The TTC injury shown in Figure 5c is too low mag.

      We apologize for the low mag. We have increased the magnification two-fold without compromising resolution. The problem might also have arisen from the compression of TIF into JPEG in the PDF export process. We will address this in the revised version by carefully selecting export settings. The images we used are all publication quality (300 ppi).

      (4) How specific to HA is HABP?

      Hyaluronic Acid Binding Protein is a canonical marker for hyaluronan that is also used in ELISA to quantify it specifically, since it does not bind other glycosaminoglycans. The label has been used for years in the field for immunochemistry, and some controls and validations have been published: Deepa et al., 2006, JBC performed appropriate controls of HABP-biotin labeling using hyaluronidase (destroys labeling) and chondroitinase (preserves labeling). Soria et al., 2020, Nat Commun checked that (i) streptavidin does not label nonspecifically, and (ii) HABP staining is reduced after hyaluronan depletion in vivo with the HAS inhibitor 4MU.

      (5) A number of images are out of focus and thus difficult to interpret (e.g. SFig. 4e).

      This is true. We realized that the PDF conversion process for the preprint version has severely compressed the larger images, such as the one found in Fig. S4e. We have submitted a revised version in a better-quality PDF (the final paper will have the original TIFF files). We apologize for the technical problem.

      (6) "restructuration" is not a word.

      We apologize for the mistake and thank the reviewer for the correction. We replaced “restructuration” with “reorganization” in line 67.

      (7) While much of the manuscript is well-written and logical it could use an in-depth edit to remove awkward words and phrasings.

      A native English speaker has revised the manuscript to correct these awkward phrases. All changes are labeled in red in the revised version.

      (8) Please describe why and how you used skeleton analysis for HABP in the methods, this will be unfamiliar to most readers. The one-sentence description in the methods is insufficient.

      We have modified the text accordingly, explaining in depth the logic behind the skeleton analysis (line 204). We also added several lines of text describing the image analysis in detail (CD44/HABP spots, fractal dimension, and masks for membranal HABP, among others) in lines 484-494.

      Reviewer #2 (Public Review)

      Summary:

      In their manuscript, Ardaya et al have addressed the impact of ischemia-induced gliogenesis from the adult SVZ and their effect on the remodeling of the extracellular matrix (ECM) in the glial scar. They use Thbs4, a marker previously identified to be expressed in astrocytes of the SVZ, to understand its role in ischemia-induced gliogenesis. First, the authors show that Thbs4 is expressed in the SVZ and that its expression levels increase upon ischemia. Next, they claim that ischemia induces the generation of newborn astrocytes from SVZ neural stem cells (NSCs), which migrate toward the ischemic regions to accumulate at the glial scar. Thbs4-expressing astrocytes are recruited to the lesion by Hyaluronan where they modulate ECM homeostasis.

      Strengths:

      The findings of these studies are in principle interesting and the experiments are in principle good.

      Weaknesses:

      The manuscript suffers from an evident lack of clarity and precision in regard to their findings and their interpretation.

      We thank the reviewer for the valuable feedback. We hope the changes proposed improve clarity and precision throughout the manuscript.

      (1) The authors talk about Thbs4 expression in NSCs and astrocytes, but neither of both is shown in Figure 1, nor have they used cell type-specific markers.

      As we also reported to Referee #1 (major point 1), Thbs4 is widely considered in the literature to be a valid marker for newly formed astrocytes (Beckervordersandforth et al., 2010; Benner et al., 2013; Llorens-Bobadilla et al., 2015; Codega et al., 2014; Basak et al., 2018; Mizrak et al., 2019; Kjell et al., 2020; Cebrian-Silla et al., 2021). Some of the studies mentioned here and discussed in the manuscript text also associate Thbs4 with B-type cells based on scRNAseq data (Llorens-Bobadilla et al., 2015; Cebrian-Silla et al., 2021; Basak et al., 2018). Moreover, we also showed colocalization of Thbs4 with the activated stem cell marker nestin (Fig. 2), the glial marker GFAP (Fig. 3) and the dorsal NSC marker tdTOM (from electroporation, Fig. 4).

      (2) Very important for all following experiments is to show that Thbs4 is not expressed outside of the SVZ, specifically in the areas where the lesion will take place. If Thbs4 was expressed there, the conclusion that Thbs4+ cells come from the SVZ to migrate to the lesion would be entirely wrong.

      In Figure 1a, we show that Thbs4 is expressed in the telencephalon, exclusively in the neurogenic regions like SVZ, RMS and OB, together with cerebellum and VTA, which are likely not directly topographically connected to the damaged area (cortex and striatum). Regarding the origin of Thbs4+ cells, we demonstrated their SVZ origin by lineage tracking experiments after in vivo cell labeling (Fig. 4).

      (3) Next, the authors want to confirm the expression level of Thbs4 by electroporation of pThbs4-eGFP at P1 and write that this results in 20% of total cells expressing GFP, especially in the rostral SVZ. I do not understand the benefit of this sentence. This may be a confirmation of expression, but it also shows that the GFP+ cells derive from early postnatal NSCs.

      Furthermore, these cells look all like astrocytes, so the authors could have made a point here that indeed early postnatal NSCs expressing Thbs4 generate astrocytes alongside development. Here, it would have been interesting to see how many of the GFP+ cells are still NSCs.

      We thank the reviewer for this useful remark. We have rephrased this paragraph in the results section (Line 99).

      (4) In the next chapter, the authors show that Thbs4 increases in expression after brain injury. I do not understand the meaning of the graphs showing expression levels of distinct cell types of the neuronal lineage. Please specify why this is interesting and what to conclude from that.

      Also here, the expression of Thbs4 should be shown outside of the SVZ as well.

      In Fig 2, we show the temporal expression of two markers (besides Thbs4) in the SVZ. Nestin and DCX are the gold standard markers for NSCs, with DCX present in neuroblasts. This is already explained in line 119. What we did not explain, and now state in line 124, is that Nestin and DCX decrease immediately after ischemia (7d time-point). This probably means that the NSCs stop differentiating into neuroblasts in favor of glioblast formation. This is also supported by the experiments in the olfactory bulb depicted in Fig. S5C-H.

      (5) Next, the origin of newborn astrocytes from the SVZ upon ischemia is revealed. The graphs indicate that the authors perfused at different time points after tMCAO. Did they also show the data of the early time points? If only of the 30dpi, they should remove the additional time points indicated in the graph. In line 127 they talk about the origin of newborn astrocytes. Until now they have not even mentioned that new astrocytes are generated. Furthermore, the following sentences are imprecise: first they write that the number of slow proliferation NSCs is increased, then they talk about astrocytes. How exactly did they identify astrocytes and separate them from NSCs? Morphologically? Because both cell types express GFAP and Thbs4.

      The same problem also occurs throughout the next chapter.

      We thank the reviewer for this interesting comment. The experiment in Fig 3 combines BrdU and IdU. This is a tricky experiment: chronic BrdU is normally analyzed after 30d because the experimenter must wait for BrdU to wash out (it labels slow-proliferating cells). Since we also wanted to label fast-proliferating cells with IdU, we used IP injections of this nucleotide at the different time points and perfused the day after. It wouldn’t make sense to show BrdU at earlier time points. We do so in Fig 3e only to colocalize it with Thbs4 and to read the tendency of the experiment. However, the quantification of BrdU (not of IdU) is done only at 30 DPI, which is explained in the methods (line 407).

      “In line 127, they talk about the origin of newborn astrocytes…” 

      Indeed, we wanted to introduce in the paragraph title that ischemia induced the generation of new astrocytes, which is more clearly described in the text. We changed the paragraph title to “Characterization of Ischemia-induced cell populations”.

      “How exactly did they identify astrocytes and separate them from NSC?” 

      With this experiment, using two different protocols to label proliferating cells (BrdU vs IdU), we wanted to track the precursor cells that give rise to astrocytes and that already express the marker Thbs4. Indeed, the different increases and rates of proliferation relate only to the progenitor cells that will later differentiate into astrocytes. In this experiment we only refer to astrocytes in the last sentence: “These results suggest that, after ischemia, Thbs4-positive astrocytes derive from the slow proliferative type B cells”.

      (6) "These results suggest that ischemia-induced astrogliogenesis in the SVZ occurs in type B cells from the dorsal region, and that these newborn Thbs4-positive astrocytes migrate to the ischemic areas." This sentence is a bit dangerous and bears at least one conceptual difficulty: if NSCs generate astrocytes under normal conditions and along the course of postnatal development (which they do), then local astrocytes (expressing the tdTom because they stem from a postnatal NSC) may also react to MCAO and proliferate locally. So the astrocytes along the scar do not necessarily come from adult NSCs upon injury but from local astrocytes. If the authors state that NSCs generate astrocytes that migrate to the lesion, I would like to see that no astrocytes inside the striatum carry the tdTom reporter before MCAO is committed.

      We understand the referee’s concern about the postnatal origin of astrocytes that can also be labeled with tdTom. Our hypothesis, tested at the beginning of the paper, is that SVZ-derived astrocytes derive from slow proliferative NSCs. Thus, it is reasonable that tdTom+ cells can reach the cortical region in such a short time frame. This is why we assumed that local astrocytes can’t be positive for tdTom. We characterized the expression of tdTom in sham animals and we observed few tdTom+ cells in the cortex and striatum (Author response image 2 and Figure S4). The expression of tdTom mainly remains in the SVZ and the corpus callosum under physiological conditions. However, proliferation of local astrocytes labeled with tdTom (early postnatal astrocytes) could explain the small percentage of tdTom+ cells in the ischemic regions that do not express Thbs4, even though this percentage could represent other cell types such as OPCs or oligodendrocytes.

      Author response image 2.

      (7) If astrocytes outside the SVZ do not express Thbs4, I would like to see it. Otherwise, the discrimination of SVZ-derived GFAP+/Thbs4+ astrocytes and local astrocytes expressing only GFAP is shaky.

      Regarding Thbs4 outside the SVZ, we already answered this in point 2 (please refer to Fig 1A). We also quantified the expression of Thbs4+/GFAP+ astrocytes in the corpus callosum, cortex and striatum of sham and MCAO mice (Figure S5a-b) and we did not observe that local astrocytes express Thbs4 under physiological conditions.

      (8) Please briefly explain what a Skeleton analysis and a Fractal dimension analysis is, and what it is good for.

      We apologize for the brief information on Skeleton and Fractal dimension analysis. We included a detailed explanation of these analyses in the methods (lines 484-494).

      (9) The chapter on HA is again a bit difficult to follow. Please rewrite to clarify who produces HA and who removes it by again showing all astrocyte subtypes (GFAP+/Thbs4+ and GFAP+/Thbs4-).

      We apologize for the lack of clarity. We rewrote some passages of those chapters (changes in red), trying to convey the ideas more clearly. We also changed a panel in Figure S6b-c to clarify all astrocyte subtypes that internalize hyaluronan (Thbs4+/GFAP+ and Thbs4-/GFAP+). See Author response image 3.

      Author response image 3.

      (10) Why did the authors separate dorsal, medial, and ventral SVZ so carefully? Do they comment on it? As far as I remember, astrogenesis in physiological conditions has some local preferences (dorsal?)

      We performed the electroporation protocol in the dorsal SVZ based on previous results (Figure 3 and Figure S3). NSCs produce specific neurons in the olfactory bulb according to their location in the SVZ. However, postnatal production of astrocytes mainly occurs through local astrocyte proliferation, and the SVZ contribution is very limited at this time point.

      Reviewer #3 (Public Review)

      Summary:

      The authors aimed to study the activation of gliogenesis and the role of newborn astrocytes in a post-ischemic scenario. Combining immunofluorescence, BrdU-tracing, and genetic cellular labelling, they tracked the migration of newborn astrocytes (expressing Thbs4) and found that Thbs4-positive astrocytes modulate the extracellular matrix at the lesion border by synthesis but also degradation of hyaluronan. Their results point to a relevant function of SVZ newborn astrocytes in the modulation of the glial scar after brain ischemia. This work's major strength is the fact that it is tackling the function of SVZ newborn astrocytes, whose role is undisclosed so far.

      Strengths:

      The article is innovative, of good quality, and clearly written, with properly described Materials and Methods, data analysis, and presentation. In general, the methods are designed properly to answer the main question of the authors, being a major strength. Interpretation of the data is also in general well done, with results supporting the main conclusions of this article.

      Weaknesses:

      However, there are some points of this article that still need clarification to further improve this work.

      (1) As a first general comment, is it possible that the increase in Thbs4-positive astrocytes can also happen locally close to the glial scar, through the proliferation of local astrocytes or even from local astrocytes at the SVZ? As it was shown in published articles, most of the newborn astrocytes in the adult brain actually derive from proliferating astrocytes, and a smaller percentage is derived from NSCs. How can the authors rule out a contribution of local astrocytes to the increase of Thbs4-positive astrocytes? The authors also observed that only about one-third of the astrocytes in the glial scar derived from the SVZ.

      We thank the reviewer for the interesting comment. We have extended the discussion of this topic in the manuscript (lines 333-342), including the statement that about a third of glial scar astrocytes derive from the SVZ, without downplaying the role of local astrocytes. Whether the glial scar is populated by newborn astrocytes derived from the SVZ or from local astrocytes is under debate: some groups found a contribution from local astrocytes (Frisén group, Magnusson et al., 2014), whereas others observed the opposite (Li et al., 2010; Benner et al., 2013; Faiz et al., 2015; Laug et al., 2019; Pous et al., 2020).

      In our study we observed that Thbs4 expression is almost absent in the cortex and striatum of sham mice. To demonstrate that newborn astrocytes are derived from the SVZ, we used two techniques: chronic BrdU treatment and cell tracing, which mainly labels SVZ neural stem cells. Fast-proliferating cells lose BrdU quickly, so local astrocytes under ischemic conditions do not retain BrdU. In addition, we injected IdU the day before perfusion in order to see whether local astrocytes express Thbs4 when they respond to brain ischemia. However, we did not observe proliferating local astrocytes expressing Thbs4 after MCAO (see Author response image 4).

      Author response image 4.

      As mentioned in the response to reviewer 2, the cell tracing technique could label early postnatal astrocytes. We characterized the technique and only a small percentage of tdTom expression was found in the cortex and striatum of sham animals. This tdTom population could explain the percentage of tdTom+ cells in the ischemic regions that do not express Thbs4, even though this percentage could represent other cell types such as OPCs or oligodendrocytes. Taken together, the evidence suggests that the Thbs4+ astrocyte population derives from the SVZ.

      We indeed observed a small contribution of Thbs4+ astrocytes to the glial scar. However, Thbs4+ astrocytes arrive at the lesion at a critical temporal window - when local hyper-reactive astrocytes die or lose their function. We hypothesized that Thbs4+ astrocytes could help local astrocytes or replace them in reorganizing the extracellular space and the glial scar, an instrumental process for the recovery of the ischemic area. 

      (2) It is known that the local, GFAP-reactive astrocytes at the scar can form the required ECM. The authors propose a role of Thbs4-positive astrocytes in the modulation, and perhaps maintenance, of the ECM at the scar, thus participating in scar formation likewise. So, this means that the function of newborn astrocytes is only to help the local astrocytes in the scar formation and thus contribute to tissue regeneration. Why do we need specifically the Thbs4-positive astrocytes migrating from the SVZ to help the local astrocytes? Can you discuss this further?

      Unfortunately, we could not demonstrate which molecular machinery is involved in these mechanisms, and we can only speculate on the functional meaning of a second wave of glial activation. We added a lengthy discussion in lines 333-342.

      (3) The authors observed that the number of BrdU- and DCX-positive cells decreased at 15 dpi in all OB layers (Fig. S5). They further suggest that ischemia induced a change in the neuroblasts' ectopic migratory pathway, depriving the OB layers of the SVZ newborn neurons. Are the authors suggesting that these BrdU/DCX-positive cells now migrate also to the ischemic scar, or do they die? In fact, they see an increase in caspase-3-positive cells in the SVZ after ischemia, but they do not analyse which type of cells are dying. Alternatively, is there a change in the fate of the cells, and astrogliogenesis is increased at the expense of neurogenesis? The authors should determine which cells are cleaved-caspase-3 positive at the SVZ and clarify if there is a change in cell fate. Also please clarify what happens to the BrdU/DCX-positive cells that are born at the SVZ but do not migrate properly to the OB layers.

      Actually, we cannot demonstrate the fate of the missing BrdU/DCX cells in the OB. We can reasonably speculate that, following the ischemic insult, the neurogenic machinery steers toward investing more energy in generating glial cells to support the lesion. We did not analyze the fate of the DCX-positive cells that would normally migrate to the OB and differentiate there, that is, whether they die or whether there is a shift in the differentiation program in the SVZ, since we consider that question to be outside the scope of this study.

      (4) The authors showed decreased Nestin protein levels at 15 dpi by western blot, and immunostaining shows a decrease already at 7 dpi (Figure 2). These results mean that there is at least a transient depletion of NSCs due to the promotion of astrogliogenesis. However, the authors show that at 30 dpi there is an increase of slow-proliferating NSCs (Figure 3). Does this mean that there is a reestablishment of the SVZ cytogenic process? How does it happen and, more specifically, how is the NSC number promoted at 30 dpi? Please explain how the NSCs are modulated over time after ischemia induction and the impact on the cytogenic process.

      Based on the chronic BrdU treatment, the results suggested a restoration of the SVZ cytogenic process (also observed in nestin and DCX protein expression at 30 dpi). However, we did not analyze how it happens (through asymmetric or symmetric divisions). As suggested by the Encinas group, we hypothesized that brain ischemia induces the exhaustion of the neurogenic niche of the SVZ by symmetric divisions of NSCs into reactive astrocytes.

      (5) The authors performed a classification of Thbs4-positive cells in the SVZ according to their morphology. This should be confirmed with markers expressed by each of the cell subtypes.

      We thank the referee for the comment. Classifying NSCs based on different markers could also be tricky because different NSC cell types share markers. This classification was made considering the specific morphology of each NSC cell type. In addition, Thbs4 expression in B-type cells has also been observed in other studies (Llorens-Bobadilla et al., 2015; Cebrian-Silla et al., 2021; Basak et al., 2018).

      (6) In Figure S6, the authors quantified HABP spots inside Thbs4-positive astrocytes. Please show a higher magnification picture to show how this quantification was done.

      We quantified the HABP area and the HABP spots inside Thbs4+ astrocytes with a custom FIJI script.

      The Thbs4 cell mask was generated by automatic thresholding within the GFAP cell mask. The HABP channel was thresholded, and the resulting binary image was processed with a 1-pixel median filter (to eliminate 1-px noise-related spots). The “Analyze Particles” tool was used to sort HABP spots within the cell ROI. The HABP spot number per compartment and population was exported to Excel, and the data were normalized by dividing the HABP spots per ROI by the total HABP spots. See Author response image 5.

      Author response image 5.
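
      The quantification steps described above can be sketched in a few lines of Python. This is a hedged illustration of the general procedure (threshold, median filter, particle count within a mask, normalization), not the custom FIJI script itself: the function names, the 3x3 median window, the 8-connected particle definition, and the fixed threshold level are all assumptions made here for illustration.

```python
from statistics import median


def threshold(image, level):
    """Binarize a grayscale image (list of rows): 1 where intensity > level."""
    return [[1 if v > level else 0 for v in row] for row in image]


def median_filter_3x3(binary):
    """3x3 median filter with zero padding; removes isolated 1-px spots."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                binary[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0
                for yy in (y - 1, y, y + 1)
                for xx in (x - 1, x, x + 1)
            ]
            out[y][x] = int(median(window))
    return out


def count_spots(binary, mask):
    """Count 8-connected groups of foreground pixels lying inside the mask.

    A spot that straddles the mask border is counted by its in-mask part.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    n_spots = 0
    for y in range(h):
        for x in range(w):
            if not (binary[y][x] and mask[y][x]) or seen[y][x]:
                continue
            n_spots += 1
            seen[y][x] = True
            stack = [(y, x)]
            while stack:  # flood fill over the 8 neighbours
                cy, cx = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and mask[ny][nx]
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return n_spots


def spot_fraction_in_roi(image, roi_mask, level):
    """Spots inside the ROI divided by all spots in the image."""
    binary = median_filter_3x3(threshold(image, level))
    everywhere = [[1] * len(row) for row in binary]
    total = count_spots(binary, everywhere)
    inside = count_spots(binary, roi_mask)
    return inside / total if total else 0.0
```

      In this sketch, `roi_mask` plays the role of the Thbs4 cell mask and `image` the HABP channel; the returned value corresponds to the normalized spot count (spots per ROI divided by total spots).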

    2. eLife Assessment

      This work shows that newborn Thbs4-positive astrocytes generated in the adult subventricular zone (SVZ) respond to middle cerebral artery occlusion (MCAO) by secreting hyaluronan at the lesion penumbra, and that hyaluronan is a chemoattractant to SVZ astrocytes. These findings are important, although mostly descriptive, as they point to a relevant function of SVZ newborn astrocytes in the modulation of the glial scar after brain ischemia. The methods, data and analyses are convincing and broadly support the claims made by the authors with only some weaknesses.

    3. Reviewer #1 (Public review):

      Summary:

      The authors show that SVZ-derived astrocytes respond to a middle cerebral artery occlusion (MCAO) hypoxia lesion by secreting and modulating hyaluronan at the edge of the lesion (penumbra) and that hyaluronan is a chemoattractant to SVZ astrocytes. They use lineage tracing of SVZ cells to determine their origin. They also find that SVZ-derived astrocytes express Thbs4 but astrocytes at the MCAO-induced scar do not. Also, they demonstrate that decreased HA in the SVZ is correlated with gliogenesis. While much of the paper is descriptive/correlative, they do overexpress hyaluronan synthase 2 via viral vectors and show this is sufficient to recruit astrocytes to the injury. Interestingly, astrocytes preferred to migrate to the MCAO than to the region of overexpressed HAS2.

      Strengths:

      The field has largely ignored the gliogenic response of the SVZ, especially with regard to astrocytic function. These cells, and especially newborn cells, may provide support for regeneration. Emigrated cells from the SVZ have been shown to be neuroprotective by creating pro-survival environments, but their expression and deposition of beneficial extracellular matrix molecules are poorly understood. Therefore, this study is timely and important. The paper is very well written and the flow of results is logical.

      Comments on revised version:

      The authors have addressed my points and the paper is much improved. Here are the salient remaining issues that I suggest be addressed.

      The authors have still not shown, using loss-of-function studies, that hyaluronan is necessary for SVZ astrogenesis and/or migration to MCAO lesions.

      (1) The co-expression of EGFr with Thbs4 and the literature examination is useful.

      (2) Too bad they cannot explain the lack of effect of the MCAO on type C cells. The comparison with kainate-induced epilepsy in the hippocampus may or may not be relevant.

      (3) Thanks for including the orthogonal confocal views in Fig S6D.

      (4) The statement that "BrdU+/Thbs4+ cells mostly in the dorsal area" and therefore they mostly focused on that region is strange. Figure 8 clearly shows Thbs4 staining all along the striatal SVZ. Do they mean the dorsal segment of the striatal SVZ or the subcallosal SVZ? Fig. 4b and Fig 4f clearly show the "subcallosal" area as the one analysed but other figures show the dorsal striatal region (Fig. 2a). This is important because of the well-known embryological and neurogenic differences between the regions.

      (5) It is good to know that the harsh MCAO's had already been excluded.

      (6) Sorry for the lack of clarity - in addition to Thbs4, I was referring to mouse versus rat Hyaluronan degradation genes (Hyal1, Hyal2 and Hyal3) and hyaluronan synthase genes (HAS1 and HAS2) in order to address the overall species differences in hyaluronan biology thus justifying the "shift" from mouse to rat. You examine these in the (weirdly positioned) Fig. 8h,i. Please add a few sentences on mouse vs rat Thbs4 and Hyaluronan relevant genes.

      (7) Thank you for the better justification of using the naked mole rat HA synthase.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to study the activation of gliogenesis and the role of newborn astrocytes in a post-ischemic scenario. Combining immunofluorescence, BrdU-tracing and genetic cellular labelling, they tracked the migration of newborn astrocytes (expressing Thbs4) and found that Thbs4-positive astrocytes modulate the extracellular matrix at the lesion border by synthesis but also degradation of hyaluronan. Their results point to a relevant function of SVZ newborn astrocytes in the modulation of the glial scar after brain ischemia. This work's major strength is the fact that it is tackling the function of SVZ newborn astrocytes, whose role is undisclosed so far.

      Strengths:

      The article is innovative, of good quality, and clearly written, with properly described Materials and Methods, data analysis and presentation. In general, the methods are designed properly to answer the main question of the authors, being a major strength. Interpretation of the data is also in general well done, with results supporting the main conclusions of this article.

      In this revised version, the points raised/weaknesses were clarified and discussed in the article.

    1. eLife Assessment

      This study describes a useful technique to improve imaging depth using confocal microscopy for imaging large, cleared samples. It is as yet unclear if their proposed technique presents a significant advance to the field since their comparisons to existing techniques remain incomplete. However, the work will be of broad interest to many researchers in different fields.

    2. Reviewer #1 (Public review):

      Summary:

      Liu et al. present an immersion objective adapter design called RIM-Deep, which can be utilized for enhancing axial resolution and reducing spherical aberrations during inverted confocal microscopy of thick cleared tissue.

      Strengths:

      RI mismatches present a significant challenge to deep tissue imaging, and developing a robust immersion method is valuable in preventing losses in resolution. Liu et al. present data showing that RIM-Deep is suitable for tissue cleared with two different clearing techniques, demonstrating the adaptability and versatility of the approach.

      Weaknesses:

      Liu et al. claim to have developed a useful technique for deep tissue imaging, but in its current form, the paper does not provide sufficient evidence that their technique performs better than existing ones.

    3. Reviewer #2 (Public review):

      Summary:

      Liu et al. investigated the performance of a novel imaging technique called RIM-Deep to enhance the imaging depth for cleared samples. Usually, the imaging depth using the classical confocal microscopy sample chamber is limited due to optical aberrations, resulting in loss of resolution and image quality. To overcome this limitation and increase depth, they generated a special imaging chamber that is affixed to the objective and filled with a solution matching the refractive indices to reduce aberrations. Importantly, the study was conducted using a standard confocal microscope that has not been modified apart from exchanging the standard sample chamber with the RIM-Deep sample holder. Upon analysing the imaging depth, the authors claim that the RIM-Deep method increased the depth from 2 mm to 5 mm. In summary, RIM-Deep has the potential to significantly enhance the imaging quality of thick samples on a low budget, making in-depth measurements possible for a wide range of researchers who have access to an inverted confocal microscope.

      Strengths:

      The authors used different clearing methods to demonstrate the suitability of RIM-Deep for various sample preparation protocols with clearing solutions of different refractive indices. They clearly demonstrate that the RIM-Deep chamber is compatible with all 3 methods. Brain samples are characterized by complex networks of cells and are often hard to visualize. Despite the dense, complex structure of brain tissue, the RIM-Deep method generated high quality images of all 3 samples given. As the authors already stated, increasing imaging depth often goes hand in hand with purchasing expensive new equipment, exchanging several microscopy parts or purchasing a new microscopy set-up. Innovations, such as the RIM-Deep chamber, hence, might pave the way for cost-effective imaging and expand the applicability of an inverted confocal microscope.

      Weaknesses:

      (1) However, since this study introduces a novel imaging technique and therefore aims to revolutionize the way large samples are imaged, additional control experiments would strengthen the data. Of the three clearing protocols used (CUBIC, MACS and iDISCO), only the brain section from Macaca fascicularis cleared with iDISCO was imaged with both the standard chamber and the RIM-Deep method. This comparison indeed shows that the imaging depth increases more than 2-fold, which is a significant enhancement in terms of microscopy. However, it would have been important to evaluate and show the difference in imaging depth on the other two samples as well, since they were cleared with different protocols and thus treated with clearing solutions of different refractive indices compared to iDISCO.

      (2) The description of the figures and figure panels should be improved for a better understanding of the experiments performed and the resulting images/data.

      (3) While the authors used a Nikon AX inverted laser scanning confocal microscope, the study would highly benefit from evaluating the performance of the RIM-Deep method using other inverted confocal microscopes or even wide-field microscopes.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      Liu et al. present an immersion objective adapter design called RIM-Deep, which can be utilized for enhancing axial resolution and reducing spherical aberrations during inverted confocal microscopy of thick cleared tissue.

      Strengths:

      RI mismatches present a significant challenge to deep tissue imaging, and developing a robust immersion method is valuable in preventing losses in resolution. Liu et al. present data showing that RIM-Deep is suitable for tissue cleared with two different clearing techniques, demonstrating the adaptability and versatility of the approach.

      Greetings, we greatly appreciate your feedback. In truth, we have utilized three distinct clearing techniques, including iDISCO, CUBIC, and MACS, to substantiate the adaptability and multifunctionality of the RIM-Deep adapter.

      Weaknesses:

      Liu et al. claim to have developed a useful technique for deep tissue imaging, but in its current form, the paper does not provide sufficient evidence that their technique performs better than existing ones.

      We are in complete agreement with your recommendation. In additional experiments, we will thoroughly compare the RIM-Deep adapter with the official adapter in fluorescent bead experiments, as well as their performance with the CUBIC and MACS tissue clearing techniques.

      Reviewer #2 (Public review):

      The authors used different clearing methods to demonstrate the suitability of RIM-Deep for various sample preparation protocols with clearing solutions of different refractive indices. They clearly demonstrate that the RIM-Deep chamber is compatible with all three methods. Brain samples are characterized by complex networks of cells and are often hard to visualize. Despite the dense, complex structure of brain tissue, the RIM-Deep method generated high-quality images of all three samples. As the authors stated, increasing imaging depth often goes hand in hand with purchasing expensive new equipment, exchanging several microscopy parts, or purchasing a new microscopy setup. Innovations like the RIM-Deep chamber might pave the way for cost-effective imaging and expand the applicability of inverted confocal microscopy.

      Weaknesses:

      (1) However, since this study introduces a novel imaging technique aiming to revolutionize imaging of large samples, additional control experiments would strengthen the data. From the three clearing protocols used (CUBIC, MACS, and iDISCO), only the brain section from Macaca fascicularis cleared with iDISCO was imaged with the standard chamber and the RIM-Deep method. This comparison indeed shows a more than 2-fold increase in imaging depth, a significant enhancement in microscopy. However, it would have been important to evaluate and show the imaging depth differences in the other two samples, as they were cleared with different protocols and treated with clearing solutions of different refractive indices compared to iDISCO.

      Thank you for your suggestion. We will investigate the imaging performance of brain tissue using the other two clearing protocols with both the official adapter and the RIM-Deep method.

      (2) The description of the figures and figure panels should be improved for a better understanding of the experiments performed and the resulting images/data.

      Thank you for your suggestion. We will revise the figure legends in detail.

      (3) While the authors used a Nikon AX inverted laser scanning confocal microscope, the study would benefit from evaluating the performance of the RIM-Deep method using other inverted confocal microscopes or even wide-field microscopes.

      Thank you for your suggestion. We also recognize that evaluating the performance of the RIM-Deep method on other inverted confocal microscopes will help further validate its applicability and robustness. We will supplement these experiments to expand the scope and reliability of RIM-Deep.

    1. eLife Assessment

      This valuable study investigates how biologically plausible learning mechanisms can support assembly formation that encodes statistics of the environment, by enabling neural sampling that is based on within-assembly connectivity strength. It convincingly shows that assembly formation can emerge from predictive plasticity in excitatory synapses, while two types of plasticity in inhibitory synapses are required: inhibitory homeostatic (predictive) plasticity and inhibitory competitive (anti-predictive) plasticity.

    2. Reviewer #1 (Public review):

      The authors have successfully addressed most of the issues raised in the first review. Nevertheless, some of the mentioned problems require further attention, mostly regarding the formal derivation of the learning rules, as well as connections to previous research.

      Regarding the derivations of learning rules: The authors have provided goal functions for each of the plastic neural connections to give some insight into what these connections do. However, as I understand it, this does not address the main concern raised in the previous review: Why do these rules lead to overall network dynamics that sample from the input distribution? Virtually all other work on neural sampling that I am aware of (e.g., from Maass Lab, Lengyel Lab, etc.) starts from a single goal function for all connections that somehow quantifies the difference of network dynamics from the target distribution. In the presented work the authors specify different goal functions for the different weights, which does not make clear how the desired network dynamics are ultimately achieved.

      This becomes especially evident looking at the two different recurrent connections (M and G). M minimizes the difference between network activity f and recurrent prediction DKL[f|phi(My)], but why is this alone not enough to ensure a good sampling? G minimizes the squared error [f-phi(Gy)]^2, but what does that mean? The problem is that the goal functions are self-consistent in the sense that both f and phi(Gy) depend on G, which makes an interpretation very difficult. Ultimately it's easier to interpret this by looking at the plasticity rule and see that it leads to a balance. For G the authors furthermore actually ignore the derived plasticity rule and switch to a rule similar to the one for M, meaning that the actual goal function for G is also something like DKL[f|phi(Gy)]. Overall, an overarching optimization goal for the entire network is missing, which makes the interpretation very difficult. I understand that this might be very difficult to provide at this stage, but the authors should at least point out this shortcoming as an open question for the proposed framework.
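
      For reference, the two goal functions discussed in this comment can be typeset in the comment's own notation. This is only a restatement under assumptions: the labels $\mathcal{C}_M$ and $\mathcal{C}_G$ are hypothetical names introduced here, and the exact form used in the manuscript may differ.

```latex
\begin{align}
\mathcal{C}_M &= D_{\mathrm{KL}}\!\left[\, f \,\middle\|\, \phi(M y) \,\right], \\
\mathcal{C}_G &= \left[\, f - \phi(G y) \,\right]^{2} .
\end{align}
```

      Because $f$ itself depends on $G$ through the network dynamics, the second expression has no fixed target, which is the self-consistency issue raised above.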

      Regarding the relation to previous work, the authors have provided a much more detailed discussion, which very much clears up the contributions and novel ideas in their work. Still, there are some claims that are not consistent with the literature. In particular, in lines 767 ff. the authors state that Kappel et al. "assumed plasticity only at recurrent synapses projecting onto the excitatory neurons. In addition, unlike our model, the cell assembly memberships need to be preconfigured in the [...] model." This is not correct, as Kappel et al. learn both the feed-forward and recurrent connections; hence the main difference is that in Kappel et al. sampling is sequential and not random. This is why I mentioned this work in the first review, as it speaks against the authors' claims of novelty (719 ff.), which should be adjusted accordingly.

    3. Reviewer #2 (Public review):

      Summary:

      The paper reconsiders the formation of Hebbian-type assemblies, with their spontaneous reactivation representing the statistics of the sensory inputs, in the light of predictive synaptic plasticity. It convincingly shows that not all plasticity rules can be predictive in the narrow sense. While plasticity at the excitatory synapses (the forward-projecting and recurrent ones) is predictive, two types of plasticity in the recurrent inhibition are required: a homeostatic one and a competitive one.

      Details:

      Besides the excitatory forward and recurrent connections that are learned based on predictive synaptic plasticity, two types of inhibitory plasticity are considered. A first type of inhibition is homeostatic and roughly balances excitation within the cell assemblies. Plasticity in this type 1 inhibition is also predictive, analogous to the plasticity of the excitatory synapses. However, plasticity in type 2 inhibition is competitive and has a switched sign. Both types of inhibitory plasticity, the predictive (homeostatic) and the anti-predictive (competitive) one, work together with the predictive excitatory plasticity to form cell assemblies representing sensory stimuli. Only if the two types of homeostatic and competitive inhibitory plasticity are present, will the spontaneous replay of the assemblies reflect the statistics of the stimulus presentation.

      Critical review:

      The simulations include Dale's law, making them more biologically realistic. The paper emphasizes predictive plasticity and introduces type 1 inhibitory plasticity that, by construction, tries to fully explain away the excitatory input. In the absence of external inputs, however, due to the symmetry between the excitatory and inhibitory-type-1 plasticity rules, excitation and inhibition tend to fully cancel each other. Multiple options may solve the dilemma:

      (1) As other predictive dendritic plasticity models assume, the presynaptic source for recurrent inhibition is typically less informative than the presynaptic source of excitation, so that inhibition is not able to fully explain away excitation.

      (2) Besides the inhibitory predictive plasticity that mirrors the analogous excitatory predictive plasticity, an additional competitive plasticity can be introduced.

      The paper chooses solution (2) and suggests an additional inhibitory recurrent pathway that is not predictive but anti-predictive, with a reversed sign. The combination of the two types of inhibitory plasticity leads to a stable formation of cell assemblies. The stable target activity of the plasticity rules in a memory recall is no longer 0, as it would be with only type-1 inhibitory plasticity. Instead, the target activity of plasticity is now enhanced within a winning assembly, and also positive but reduced in the losing assemblies.

    4. Reviewer #3 (Public review):

      Summary:

      The work shows how learned assembly structure and its influence on replay during spontaneous activity can reflect the statistics of stimulus input. In particular, stimuli that are more frequent during training elicit stronger wiring and more frequent activation during replay. Past works (Litwin-Kumar and Doiron, 2014; Zenke et al., 2015) have not addressed this specific question, as classic homeostatic mechanisms forced activity to be similar across all assemblies. Here, the authors use a dynamic gain and threshold mechanism to circumvent this issue and link this mechanism to a cellular monitoring of membrane potential history.

      Strengths:

      (1) This is an interesting advance, and the authors link this to experimental work in sensory learning in environments with non-uniform stimulus probabilities.

      (2) The authors consider their mechanism in a variety of models of increasing complexity (simple stimuli, complex stimuli; ignoring Dale's law, incorporating Dale's law).

      (3) The work links a cellular mechanism of internal gain control (their variable h) to assembly formation and the non-uniformity of spontaneous replay activity. It offers the promise of relating cellular and synaptic plasticity mechanisms under a common goal of assembly formation.

      Weaknesses:

      (1) However, while the manuscript does show that assembly wiring does follow stimulus likelihood, it is not clear how the assembly specific statistics of h reflect these likelihoods. I find this to be a key issue.

      (2) The authors' model does take advantage of the sigmoidal transfer function, and after learning an assembly is either fully active or nearly fully silent (Fig. 2a). This somewhat artificial saturation may be the reason that classic homeostasis is not required, since runaway activity is not as damaging to network activity.

      (3) Classic mechanisms of homeostatic regulation (synaptic scaling, inhibitory plasticity) try to ensure that firing rates match a target rate (on average). If the target rate is the same for all neurons then having elevated firing rates for one assembly compared to others during spontaneous activity would be difficult. If these homeostatic mechanisms were incorporated, how would they permit the elevated firing rates for assemblies that represent more likely stimuli?

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In their manuscript, the authors propose a learning scheme to enable spiking neurons to learn the appearance probability of inputs to the network. To this end, the neurons rely on error-based plasticity rules for feedforward and recurrent connections. The authors show that this enables the networks to spontaneously sample assembly activations according to the occurrence probability of the input patterns they respond to. They also show that the learning scheme could explain biases in decision-making, as observed in monkey experiments. While the task of neural sampling has been solved before in other models, the novelty here is the proposal that the main drivers of sampling are within-assembly connections, and not between-assembly (Markov chains) connections as in previous models. This could provide a new understanding of how spontaneous activity in the cortex is shaped by synaptic plasticity. 

      The manuscript is well written and the results are presented in a clear and understandable way. The main results are convincing, concerning the spontaneous firing rate dependence of assemblies on input probability, as well as the replication of biases in the decision-making experiment. Nevertheless, the manuscript and model leave open several important questions. The main problem is the unclarity, both in theory and intuitively, of how the sampling exactly works. This also makes it difficult to assess the claims of novelty the authors make, as it is not clear how their work relates to previous models of neural sampling. 

      We agree with the reviewer that our previous manuscript was not clear regarding the mechanism of the model. We have performed additional simulations and included a derivation of the learning rule to address this, which we explain below.

      Regarding the unclarity of the sampling mechanism, the authors state that within-assembly excitatory connections are responsible for activating the neurons according to stimulus probability. However, the intuition for this process is not made clear anywhere in the manuscript. How do the recurrent connections lead to the observed effect of sampling? How exactly do assemblies form from feedforward plasticity? This intuitive unclarity is accompanied by a lack of formal justification for the plasticity rules. The authors refer to a previous publication from the same lab, but it is difficult to connect these previous results and derivations to the current manuscript. The manuscript should include a clear derivation of the learning rules, as well as an (ideally formal) intuition of how this leads to the sampling dynamics in the simulation.

      We have included a derivation of our plasticity rules in lines 871-919 of the revised manuscript. Consistent with our claim that predictive plasticity updates the feedforward and the recurrent synapses to predict output firing rates, we have shown that the corresponding cost function measures the discrepancy among the recurrent prediction, the feedforward prediction, and the output firing rate. The resultant feedforward plasticity is the same as our previous rule (Asabuki and Fukai, 2020), which segments the salient patterns embedded in the input sequence. The recurrent plasticity rule suggests that the recurrent prediction learns the statistical model of the evoked activity, enabling the network to replay the learned internal model.

      Similarly, for the inhibitory plasticity, we defined a cost function that evaluates the difference between the firing rate and inhibitory potential within each neuron. This rule is crucial for maintaining balanced network dynamics. See our response below for more details on the role of inhibitory plasticity.

      Some of the model details should furthermore be cleared up. First, recurrent connections transmit signals instantaneously, which is implausible. Is this required, would the network dynamics change significantly if, e.g., excitation arrives slightly delayed? Second, why is the homeostasis on h required for replay? The authors show that without it the probabilities of sampling are not matched, but it is not clear why, nor how homeostasis prevents this. Third, G and M have the same plasticity rule except for G being confined to positive values, but there is no formal justification given for this quite unusual rule. The authors should clearly justify (ideally formally) the introduction of these inhibitory weights G, which is also where the manuscript deviates from their previous 2020 work. My feeling is that inhibitory weights have to be constrained in the current model because they have a different goal (decorrelation, not prediction) and thus should operate with a completely different plasticity mechanism. The current manuscript doesn't address this, as there is no overall formal justification for the learning algorithm. 

      First, while the reviewer's suggestion to test with delayed excitation is intriguing and crucial for a more biologically detailed spiking neuron model, we have chosen to maintain the current model configuration. Our use of Poisson spiking neurons, which generate spikes based on instantaneous firing rates, does not heavily depend on precise spike timing information. Therefore, to preserve the simplicity of our results, we kept the model unchanged.

      Second, we agree that our previous claim regarding the importance of the memory trace h for sampling may have been confusing. As shown in Supplementary Figure 7b in the revised manuscript, when we eliminated the dynamics of the memory trace, sampling performance did indeed decrease. However, we also observed that the assembly activity ratio continued to show a linear relationship with stimulus probabilities. Based on these findings, we have revised our claim in the manuscript to clarify that the memory trace is primarily critical for firing rate homeostasis, rather than directly influencing sampling within the learned network. We have explained this in ll. 446-448 in the revised manuscript.

      Third, we explored a new architecture where all recurrent connections are either exclusively excitatory or inhibitory, keeping their sign throughout the learning process. This change addresses the reviewer's concern about our initial assumption that only the inhibitory connection G was constrained to non-negative values. We found that inhibition plays a crucial role in decorrelation and prediction, helping activate specific assemblies through competition while preventing runaway excitation within active assemblies. We have explained this in ll. 560-593 in the revised manuscript.

      Finally, the authors should make the relation to previous models of sampling and error-based plasticity more clear. Since there is no formal derivation of the sampling dynamics, it is difficult to assess how they differ exactly from previous (Markov-based) approaches, which should be made more precise. Especially, it would be important to have concrete (ideally experimentally testable) predictions on how these two ideas differ. As a side note, especially in the introduction (line 90), this unclarity about the sampling made it difficult to understand the contrast to Markovian transition models. 

      As the reviewer pointed out, previous computational models have demonstrated that recurrent networks with Hebbian-like plasticity can learn appropriate Markovian statistics (Kappel et al., 2014; Asabuki and Clopath, 2024). However, our model differs conceptually from these previous models. While Kappel et al. showed that STDP in winner-take-all circuits can approximate online learning of hidden Markov models (HMMs), a key difference from our model is that their neural representations acquire sequences using Markovian sampling dynamics, whereas our model does not depend on such ordered sampling. Specifically, in their model, sequential sampling arises from learned structures in the off-diagonal elements of the recurrent connections (i.e., between-assembly connections). In contrast, our network learns to stochastically generate recurrent cell assemblies by relying solely on within-assembly connections. A similar argument applies to the model of Asabuki and Clopath. Further, while our model introduces plasticity rules for all types of connections, Asabuki and Clopath introduced plasticity only for recurrent synapses projecting onto excitatory neurons, and, unlike in our model, the cell assembly memberships were preconfigured. We have added clarifying sentences in ll. 757-772 of the revised manuscript to elaborate on this point.

      There are also several related models that have not been mentioned and should be discussed. In 663 ff. the authors discuss the contributions of their model which they claim are novel, but in Kappel et al (STDP Installs in Winner-Take-All Circuits an Online Approximation to Hidden Markov Model Learning) similar elements seem to exist as well, and the difference should be clarified. There is also a range of other models with lateral inhibition that make use of error-based plasticity (most recently reviewed in Mikulasch et al, Where is the error? Hierarchical predictive coding through dendritic error computation), and it should be discussed how the proposed model differs from these. 

      We have clarified how our model differs from previously proposed recurrent network models that perform Markovian sampling. Please see our reply above.

      We have also included an additional sentence in ll. 704-709 in the revised manuscript to discuss how our model differs from similar predictive learning models: “It should be noted that while several network models that perform error-based computations like ours exploit only inhibitory recurrent plasticity (Mikulasch et al., 2021; Mackwood et al., 2021; Hertäg and Clopath., 2022; Mikulasch et al., 2023), our model learns the structured spontaneous activity to reproduce the evoked statistics by modifying both excitatory and inhibitory recurrent connections.”

      Reviewer #2 (Public Review):

      Summary: 

      The paper considers a recurrent network with neurons driven by external input. During the external stimulation predictive synaptic plasticity adapts the forward and recurrent weights. It is shown that after the presentation of constant stimuli, the network spontaneously samples the states imposed by these stimuli. The probability of sampling stimulus x^(i) is proportional to the relative frequency of presenting stimulus x^(i) among all stimuli i=1,..., 5. 

      Methods: 

      Neuronal dynamics: 

      For the main simulation (Figure 3), the network had 500 neurons, and 5 nonoverlapping stimuli, each activating 100 different neurons, were presented. The voltage u of the neurons is driven by the forward weights W via input rates x; the inhibitory recurrent weights G are restricted to non-negative values (Dale's law), and the other recurrent weights M had no sign restrictions. Neurons were spiking with an instantaneous Poisson firing rate, and each spike triggered an exponentially decaying postsynaptic voltage deflection. Neglecting time constants of the postsynaptic responses, the expected postsynaptic voltage reads (in vectorial form) as

      u = W x + (M - G) f (Eq. 5) 

      where f ≈ phi(u) represents the instantaneous Poisson rate, and phi a sigmoidal nonlinearity. The rate f is only an approximation (symbolized by ≈) of phi(u) since an additional regularization variable h enters (taken up in Point 4 below). The initialisation of W and M is Gaussian with mean 0 and variance 1/sqrt(N), N the number of neurons in the network. The initial entries of G are all set to 1/sqrt(N).

      Predictive synaptic plasticity: 

      The 3 types of synapses were each adapted so that they individually predict the postsynaptic firing rate f, in matrix form 

      ΔW ≈ (f - phi( W x ) ) x^T 

      ΔM ≈ (f - phi( M f ) ) f^T 

      ΔG ≈ (f - phi( M f ) ) f^T but confined to non-negative values of G (Dale's law). 

      The ^T tells us to take the transpose, and the ≈ again refers to the fact that the ϕ entering in the learning rule is not exactly the ϕ determining the rate, only up to the regularization (see Point 4). 
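
      The three updates summarized above can be sketched in a rate-based form. This is a minimal illustration of the reviewer's summary, not the authors' spiking implementation: the target rate f is held fixed rather than generated by the network, the learning rate, layer sizes, and initial values are arbitrary, and the prediction for G is assumed to be computed from G itself (as suggested by the fixed point phi( G f ) listed in the main formal result below).

```python
import numpy as np

def phi(u):
    """Sigmoidal nonlinearity (a plain logistic function is assumed here)."""
    return 1.0 / (1.0 + np.exp(-u))

def predictive_update(weights, presyn, f, lr=0.01):
    """One step of dW = lr * (f - phi(W pre)) pre^T (lr is a hypothetical value)."""
    error = f - phi(weights @ presyn)
    return weights + lr * np.outer(error, presyn)

rng = np.random.default_rng(0)
n_in, n_out = 8, 5
x = rng.random(n_in)   # non-negative input rates
f = rng.random(n_out)  # postsynaptic rate, held fixed for illustration only

W = rng.normal(0.0, 0.1, (n_out, n_in))   # feedforward weights
M = rng.normal(0.0, 0.1, (n_out, n_out))  # unconstrained recurrent weights
G = np.full((n_out, n_out), 0.1)          # inhibitory weights, kept non-negative

err_before = np.abs(f - phi(W @ x)).mean()
for _ in range(2000):
    W = predictive_update(W, x, f)
    M = predictive_update(M, f, f)
    # Same rule as M, followed by clipping to non-negative values (Dale's law).
    G = np.maximum(predictive_update(G, f, f), 0.0)
err_after = np.abs(f - phi(W @ x)).mean()
```

      In this toy setting the feedforward prediction error shrinks over the iterations, and the clipping step is the only difference between the updates of M and G, which is the asymmetry the reviewer questions in Point 1.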

      Main formal result: 

      As the authors explain, the forward weight W and the unconstrained weight M develop such that, in expectations, 

      f ≈ phi( W x ) ≈ phi( M f ) ≈ phi( G f ) ,

      consistent with the above plasticity rules. Some elements of M remain negative. In this final state, the network displays the behaviour as explained in the summary. 

      Major issues: 

      Point 1: Conceptual inconsistency 

      The main results seem to arise from unilaterally applying Dale's law only to the inhibitory recurrent synapses G, but not to the excitatory recurrent synapses M. 

      In fact, if the same non-negativity restriction were also imposed on M (as it is on G), then their learning rules would become identical, likely leading to M=G. But in this case, the network becomes purely forward, u = W x, and no spontaneous recall would arise. Of course, this should be checked in simulations. 

      Because Dale's law was only applied to G, however, M and G cannot become equal, and the remaining differences seem to cause the effect. 

      Predictive learning rules are certainly powerful, and it is reasonable to consider the same type of error-correcting predictive learning rule, for instance for different dendritic branches that both should predict the somatic activity. Or one may postulate the same type of error-correcting predictive plasticity for inhibitory and excitatory synapses, but then the presynaptic neurons should not be identical, as it is assumed here. Both these types of error-correcting and error-forming learning rules for same-branches and inhibitory/excitatory inputs have been considered already (but with inhibitory input being itself restricted to local input, for instance). 

      The model presented above lacked biological plausibility in several key aspects. Specifically, we assumed that the recurrent connection M could change sign through plasticity and be either excitatory or inhibitory, while the inhibitory connection G was restricted to being inhibitory only. This initial setting does not reflect the biological constraint that synapses typically maintain a consistent excitatory or inhibitory type. Furthermore, due to this unconstrained recurrent connectivity M, the original model had two types of inhibitory connections (i.e., the negative part of M and the inhibitory connection G) without providing a clear computational role for each type of inhibition.

      To address these limitations and to understand the role of the two types of inhibition, we explored a new architecture where all recurrent connections are either exclusively excitatory or inhibitory, keeping their sign throughout the learning process. This change addresses the reviewer's concern about our initial assumption that only the inhibitory connection G was constrained to non-negative values. We found that inhibition plays a crucial role in prediction and decorrelation, helping activate specific assemblies through competition while preventing runaway excitation within active assemblies. We have explained this in ll. 561-593 in the revised manuscript.

      Point 2: Main result as an artefact of an inconsistently applied Dale's law? 

      The main result shows that the probability of a spontaneous recall for the 5 nonoverlapping stimuli is proportional to the relative time the stimulus was presented. This is roughly explained as follows: each stimulus pushes the activity from 0 up towards f ≈ phi( W x ) by the learning rule (roughly). Because the mean weights W are initialized to 0, a stimulus that is presented longer will have more time to push W up so that positive firing rates are reached (assuming x is non-negative). The recurrent weights M learn to reproduce these firing rates too, while the plasticity in G tries to prevent that (by its negative sign, but with the restriction to non-negative values). Stimuli that are presented more often, on average, will have more time to reach the positive target and hence will form a stronger and wider attractor. In spontaneous recall, the size of the attractor reflects the time of the stimulus presentation. This mechanism so far is fine, but the only problem is that it is based on restricting G, but not M, to non-negative values.

      As mentioned above, we have included an additional simulation where all weights are non-negative. We have demonstrated the new results in Figure 6 before presenting the two-population model in the revised manuscript (Figure 7), so that readers can follow the importance of two pathways of inhibitory connections.

      Point 3: Comparison of rates between stimulation and recall. 

      The firing rates with external stimulations will be considerably larger than during replay (unless the rates are saturated). 

      This is a prediction that should be tested in simulations. In fact, since the voltage roughly reads as u = W x + (M - G) f, and the learning rules are such that eventually M ≈ G, the recurrences roughly cancel and the voltage is mainly driven by the external input x. In the state of spontaneous activity without external drive, one has u = (M - G) f, and this should generate considerably smaller instantaneous rates f ≈ phi(u) than in the case of the feedforward drive (unless f is in both cases at the upper or lower ceiling of phi). This is a prediction that can also be tested.

      Because the figures mostly show activity ratios or normalized activities, it was not possible for me to check this hypothesis with the current figures. So please show non-normalized activities for comparing stimulation and recall for the same patterns. 
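
      The argument in this point can be made concrete with a toy calculation. The sketch below only illustrates the reviewer's reasoning and does not simulate the authors' model: the recurrent term is replaced by a small random residual standing in for M - G, the feedforward drive W x is set to a constant positive value, and phi is a fixed logistic function, whereas the model under review uses a dynamic gain and threshold. All numerical values are assumptions.

```python
import numpy as np

def phi(u):
    """Fixed logistic nonlinearity (assumption; the model's gain is dynamic)."""
    return 1.0 / (1.0 + np.exp(-u))

def steady_state_rates(drive, residual, n_iter=200):
    """Iterate f <- phi(drive + residual f) to an approximate fixed point."""
    f = np.full(drive.shape, 0.5)
    for _ in range(n_iter):
        f = phi(drive + residual @ f)
    return f

rng = np.random.default_rng(1)
n = 50
# Small residual standing in for M - G when excitation and inhibition nearly cancel.
residual = rng.normal(0.0, 0.05, (n, n))

evoked = steady_state_rates(np.full(n, 2.0), residual)   # with drive W x > 0
spontaneous = steady_state_rates(np.zeros(n), residual)  # without external drive
```

      Under these assumptions the mean evoked rate exceeds the mean spontaneous rate, which is the difference that non-normalized activity plots would make visible.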

      We agree with the reviewer that the activity levels of spontaneous and induced activity should be compared. We have shown the distributions of activity level of these activities in our new Figure 2d. As expected, we found that the evoked activity showed stronger activity compared to the spontaneous activity.  

      Point 4: Unclear definition of the variable h. 

      The formal definition of h = h_i is given by (suppressing here the neuron index i and the h-index of tau)

      tau dh/dt = -h   if h > u,
      h = u            otherwise.   (Eq. 10)

      But if it is only Equation 10 (nothing else is said), h will always become equal to u, or will vanish, i.e. either h=u or h=0 after some initial transient. In fact, as soon as h>u, h is decaying to 0 according to the first line. If u is >0, then it stops at u=h according to the second line. No reason to change h=u further. If u<=0 while h>u, then h is converging to 0 according to the first line and will stay there. I guess the authors had issues with the recurrent spiking simulations and tried to fix this with some regularization. However as presented, it does not become clear how their regulation works. 
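
      The behaviour described in this point can be checked with a literal forward-Euler reading of Equation 10 for a constant membrane potential u. This is a sketch of the reviewer's interpretation only: the step size, initial value, and number of steps are arbitrary assumptions, and tau = 10 s follows the value of tau_h quoted in the author response.

```python
def simulate_trace(u, h0=2.0, tau=10.0, dt=0.01, n_steps=10_000):
    """Integrate Eq. 10 as written, for a constant membrane potential u."""
    h = h0
    for _ in range(n_steps):
        if h > u:
            h += dt * (-h / tau)  # first line of Eq. 10: exponential decay
        else:
            h = u                 # second line of Eq. 10: h is set to u
    return h

h_positive_u = simulate_trace(u=0.5)   # decays until it reaches u, then stays there
h_negative_u = simulate_trace(u=-1.0)  # never reaches u, keeps decaying towards 0
```

      With a constant u the trace ends either at u or close to 0, as the reviewer argues. In the full network u fluctuates in time, so this sketch does not settle whether the same holds during learning.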

      We apologize to the reviewer that our definition of h was unclear. As the reviewer pointed out, because the memory trace is always positive and larger than (or equal to) the membrane potential, the membrane potential could in principle remain negative, in which case the memory trace would decay to 0 and stay there. However, because excitatory and inhibitory inputs to the network remain balanced, the membrane potential does not stay persistently negative. In fact, we trained the network without any manipulations other than the memory trace described in the manuscript, and it was able to learn the assembly structure stably.

      BTW: In Eq. 11 the authors set the gain beta to beta = beta0/h which could become infinite and, putatively more problematic, negative, depending on the value of h. Maybe some remark would convince a reader that no issues emerge from this. 

      We have mentioned in ll. 864-866 in the revised manuscript that no issues emerge from the slope parameter.

      Added from discussions with the editor and the other reviewers: 

      Thanks for alerting me to this Supplementary Figure 8. Yes, it looks like the authors did apply there Dale's law for both the excitatory and inhibitory synapses. Yet, they also introduced two types of inhibitory pathways converging both to the excitatory and inhibitory neurons. For me, this is a confirmation that applying Dale's law to both excitatory and inhibitory synapses, with identical learning rules as explained in the main part of the paper, does not work. 

      Adding such two pathways is a strong change from the original model as introduced before, and based on which all the Figures in the main text are based. Supplementary Figure 8 should come with an analysis of why a single inhibitory pathway does not work. I guess I gave the reason in my Points 1-3. Some form of symmetry breaking between the recurrent excitation and recurrent inhibition is required so that, eventually, the recurrent excitatory connection will dominate. 

      Making the inhibitory plasticity less expressive by applying Dale's law to only those inhibitory synapses seems to be the answer chosen in the Figures of the main text (but then the criticism of unilaterally applying Dale's law). 

      Applying Dale's law to both types of synapses, but dividing the labor of inhibition into two strictly separate and asymmetric pathways, and hence asymmetric development of excitatory and inhibitory weights, seems to be another option. However, introducing such two separate inhibitory pathways, just to rescue the fact that Dale's law is applied to both types of synapses, is a bold assumption. Is there some biological evidence of such two pathways in the inhibitory, but not the excitatory connections? And what is the computational reasoning to have such a separation, apart from some form of symmetry breaking between excitation and inhibition? I guess, simpler solutions could be found, for instance by breaking the symmetry between the plasticity rules for the excitatory and inhibitory neurons. All these questions, in my view, need to be addressed to give some insights into why the simulations do work. 

      The reviewer’s intuition is correct. To effectively learn cell assembly structures and replay their activities, our model indeed requires two types of inhibitory connections. Please refer to our response above for further details. 

      Overall, Supplementary Figure 8 seems to me too important to be deferred to the Supplement. The reasoning behind the two inhibitory pathways should appear more prominently in the main text. Without this, important questions remain. For instance, when thinking in a rate-based framework, the two inhibitory pathways twice try to explain the somatic firing rate away. Doesn't this lead to a too strong inhibition? Can some steady state with a positive firing rate caused by the recurrence, in the absence of an external drive, be proven? The argument must include the separation into Path 1 and Path 2. So far, this reasoning has not been entered. 

      In fact, it might be that, in a spiking implementation, some sparse spikes will survive. I wonder whether at least some of these spikes survive because of the other rescuing construction with the dynamic variable h (Equation 10, which is not transparent, and that is not taken up in the reasoning either, see my Point 4)

      Perhaps it is helpful for the authors to add this text in the reply to them. 

      We have moved the former Supplemental Figure 8 to the main Figure 7. Please see our response above about the role of dual inhibitory connection types.

      Reviewer #3 (Public Review): 

      Summary: 

      The work shows how learned assembly structure and its influence on replay during spontaneous activity can reflect the statistics of stimulus input. In particular, stimuli that are more frequent during training elicit stronger wiring and more frequent activation during replay. Past works (Litwin-Kumar and Doiron, 2014; Zenke et al., 2015) have not addressed this specific question, as classic homeostatic mechanisms forced activity to be similar across all assemblies. Here, the authors use a dynamic gain and threshold mechanism to circumnavigate this issue and link this mechanism to cellular monitoring of membrane potential history. 

      Strengths: 

      (1) This is an interesting advance, and the authors link this to experimental work in sensory learning in environments with non-uniform stimulus probabilities. 

      (2) The authors consider their mechanism in a variety of models of increasing complexity (simple stimuli, complex stimuli; ignoring Dale's law, incorporating Dale's law). 

      (3) Links a cellular mechanism of internal gain control (their variable h) to assembly formation and the non-uniformity of spontaneous replay activity. Offers a promise of relating cellular and synaptic plasticity mechanisms under a common goal of assembly formation. 

      Weaknesses: 

      (1) However, while the manuscript does show that assembly wiring does follow stimulus likelihood, it is not clear how the assembly-specific statistics of h reflect these likelihoods. I find this to be a key issue. 

      We agree that our previous claim regarding the importance of the memory trace h for sampling may have been confusing. As shown in Supplementary Figure 7b, when we eliminated the dynamics of the memory trace, sampling performance did indeed decrease. However, we also observed that the assembly activity ratio continued to show a linear relationship with stimulus probabilities. Based on these findings, we revised our claim in the manuscript to clarify that the memory trace is primarily critical for learning to avoid trivial solutions, rather than directly influencing sampling within the learned network. We have explained this in ll. 446-448 in the revised manuscript.

      (2) The authors' model does take advantage of the sigmoidal transfer function, and after learning an assembly is either fully active or nearly fully silent (Figure 2a). This somewhat artificial saturation may be the reason that classic homeostasis is not required since runaway activity is not as damaging to network activity. 

      The reviewer's intuition is correct. The saturating nonlinearity is important for the network to form stable assembly structures. We have added an additional sentence in ll. 866-868 to mention this.

      (3) Classic mechanisms of homeostatic regulation (synaptic scaling, inhibitory plasticity) try to ensure that firing rates match a target rate (on average). If the target rate is the same for all neurons then having elevated firing rates for one assembly compared to others during spontaneous activity would be difficult. If these homeostatic mechanisms were incorporated, how would they permit the elevated firing rates for assemblies that represent more likely stimuli? 

      LIF neurons) may solve this problem by utilizing spike-timing statistics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Minor issues: 

      Figure 1: It would be helpful to display the equation for output rate here as well. 

      We have included the equation in the revised Figure 1a.

      Figure 3c: Typo "indivisual neurons". 

      We have corrected the typo. We thank the reviewer for their careful review.

      Line 325: Do you mean Figure 3f,g? 

      We repeated the task with different numbers of stimuli in Supplementary Figure 1c,d.

      Line 398: Winner-take-all can be misunderstood, as it typically stands for competition in inference, not in learning. 

      We have rephrased it as “unstable dynamics” in l. 400.

      Line 429: Are intra-assembly and within-assembly the same? If so please use consistent terminology. 

      We have made the terminology consistent.

      Line 792 ff.: Please mention that (t) was left away. 

      We have included a sentence to mention it in ll. 847-848 in the revised manuscript.

      Line 817: Should u_i be v_i? 

      We have modified the term.

      Methods: What is the value of tau_h? 

      We have used τ_h = 10 s, which is mentioned in l. 853.

    1. eLife Assessment

      This timely and important study used functional near-infrared spectroscopy hyperscanning to examine the neural correlates of how group identification influences collective behavior. The work provides incomplete evidence to indicate that the synchronization of brain activity between different people underlies collective performance and that changes in brain activity patterns within individuals may, in turn, underlie this between-person synchrony. This study will be of interest to researchers investigating the neuroscience of social behaviour.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of this article have presented a timely and well-written study exploring the impact of group identification on collective behaviors and performance. The breadth of analyses is impressive and contributes significantly to our understanding of the collective performance. However, there are several areas where further clarification and revision would strengthen the study.

      Strengths:

      (1) Timeliness and Relevance: The topic is highly relevant, particularly in today's interconnected and team-oriented work environments. Triadic hyperscanning is important to understand group dynamics, but most previous work has been limited to dyadic work.

      (2) Comprehensive Analysis: The authors have conducted extensive analyses, offering valuable insights into how group identification affects collective behaviors.

      (3) Clear Writing: The manuscript is well-written and easy to follow, making complex concepts accessible.

      Weaknesses (clarifications needed):

      (1) Experimental Design: The study does not mention whether the authors examined sex differences or any measures of attractiveness or hierarchy among participants (e.g., students vs. teachers). Including these variables could provide a more nuanced understanding of group dynamics.

      (2) fNIRS Data Acquisition: The authors' approach to addressing individual differences in anatomy is lacking in detail. Understanding how they identified the optimal channels for synchrony between participants would be beneficial. Was this done by averaging to find the location with the highest coherence?

      (3) Behavioral Analysis: For group identification, the analysis currently uses a dichotomous approach. Introducing a regression model to capture the degree of identification could offer more granular insights into how varying levels of group identification affect collective behavior and performance.

      (4) Single Brain Activation Analysis: The application of the General Linear Model (GLM) is unclear, particularly given the long block durations and absence of multiple trials. Further explanation is needed on how the GLM was implemented under these conditions.

      (5) Within-group neural Synchrony (GNS) Calculation: The method for calculating GNS could be improved by using mutual information instead of pairwise summation, as suggested by Xie et al. (2020) in their study on fMRI triadic hyperscanning. Additionally, the explanation of GNS calculation is inconsistent. At one point, it is mentioned that GNS was averaged across time and channels, while elsewhere, it is stated that channels with the highest GNS were selected. Clarification on this point is essential.

      (6) Placement of fNIRS Probes: The probes were only placed in the frontal regions, despite literature suggesting that the superior temporal sulcus (STS) and temporoparietal junction (TPJ) regions are crucial for triadic team performance. A justification for this choice or inclusion of these regions in future studies would be beneficial.

      (7) Interpretation of fNIRS Data: Given that fNIRS signals are slow, similar to BOLD signals in fMRI, the interpretation of Figure 6 raises concerns. It suggests that it takes several minutes (on the order of 4-5 minutes) for people to collaborate, which seems implausible. More context or re-evaluation of this interpretation is needed.

    3. Reviewer #2 (Public review):

      Summary:

      This study primarily aims to examine the relationship between collective performance and group identification. Additionally, the authors propose that inter-brain synchronization (IBS) underlies collective performance and that changes in intra-brain functional connectivity or single-brain activation may, in turn, underlie IBS. The topic addressed in this paper is of great importance in the field using hyperscanning. However, the details of the experiments and analysis described in the paper are unclear, and the hypothesis as to why IBS is thought to underlie collective performance is not clearly presented. In addition, some of the analysis seems to be inappropriate.

      Strengths:

      I find the model presented in Figure 7 to be intriguing. Understanding why inter-brain synchronization occurs and how it is supported by specific single-brain activations or intra-brain functional connectivity is indeed a critical area for researchers conducting hyperscanning studies to explore.

      Understanding triadic interaction is really important, as almost all hyperscanning neuroimaging focuses on dyadic interaction. Exploring the neural, behavioral, and psychological basis of triadic interaction is a promising way to understand collective behavior and decision-making.

      Weaknesses:

      The authors need to clearly articulate their hypothesis regarding why neural synchronization occurs during social interaction. For example, in line 284, it is stated that "It is plausible that neural synchronization is closely associated with group identification and collective performance...", but this is far from self-evident. Neural synchronization can occur even when people are merely watching a movie (Hasson et al., 2004), and movie-watchers are not engaged in collective behavior. There is no direct link between the IBS and collective behavior. The authors should explain why they believe inter-brain synchronization occurs in interactive settings and why they think it is related to collective behavior/performance.

      The authors state that "GNS in the OFC was a reliable neuromarker, indicating the influence of group identification on collective performance," but this claim is too strong. Please refer to Figure 4B. Do the authors really believe that collective performance can be predicted given the correlation with the large variance shown? There is a significant discrepancy between observing a correlation between two variables and asserting that one variable is a predictive biomarker for the other.

      Why are the individual answers being analyzed as collective performance (See, L-184)? Although these are performances that emerge after the group discussion, they seem to be individual performances rather than collective ones. Typically, wouldn't the result of a consensus be considered a collective performance? The authors should clarify why the individual's answer is being treated as the measure of collective performance.

      Performing SPM-based mapping followed by conducting a t-test on the channels within statistically significant regions constitutes double dipping, which is not an acceptable method (Kriegeskorte et al., 2011). This issue is evident in, for example, Figures 3A and 4A.

      Please refer to the following source: https://www.nature.com/articles/nn.2303
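
      The selection bias behind this concern can be illustrated with a small null simulation. The sketch below is a generic demonstration of circular analysis and does not reproduce the authors' pipeline: the numbers of subjects, channels, and repetitions are arbitrary assumptions, and the data are pure noise with no true effect in any channel.

```python
import numpy as np

def one_sample_t(samples):
    """One-sample t statistic against 0, computed per column (channel)."""
    n = samples.shape[0]
    return samples.mean(axis=0) / (samples.std(axis=0, ddof=1) / np.sqrt(n))

def simulate(n_repeats=200, n_subjects=20, n_channels=200, seed=0):
    """Compare circular and independent tests of a channel selected on noise."""
    rng = np.random.default_rng(seed)
    circular, independent = [], []
    for _ in range(n_repeats):
        # Select the channel with the largest t statistic on the selection data.
        selection_data = rng.standard_normal((n_subjects, n_channels))
        t_selection = one_sample_t(selection_data)
        best = int(np.argmax(t_selection))
        # Circular: test the selected channel on the data used to select it.
        circular.append(t_selection[best])
        # Independent: test the same channel on freshly drawn data.
        fresh_data = rng.standard_normal((n_subjects, n_channels))
        independent.append(one_sample_t(fresh_data[:, [best]])[0])
    return float(np.mean(circular)), float(np.mean(independent))

mean_circular_t, mean_independent_t = simulate()
```

      Even though no channel carries a real effect, the circular statistic is large on average, whereas the statistic computed on independent data stays near zero. Whether a given analysis is circular depends on whether the selection criterion is independent of the subsequent contrast.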

      In several key analyses within this study (e.g., single-brain activation in the paragraph starting from L398, neural synchronization in the paragraph starting from L393), the TPJ is mentioned alongside the DLPFC. However, in subsequent detailed analyses, the TPJ is entirely ignored.

      The method for analyzing single-brain activation is unclear. Although it is mentioned that a GLM (general linear model) was used, it is not specified what regressors were prepared, nor which regressor's β-values are reported as brain activity. Without this information, it is difficult to assess the validity of the reported results.

      While the model illustrated in Figure 7 seems to be interesting, for me, it seems not to be based on the results of this study. This is because the study did not investigate the causal relationships among the three metrics. I guess, Figure 5D might be intended to explain this, but the details of the analysis are not provided, making it unclear what is being presented.

      The details of the experiment are not described at all. While I can somewhat grasp what was done abstractly, the lack of specific information makes it impossible to replicate the study.

    4. Author response:

      We are appreciative of the editors’ and reviewers’ positive comments and constructive suggestions, which will help us to improve our manuscript. We will make changes as required by the reviewers. Our primary focus will be on revising and clarifying certain aspects:

      First, recent research has revealed a strong correlation between brain synchronization and group decision-making, a key neural marker. We aim to bolster our hypothesis by reviewing additional literature, ensuring accuracy in terminology and appropriateness in phrasing.

      Second, it is crucial to note that we will include additional methodological details, such as the details of the experiment, the significance of individual difference variables, and the details of the data analyses.

      Third, despite introducing a novel perspective in our study, we acknowledge the utilization of the conventional fNIRS hyperscanning analyses, which are widely accepted within the research community. Our methodology entails the identification of significant channels via one-sample t-tests, subsequently complemented by either ANOVAs or independent sample t-tests, without the need for double dipping.

      We will address all the issues raised by the reviewers. We believe that the manuscript will significantly benefit from the insightful suggestions and invaluable contributions made by the editors and reviewers.

    1. eLife Assessment

      This important study challenges conventional life-history theory by demonstrating that reproductive-survival trade-offs are minimal in birds, except when reproductive effort is experimentally exaggerated. The evidence is solid, drawing from a meta-analysis of over 30 bird species, and effectively separates the effects of individual quality from reproductive costs. The findings will be of broad interest to evolutionary biologists and ecologists studying life-history trade-offs and reproductive strategies.

    2. Reviewer #4 (Public review):

      Summary:

      This is an important study that underscores that reproduction-survival trade-offs are not manifested (contrary to what generally accepted theory predicts) across a range of studies on birds. This has been studied by a meta-analytical approach, gathering data from a set of 46 papers (30 bird species). The overall conclusion is that there are no trade-offs apparent unless experimental manipulations push the natural variability to extreme values. In the wild, the general pattern for within-species variation is that birds with (naturally) larger clutches survive better.

      Strengths:

      I agree this study highlights important issues and provides good evidence of what it claims, using appropriate methods.

      Weaknesses:

      I also think, however, that it would benefit from broadening its horizon beyond bird studies. The conclusions can be reinforced through insights from other taxa. The general reasoning is that there is positive pleiotropy, i.e. individuals vary in quality and therefore some are more fit (perform better) than others. Of course, this is within their current environment (biotic, abiotic, social, ...), with consequences for maintaining genetic variation across generations, as outlined in Maklakov et al. 2015 (https://doi.org/10.1002/bies.201500025). This explains the outcomes of this study very well and would make them less controversial and surprising for a more general audience.

      I have two fish examples in my mind where this trade-off is also discounted. Of course, given that it is beyond brood-caring birds, the wording in those studies is slightly different, but the evolutionary insight is the same. First, within species but across populations, Reznick et al. (2004, DOI: 10.1038/nature02936) demonstrated a positive correlation between reproduction and parental survival in guppies. Second, an annual killifish study (2021, DOI: 10.1111/1365-2656.13382) showed, within a population, a positive association between reproduction and (reproductive) aging.

      In fruit flies, there is also a strong experimental study demonstrating the absence of reproduction-lifespan trade-offs (DOI: 10.1016/j.cub.2013.09.049).

      I suggest that incorporating insights from those studies would broaden the scope and reach of the current manuscript.

      Likely impact:

      I think this is an important contribution to a slow shift in how we perceive the importance of trade-offs in ecology and evolution in general. The current view still is that an individual excelling in one measure of its life history (i.e. receiving benefits) must struggle (i.e. pay costs) in another. However, a positive correlation between all aspects of life history traits is possible within an individual (such as due to developmental conditions or fitting to a particular environment). Simply, some individuals can perform generally better (be of better quality) than others.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      In the second round of reviews, Reviewer 2 made three specific comments. The first comment criticises us for not including a set of equations they had requested in their first review. We did, in fact, include the requested equations in the Supplementary Information of our revised submission; they were also cited in the main text of the revised manuscript, and our changes were made clear in our response to the reviewer. In the second comment, the reviewer suggested adding one word to a sentence in the abstract. We have made this change (line 23). In the third comment, the reviewer highlights a sentence where we agree we could have been clearer. The sentence can be rectified by adding one word, which we have done (line 232). We believe the changes required to our manuscript are very minor, and we have implemented these two suggested changes, which are highlighted in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      In this study, Nishi et al. claim that the ratio of long-term hematopoietic stem cell (LT-HSC) versus short-term HSC (ST-HSC) determines the lineage output of HSCs and reduced ratio of ST-HSC in aged mice causes myeloid-biased hematopoiesis. The authors used Hoxb5 reporter mice to isolate LT-HSC and ST-HSC and performed molecular analyses and transplantation assays to support their arguments. How the hematopoietic system becomes myeloid-biased upon aging is an important question with many implications in the disease context as well. However, their study is descriptive with remaining questions.

      Weaknesses:

      Comment #1-1: The authors may need conceptual re-framing of their main argument because whether the ST-HSCs used in this study are functionally indeed short-term "HSCs" is questionable. The data presented in this study and their immunophenotypic definition of ST-HSCs (Lineage negative/Sca-1+/c-Kit+/Flk2-/CD34-/CD150+/Hoxb5-) suggest that authors may find hematopoietic stem cell-like lymphoid progenitors as previously shown for megakaryocyte lineage (Haas et al., Cell stem cell. 2015) or, as the authors briefly mentioned in the discussion, Hoxb5- HSCs could be lymphoid-biased HSCs.

      The authors disputed the idea that Hoxb5- HSCs as lymphoid-biased HSCs based on their previous 4 weeks post-transplantation data (Chen et al., 2016). However, they overlooked the possibility of myeloid reprogramming of lymphoid-biased population during regenerative conditions (Pietras et al., Cell stem cell., 2015). In other words, early post-transplant ST-HSCs (Hoxb5- HSCs) can be seen as lacking the phenotypic lymphoid-biased HSCs.

      Thinking of their ST-HSCs as hematopoietic stem cell-like lymphoid progenitors or lymphoid-biased HSCs makes more sense conceptually as well.

      Response #1-1: We appreciate this important suggestion and recognize the significance of the debate on whether Hoxb5- HSCs are ST-HSCs or lymphoid-biased HSCs.

      HSCs are defined by their ability to retain hematopoietic potential after a secondary transplantation (1, 2). If Hoxb5- HSCs were indeed lymphoid-biased HSCs, they would exhibit predominantly lymphoid hematopoiesis even after secondary transplantation. However, functional experiments demonstrate that these cells lose their hematopoietic output after secondary transplantation (3) (see Fig. 2 in this paper). Based on the established definition of HSCs in this field, it is appropriate to classify Hoxb5- HSCs as ST-HSCs rather than lymphoid-biased HSCs.

      Additionally, it has been reported that myeloid reprogramming may occur in the early post-transplant period, around 2-4 weeks after transplantation, even in lymphoid-biased populations within the MPP fraction, due to high inflammatory conditions (4). However, when considering the post-transplant hematopoiesis of Hoxb5- HSC fractions as ST-HSCs, they exhibit almost the same myeloid hematopoietic potential as LT-HSCs not only during the early 4 weeks after transplantation but also at 8 weeks post-transplantation (3), when the acute inflammatory response has largely subsided. Therefore, it is difficult to attribute the myeloid production by ST-HSCs post-transplant solely to myeloid reprogramming.

      References

      (1) Morrison, S. J. & Weissman, I. L. The long-term repopulating subset of hematopoietic stem cells is deterministic and isolatable by phenotype. Immunity 1, 661–673 (1994).

      (2) Challen, G. A., Boles, N., Lin, K. K. Y. & Goodell, M. A. Mouse hematopoietic stem cell identification and analysis. Cytom. Part A 75, 14–24 (2009).

      (3) Chen, J. Y. et al. Hoxb5 marks long-term haematopoietic stem cells and reveals a homogenous perivascular niche. Nature 530, 223–227 (2016).

      (4) Pietras, E. M. et al. Functionally Distinct Subsets of Lineage-Biased Multipotent Progenitors Control Blood Production in Normal and Regenerative Conditions. Cell Stem Cell 17, 35–46 (2015).

      Comment #1-2: ST-HSCs come from LT-HSCs and further differentiate into lineage-biased multipotent progenitor (MPP) populations including myeloid-biased MPP2 and MPP3. Based on the authors' claim, LT-HSCs (Hoxb5+ HSCs) have no lineage bias even in aged mice. Then these LT-HSCs make ST-HSCs, which produce mostly memory T cells. These memory T cell-producing ST-HSCs then produce MPPs including myeloid-biased MPP2 and MPP3.

      This differentiation trajectory is hard to accept. If we think Hoxb5- HSCs (ST-HSCs by authors) as a sub-population of immunophenotypic HSCs with lymphoid lineage bias or hematopoietic stem cell-like lymphoid progenitors, the differentiation trajectory has no flaw.

      Response #1-2: Thank you for this comment, and we apologize for the misunderstanding regarding the predominance of memory T cells in ST-HSCs after transplantation. 

      Our data show that ST-HSCs are not biased HSCs that predominantly produce memory T cells, but rather, ST-HSCs are multipotent hematopoietic cells. ST-HSCs lose their ability to self-renew within a short period, resulting in the cessation of ST-HSC-derived hematopoiesis. As a result, myeloid lineage with a short half-life disappears from the peripheral blood, and memory lymphocytes with a long half-life remain (see Figure 5 in this paper). 

      Comment #1-3: Authors' experimental designs have some caveats to support their claims. Authors claimed that aged LT-HSCs have no myeloid-biased clone expansion using transplantation assays. In these experiments, authors used 10 HSCs and young mice as recipients. Given the huge expansion of old HSC by number and known heterogeneity in immunophenotypically defined HSC populations, it is questionable how 10 out of so many old HSCs can faithfully represent the old HSC population. The Hoxb5+ old HSC primary and secondary recipient mice data (Figure 2C and D) support this concern. In addition, they only used young recipients. Considering the importance of the inflammatory aged niche in the myeloid-biased lineage output, transplanting young vs old LT-HSCs into aged mice will complete the whole picture.

      Response #1-3: We appreciate the reviewer for the comments. We acknowledge that using ten HSCs may not capture the heterogeneity of aging HSCs.

      However, although most of our experiments have used a small number of transplanted cells (e.g., 10 cells), we have conducted functional experiments across Figures 2, 3, 5, 6, S3, and S6, totaling n = 126, equivalent to over 1260 cells. Previous studies have reported that myeloid-biased HSCs constitute more than 50% of the aged HSC population (1, 2). If myeloid-biased HSCs increase with age, they should be detectable in our experiments. Our functional experiments have consistently shown that Hoxb5+ HSCs exhibit unchanged lineage output throughout life. In contrast, the data presented in this paper indicate that changes in the ratio of LT-HSCs and ST-HSCs may contribute to myeloid-biased hematopoiesis.
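      The sampling argument above can be made concrete with a simple binomial calculation. This is a minimal sketch under stated assumptions, not an analysis from the manuscript: it assumes each transplanted cell is an independent random draw from the aged HSC pool and takes 50% as the myeloid-biased fraction (the lower bound quoted above). It ignores engraftment efficiency and differences in clone size, which is where the heterogeneity concern applies.

```python
from math import comb


def prob_at_least(k, n_cells=10, biased_fraction=0.5):
    """P(at least k myeloid-biased cells among n_cells) under binomial sampling.

    n_cells and biased_fraction are illustrative assumptions.
    """
    return sum(
        comb(n_cells, i) * biased_fraction**i * (1 - biased_fraction) ** (n_cells - i)
        for i in range(k, n_cells + 1)
    )


n_recipients = 126      # recipients summed across the functional experiments
cells_per_graft = 10    # typical graft size stated in the response

print("total transplanted cells:", n_recipients * cells_per_graft)
print("expected biased cells per graft:", cells_per_graft * 0.5)
print("P(graft contains at least one biased cell):", prob_at_least(1))
print("P(graft contains at least three biased cells):", prob_at_least(3))
```

      Under these assumptions nearly every 10-cell graft would contain several myeloid-biased cells, which is the basis for expecting them to be detectable. The calculation says nothing about whether those cells engraft or dominate output.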

      We believe that transplanting aged HSCs into aged recipient mice is crucial to analyzing not only the differentiation potential of aged HSCs but also the changes in their engraftment and self-renewal abilities. We aim to clarify further findings through these experiments in the future.

      References

      (1) Dykstra B, Olthof S, Schreuder J, Ritsema M, Haan G De. Clonal analysis reveals multiple functional defects of aged murine hematopoietic stem cells. J Exp Med. 2011 Dec 19;208(13):2691–703. 

      (2) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      Comment #1-4: The authors' molecular data analyses need more rigor with unbiased approaches. They claimed that neither aged LT-HSCs nor aged ST-HSCs exhibited myeloid or lymphoid gene set enrichment but aged bulk HSCs, which are just a sum of LT-HSCs and ST-HSCs by their gating scheme (Figure 4A), showed the "tendency" of enrichment of myeloid-related genes based on the selected gene set (Figure 4D). Although the proportion of ST-HSCs is reduced in bulk HSCs upon aging, since ST-HSCs do not exhibit lymphoid gene set enrichment based on their data, it is hard to understand how aged bulk HSCs have more myeloid gene set enrichment compared to young bulk HSCs. This bulk HSC data rather suggests that there could be a trend toward certain lineage bias (although not significant) in aged LT-HSCs or ST-HSCs. The authors need to verify the molecular lineage priming of LT-HSCs and ST-HSCs using another comprehensive dataset.

      Response #1-4: Thank you for pointing out that neither aged LT-HSCs nor aged ST-HSCs exhibited myeloid or lymphoid gene set enrichment, although aged bulk HSCs showed a tendency towards enrichment of myeloid-related genes.

      The actual GSEA result had an FDR > 0.05. Therefore, we cannot claim that bulk HSCs showed significant enrichment of myeloid-related genes with age. Consequently, we have revised the following sentences:

      [P11, L251] Neither aged LT-HSCs nor aged ST-HSCs exhibited myeloid/lymphoid gene set enrichment, while shared myeloid-related genes tended to be enriched in aged bulk-HSCs, although this enrichment was not statistically significant (Fig. 4, F and G).

      In addition to the above, we also found that the GSEA results differ among myeloid gene sets (Fig. 4, D-F; Fig. 4S, C-D). These findings suggest that discussing lineage bias in HSCs using GSEA is challenging. We believe that functional experimental data are crucial. In our functional experiments, when the ratio of LT-HSCs to ST-HSCs was reconstituted to match the ratio in young bulk-HSCs (LT:ST = 2:8) or aged bulk-HSCs (LT:ST = 5:5), myeloid-biased hematopoiesis was observed with the aged bulk-HSC ratio. Based on these data, we concluded that age-related changes in the ratio between LT-HSCs and ST-HSCs in bulk-HSCs cause myeloid-biased hematopoiesis rather than an increase in myeloid gene expression in the aged bulk-HSCs.

      Comment #1-5: Some data are too weak to fully support their claims. The authors claimed that age-associated extramedullary changes are the main driver of myeloid-biased hematopoiesis based on no major differences in progenitor populations upon transplantation of 10 young HSCs into young or old recipient mice (Figure 7F) and relatively low donor-derived cells in thymus and spleen in aged recipient mice (Figure 7G-J). However, they used selected mice to calculate the progenitor populations in recipient mice (8 out of 17 from young recipients denoted by * and 8 out of 10 from aged recipients denoted by * in Figure 7C). In addition, they calculated the progenitor populations as frequency in c-kit positive cells. Given that they transplanted 10 LT-HSCs into "sub-lethally" irradiated mice and 8.7 Gy irradiation can have different effects on bone marrow clearance in young vs old mice, it is not clear whether this data is reliable enough to support their claims. The same concern applies to the data Figure 7G-J. Authors need to provide alternative data to support their claims.

      Response #1-5: Thank you for the useful comments. Our claim regarding Fig. 7 is that age-associated extramedullary changes are merely additional drivers of myeloid-biased hematopoiesis, not the main drivers. Nevertheless, we address the issues raised below.

      Regarding the reason for analyzing the asterisk mice

      We performed two independent experiments for Fig. 7. In the first experiment, we planned to analyze the BM of recipients 16 weeks after transplantation. However, as shown in Fig. 7B, many of the aged mice died before 16 weeks. Therefore, we decided to examine the BM of the recipient mice at 12 weeks in the second experiment. Below are the peripheral blood results 11-12 weeks after transplantation for the mice used in the second experiment.

      Author response image 1.

      For the second experiment, we analyzed the BM of all eight aged recipients. Then, we selected the same number of young recipients for analysis to ensure that the donor myeloid output would be comparable to that of the entire young group. Indeed, the donor myeloid lineage output of the selected mice was 28.1 ± 22.9%, closely matching the 23.5 ± 23.3% (p = 0.68) observed in the entire young recipient population.

      That being said, as the reviewer pointed out, it is a limitation that the BM, thymus, and spleen were not analyzed in all mice. Hence, we have added the following sentences:

      [P14, L327] We performed BM analysis for the mice denoted by † in Figure 7C because many of the aged mice had died before the analysis.

      [P15, L338] The thymus and spleen analyses were also performed on the mice denoted by † in Figure 7C.

      Regarding the reason for 8.7 Gy.

      Thank you for your question about whether 8.7 Gy is myeloablative. In our previous report1, we demonstrated that none of the mice subjected to pre-treatment with 8.7 Gy could survive when non-LKS cells were transplanted, suggesting that 8.7 Gy is enough to be myeloablative with the radiation equipment at our facility.

      Author response image 2.

      Reference

      (1)  Nishi K, Sakamaki T, Sadaoka K, Fujii M, Takaori-Kondo A, Chen JY, et al. Identification of the minimum requirements for successful haematopoietic stem cell transplantation. Br J Haematol. 2022;196(3):711–23. 

      Regarding the normalization of c-Kit in Figure 7F.  

      Firstly, as shown in Supplemental Figures S1B and S1C, we analyze the upstream (HSC, MPP, Flk2+) and downstream (CLP, MEP, CMP, GMP) fractions in different panels. Therefore, normalization is required to assess the differentiation of HSCs from upstream to downstream. Additionally, the reason for normalizing by c-Kit+ is that the bone marrow analysis was performed after enrichment using the Anti-c-Kit antibody for both upstream and downstream fractions. Based on this, we calculated the progenitor populations as a frequency within the c-Kit positive cells.

      Next, the results of normalizing the whole bone marrow cells (live cells) are shown below. 

      Author response image 3.

      Similar to the results of normalizing to c-Kit+ cells, myeloid progenitors remained largely unchanged, apart from a statistically significant decrease in CMP in aged mice. Additionally, there were no significant differences in CLP. In conclusion, we obtained similar results between the normalization with c-Kit and the normalization with whole bone marrow cells (live cells).

      However, as the reviewer pointed out, it is necessary to explain the reason for normalization with c-Kit. Therefore, we will add the following description.

      [P21, L502] For the combined analysis of the upstream (HSC, MPP, Flk2+) and downstream (CLP, MEP, CMP, GMP) fractions in Figures 1B and 7F, we normalized by c-Kit+ cells because we performed a c-Kit enrichment for the bone marrow analysis.

      Reviewer #2:

      Summary:  

      Nishi et al. investigate the well-known and previously described phenomenon of age-associated myeloid-biased hematopoiesis. Using a previously established HoxB5mCherry mouse model, they used HoxB5+ and HoxB5- HSCs to discriminate cells with long-term (LT-HSCs) and short-term (ST-HSCs) reconstitution potential and compared these populations to immunophenotypically defined 'bulk HSCs' that consists of a mixture of LT-HSCs and ST-HSCs. They then isolated these HSC populations from young and aged mice to test their function and myeloid bias in non-competitive and competitive transplants into young and aged recipients. Based on quantification of hematopoietic cell frequencies in the bone marrow, peripheral blood, and in some experiments the spleen and thymus, the authors argue against the currently held belief that myeloid-biased HSCs expand with age.

      Comment #2-1: While aspects of their work are fascinating and might have merit, several issues weaken the overall strength of the arguments and interpretation. Multiple experiments were done with a very low number of recipient mice, showed very large standard deviations, and had no statistically detectable difference between experimental groups. While the authors conclude that these experimental groups are not different, the displayed results seem too variable to conclude anything with certainty. The sensitivity of the performed experiments (e.g. Figure 3; Figure 6C, D) is too low to detect even reasonably strong differences between experimental groups and is thus inadequate to support the authors' claims. This weakness of the study is not acknowledged in the text and is also not discussed. To support their conclusions the authors need to provide higher n-numbers and provide a detailed power analysis of the transplants in the methods section.

      Response #2-1: Thank you for your important remarks. The power analysis for this experiment shows that power = 0.319, suggesting that a larger sample may be needed. On the other hand, our method for determining the sample size in Figure 3 is as follows:

      (1) First, we checked whether a myeloid-biased change is detected in the bulk-HSC fraction (Figure S3). The results showed that the difference in myeloid output at 16 weeks after transplantation was statistically significant (young vs. aged = 7.2 ± 8.9 vs. 42.1 ± 35.5%, p = 0.01), even though n = 10.

      (2) Next, myeloid-biased HSCs have been reported to be a fraction with high self-renewal ability (2004, Blood). If myeloid-biased HSCs increase with aging, the increase in myeloid-biased HSCs in the LT-HSC fraction would be detected with higher sensitivity than in the bulk-HSC fraction used in Figure S3.

      (3) However, in Figure 3 (n = 8), not only was the p-value non-significant, but the means themselves were also nearly identical (young vs. aged = 51.4 ± 31.5% vs. 47.4 ± 39.0%, p = 0.82). Since there was no difference in the means themselves, it is highly likely that no difference will be detected even if n is further increased.
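      To make the relationship between the observed effect and sample size explicit, the summary statistics quoted in point (3) can be fed into a standard power calculation. The sketch below is an illustration under stated assumptions, not the power analysis reported above (the derivation of the 0.319 value is not given, so no attempt is made to reproduce it). It assumes n = 8 per group, a two-sided test at alpha = 0.05, a pooled standard deviation from the two reported SDs, and a normal approximation to the t-test, which slightly overstates power at small n.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference using the pooled SD (equal group sizes assumed)."""
    pooled_sd = sqrt((sd_a**2 + sd_b**2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


def approx_power(d, n_per_group, alpha=0.05):
    """Two-sided, two-sample power under a normal approximation."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for_power(d, power=0.8, alpha=0.05):
    """Approximate per-group sample size needed to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


# Summary statistics quoted for Figure 3 (young vs. aged LT-HSC myeloid output, %).
d = cohens_d(51.4, 31.5, 47.4, 39.0)
print(f"standardized effect size d: {d:.3f}")
print(f"approximate power at n = 8 per group: {approx_power(d, 8):.3f}")
print(f"approximate n per group for 80% power: {n_per_group_for_power(d)}")
```

      Under these assumptions the observed difference corresponds to a very small standardized effect, so an experiment of this size has little power to detect it, and a very large sample would be needed to do so. This is consistent with both readings in the exchange: the data cannot exclude a moderate true difference, and the observed means themselves give no indication of one.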

      Regarding Figure 6, we obtained a statistically significant difference and consider the sample size to be sufficient. 

      In addition, we have performed various functional experiments (Figures 2, 5, 6 and S6), and have obtained consistent results that expansion of myeloid-biased HSCs does not occur with aging in the Hoxb5+ HSC fraction. Based on the above, we conclude that the LT-HSC fraction does not differ in myeloid differentiation potential with aging.

      Comment #2-2: As the authors attempt to challenge the current model of the age-associated expansion of myeloid-biased HSCs (which has been observed and reproduced by many different groups), ideally additional strong evidence in the form of single-cell transplants is provided.

      Response #2-2: Thank you for the comments. As the reviewer pointed out, we hope to reconfirm our results using single-cell level technology in the future.

      On the other hand, we have reported that the ratio of myeloid to lymphoid cells in the peripheral blood changes when the number of HSCs transplanted, or the number of supporting cells transplanted with HSCs, is varied (1, 2). Therefore, single-cell transplant data need to be interpreted very carefully to determine differentiation potential.

      From this viewpoint, future experiments will combine the Hoxb5 reporter system with a lineage tracing system that can track HSCs at the single-cell level over time. This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells. We have reflected this comment by adding the following sentences in the manuscript.

      [P19, L451] In contrast, our findings should be considered in light of some limitations. In this report, we primarily performed ten to twenty cell transplantation assays. Therefore, the current theory should be revalidated using single-cell technology with a lineage tracing system (3, 4). This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells.

      References

      (1) Nishi K, Sakamaki T, Sadaoka K, Fujii M, Takaori-Kondo A, Chen JY, et al. Identification of the minimum requirements for successful haematopoietic stem cell transplantation. Br J Haematol. 2022;196(3):711–23. 

      (2) Sakamaki T, Kao KS, Nishi K, Chen JY, Sadaoka K, Fujii M, et al. Hoxb5 defines the heterogeneity of self-renewal capacity in the hematopoietic stem cell compartment. Biochem Biophys Res Commun [Internet]. 2021;539:34–41. Available from: https://doi.org/10.1016/j.bbrc.2020.12.077

      (3) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      (4) Rodriguez-Fraticelli AE, Weinreb C, Wang SW, Migueles RP, Jankovic M, Usart M, et al. Single-cell lineage tracing unveils a role for TCF15 in haematopoiesis. Nature [Internet]. 2020;583(7817):585–9. Available from: http://dx.doi.org/10.1038/s41586-020-2503-6

      Comment #2-3: It is also unclear why the authors believe that the observed reduction of ST-HSCs relative to LT-HSCs explains the myeloid-biased phenotype observed in the peripheral blood. This point seems counterintuitive and requires further explanation.

      Response #2-3: Thank you for your comment. We apologize for the insufficient explanation. Our data, as shown in Figures 3 and 4, demonstrate that the differentiation potential of LT-HSCs remains unchanged with age. Therefore, rather than suggesting that an increase in LT-HSCs with a consistent differentiation capacity leads to myeloid-biased hematopoiesis, it seems more accurate to highlight that the relative decrease in the proportion of ST-HSCs, which remain in peripheral blood as lymphocytes, leads to a relative increase in myeloid cells in peripheral blood and thus causes myeloid-biased hematopoiesis.

      However, if we focus on the increase in the ratio of LT-HSCs, it is also plausible to explain that “with aging, the proportion of LT-HSCs capable of long-term myeloid hematopoiesis increases. As a result, from 16 weeks after transplantation, the influence of LT-HSCs maintaining the long-term ability to produce myeloid cells becomes relatively more significant, leading to an increase in the ratio of myeloid cells in the peripheral blood and causing myeloid-biased hematopoiesis.”

      Comment #2-4: Based on my understanding of the presented data, the authors argue that myeloid-biased HSCs do not exist, as a) they detect no difference between young/aged HSCs after transplant (mind low n-numbers and large std!); b) myeloid progenitors downstream of HSCs only show minor or no changes in frequency; and c) aged LT-HSCs do not outperform young LT-HSCs in myeloid output in competitive transplants (mind low n-numbers and large std!).

      Response #2-4: We appreciate the comments. As mentioned above, we will correct the manuscript regarding the sample size.

      Regarding the interpretation of the lack of increase in the percentage of myeloid progenitor cells in the bone marrow with age, it is possible that various confounding factors, such as differentiation shortcuts or changes in the microenvironment, are involved.

      However, even when aged LT-HSCs and young LT-HSCs are transplanted into the same recipient mice, the timing of the appearance of different cell fractions in peripheral blood is similar (Figure 3 of this paper). Therefore, we have not obtained data suggesting that clear shortcuts exist in the differentiation process of aged HSCs into neutrophils or monocytes. Additionally, it is currently generally accepted that myeloid cells, including neutrophils and monocytes, differentiate from GMPs (1). Since there is no change in the proportion of GMPs in the bone marrow with age, we concluded that the differentiation potential into myeloid cells remains consistent with aging.

      Reference

      (1) Akashi K and others, ‘A Clonogenic Common Myeloid Progenitor That Gives Rise to All Myeloid Lineages’, Nature, 404.6774 (2000), 193–97.

      Strengths: 

      The authors present an interesting observation and offer an alternative explanation of the origins of aged-associated myeloid-biased hematopoiesis. Their data regarding the role of the microenvironment in the spleen and thymus appears to be convincing. 

      Weaknesses: 

      Comment #2-5: "Then, we found that the myeloid lineage proportions from young and aged LT-HSCs were nearly comparable during the observation period after transplantation (Figure 3, B and C)." Given the large standard deviation and low n-numbers, the power of the analysis to detect differences between experimental groups is very low. Experimental groups with too large standard deviations (as displayed here) are difficult to interpret and might be inconclusive. The absence of clearly detectable differences between young and aged transplanted HSCs could thus simply be a false-negative result. The shown experimental results hence do not provide strong evidence for the authors' interpretation of the data. The authors should add additional transplants and include a detailed power analysis to be able to detect differences between experimental groups with reasonable sensitivity.

      Response #2-5: Thank you for providing these insights. Regarding the sample size, we have addressed this in Response #2-1.

      Comment #2-6: Line 293: "Based on these findings, we concluded that myeloid-biased hematopoiesis observed following transplantation of aged HSCs was caused by a relative decrease in ST-HSC in the bulk-HSC compartment in aged mice rather than the selective expansion of myeloid-biased HSC clones." Couldn't that also be explained by an increase in myeloid-biased HSCs, as repeatedly reported and seen in the expansion of CD150+ HSCs? It is not intuitively clear why a reduction of ST-HSC clones would lead to a myeloid bias. The authors should try to explain more clearly where they believe the increased number of myeloid cells comes from. What is the source of myeloid cells if the authors believe they are not derived from the expanded population of myeloid-biased HSCs?

      Response #2-6: Thank you for pointing this out. We apologize for the insufficient explanation. We will explain using Figure 8 from the paper.

      First, our data show that LT-HSCs maintain their differentiation capacity with age, while ST-HSCs lose their self-renewal capacity earlier, so that only long-lived memory lymphocytes remain in the peripheral blood after the loss of self-renewal capacity in ST-HSCs (Figure 8, upper panel). In mouse bone marrow, the proportion of LT-HSCs increases with age, while the proportion of ST-HSCs relatively decreases (Figure 8, lower panel and Figure S5).

      Our data show that merely reproducing the ratio of LT-HSCs to ST-HSCs observed in aged mice using young LT-HSCs and ST-HSCs can replicate myeloid-biased hematopoiesis. This suggests that the increase in LT-HSC and the relative decrease in ST-HSC within the HSC compartment with aging are likely to contribute to myeloid-biased hematopoiesis.

      As mentioned earlier, since the differentiation capacity of LT-HSCs remains unchanged with age, it seems more accurate to state that the relative decrease in the proportion of ST-HSCs, which retain long-lived memory lymphocytes in peripheral blood, leads to a relative increase in myeloid cells in peripheral blood and thus causes myeloid-biased hematopoiesis.

      However, focusing on the increase in the proportion of LT-HSCs, it is also possible to explain that “with aging, the proportion of LT-HSCs capable of long-term myeloid hematopoiesis increases. As a result, from 16 weeks after transplantation, the influence of LT-HSCs maintaining the long-term ability to produce myeloid cells becomes relatively more significant, leading to an increase in the ratio of myeloid cells in the peripheral blood and causing myeloid-biased hematopoiesis.”

      Reviewer #3:

      Summary:

      In this manuscript, Nishi et al. propose a new model to explain the previously reported myeloid-biased hematopoiesis associated with aging. Traditionally, this phenotype has been explained by the expansion of myeloid-biased hematopoietic stem cell (HSC) clones during aging. Here, the authors question this idea and show how their Hoxb5 reporter model can discriminate long-term (LT) and short-term (ST) HSC and characterize their lineage output after transplant. From these analyses, the authors conclude that changes during aging in the LT/ST HSC proportion explain the myeloid bias observed.

      Although the topic is appropriate and the new model provides a new way to think about lineage-biased output observed in multiple hematopoietic contexts, some of the experimental design choices, as well as some of the conclusions drawn from the results could be substantially improved. Also, they do not propose any potential mechanism to explain this process, which reduces the potential impact and novelty of the study. Specific concerns are outlined below. 

      Major 

      Comment #3-1: As a general comment, there are experimental details that are either missing or not clear. The main one is related to transplantation assays. What is the irradiation dose? The Methods section indicates "recipient mice were lethally irradiated with single doses of 8.7 or 9.1 Gy". The only experimental schematic indicating the irradiation dose is Figure 7A, which uses 8.7 Gy. Also, although there is not a "standard", 11 Gy split in two doses is typically considered lethal irradiation, while 9.5 Gy is considered sublethal.

      Response #3-1: We agree with the reviewer’s concern about whether 8.7 Gy is myeloablative. To confirm this, it would typically be necessary to irradiate mice with different doses and observe whether they survive. However, such an experiment is not ethically permissible at our facility. Instead, in our previous report (1), we demonstrated that none of the mice subjected to pretreatment with 8.7 Gy could survive when non-LKS cells were transplanted, suggesting that 8.7 Gy is enough to be myeloablative with the radiation equipment at our facility.

      Reference

      (1) Nishi K, Sakamaki T, Sadaoka K, Fujii M, Takaori-Kondo A, Chen JY, et al. Identification of the minimum requirements for successful haematopoietic stem cell transplantation. Br J Haematol. 2022;196(3):711–23. 

      Comment #3-2:  Is there any reason for these lower doses? Same question for giving a single dose and for performing irradiation a day before transplant. 

      Response #3-2: We appreciate the reviewer for these important comments. Although the 8.7 Gy dose used at our facility is lower than in other reports, we selected this dose to maintain consistency with our previous experiments. For the same reason, we used a single irradiation, not split.  Regarding the timing of irradiation, the method section specifies that irradiation timing is 12-24 hours prior to transplantation. In most experiments, irradiation is performed at 12 hours. However, due to experimental progress, there were occasional instances where nearly 24 hours elapsed between irradiation and transplantation. We provide this information to ensure accuracy.

      Comment #3-3: The manuscript would benefit from the inclusion of references to recent studies discussing hematopoietic biases and differentiation dynamics at a single-cell level (e.g., Yamamoto et al., 2018; Rodriguez-Fraticelli et al., 2020). Also, when discussing the discrepancy between studies claiming different biases within the HSC pool, the authors mentioned that Montecino-Rodriguez et al. 2019 showed preserved lymphoid potential with age. It would be good to acknowledge that this study used busulfan as the conditioning method instead of irradiation.

      Response #3-3: We agree with this comment and have incorporated this suggestion into the manuscript.

      [P19, L451] In contrast, our findings should be considered in light of some limitations. In this report, we primarily performed ten to twenty cell transplantation assays. Therefore, the current theory should be revalidated using single-cell technology with a lineage tracing system (1, 2). This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells. Additionally, in this report we purified LT-HSCs by Hoxb5 reporter system. In contrast, various LT-HSC markers have been previously reported (2, 3). Therefore, it is ideal to validate our findings using other LT-HSC markers.

      [P16, L368] Other studies suggest that blockage of lymphoid hematopoiesis in aged mice results in myeloid-skewed hematopoiesis through alternative mechanisms. However, this result should be interpreted carefully, since Busulfan was used for myeloablative treatment in this study (4).

      References

      (1) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      (2) Rodriguez-Fraticelli AE, Weinreb C, Wang SW, Migueles RP, Jankovic M, Usart M, et al. Single-cell lineage tracing unveils a role for TCF15 in haematopoiesis. Nature [Internet]. 2020;583(7817):585–9. Available from: http://dx.doi.org/10.1038/s41586-020-2503-6

      (3) Sanjuan-Pla A, Macaulay IC, Jensen CT, Woll PS, Luis TC, Mead A, et al. Platelet-biased stem cells reside at the apex of the haematopoietic stem-cell hierarchy. Nature. 2013;502(7470):232–6.

      (4) Montecino-Rodriguez E, Kong Y, Casero D, Rouault A, Dorshkind K, Pioli PD. Lymphoid-Biased Hematopoietic Stem Cells Are Maintained with Age and Efficiently Generate Lymphoid Progeny. Stem Cell Reports. 2019 Mar 5;12(3):584–96. 

      Comment #3-4: When representing the contribution to PB from transplanted cells, the authors show the % of each lineage within the donor-derived cells (Figures 3B-C, 5B, 6B-D, 7C-E, and S3 B-C). To have a better picture of total donor contribution, total PB and BM chimerism should be included for each transplantation assay. Also, for Figures 2C-D and Figures S2A-B, do the graphs represent 100% of the PB cells? Are there any radioresistant cells?

      Response #3-4: Thank you for highlighting this point. Indeed, donor contribution to total peripheral blood (PB) is important information. We have included the donor contribution data for each of the figures mentioned above.

      Author response image 4.

      In Figure 2C-D and Figure S2A-B, the percentage of donor chimerism in PB was defined as the percentage of CD45.1-CD45.2+ cells among total CD45.1-CD45.2+ and CD45.1+CD45.2+ cells as described in method section.
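
      The definition above amounts to a simple ratio of gated event counts. As a minimal sketch, assuming flow cytometry event counts are available for the two gates (the function name, parameter names, and example counts are hypothetical and are not taken from the study), it could be computed as follows:

```python
def donor_chimerism_percent(donor_events: int, other_events: int) -> float:
    """Percentage of CD45.1-CD45.2+ cells among the two gated populations.

    donor_events: hypothetical count of CD45.1-CD45.2+ events.
    other_events: hypothetical count of CD45.1+CD45.2+ events.
    """
    total = donor_events + other_events
    if total == 0:
        raise ValueError("no events in either gate")
    return 100.0 * donor_events / total


# Illustrative counts only; these are not data from the manuscript.
example = donor_chimerism_percent(donor_events=300, other_events=700)
print(f"donor chimerism: {example:.1f}%")
```

      Any events outside these two gates, such as residual radioresistant cells of another genotype, are excluded from the denominator under this definition.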

      Comment #3-5: For BM progenitor frequencies, the authors present the data as the frequency of cKit+ cells. This normalization might be misleading as changes in the proportion of cKit+ between the different experimental conditions could mask differences in these BM subpopulations. Representing this data as the frequency of BM single cells or as absolute numbers (e.g., per femur) would be valuable.

      Response #3-5: We appreciate the reviewer's comment on this point. 

      Firstly, as shown in Supplemental Figures S1B and S1C, we analyze the upstream (HSC, MPP, Flk2+) and downstream (CLP, MEP, CMP, GMP) fractions in different panels. Therefore, normalization is required to assess the differentiation of HSCs from upstream to downstream. Additionally, the reason for normalizing by c-Kit+ is that the bone marrow analysis was performed after enrichment using an anti-c-Kit antibody for both upstream and downstream fractions. Based on this, we calculated the progenitor populations as a frequency within the c-Kit positive cells. Next, the results of normalizing to whole bone marrow cells (live cells) are shown in Author response image 2.

      Similar to the results of normalizing to c-Kit+ cells, the findings for myeloid progenitors were unchanged, including the statistically significant decrease in CMP in aged mice. Additionally, there were no significant differences in CLP. In conclusion, similar results were obtained between the normalization with c-Kit and the normalization with whole bone marrow cells (live cells).

      However, as the reviewer pointed out, it is necessary to explain the reason for normalization with c-Kit. Therefore, we will add the following description.

      [P21, L502] For the combined analysis of the upstream (HSC, MPP, Flk2+) and downstream (CLP, MEP, CMP, GMP) fractions in Figures 1B and 7F, we normalized by c-Kit+ cells because we performed a c-Kit enrichment for the bone marrow analysis.
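
      The reviewer's concern about the choice of denominator can be illustrated with a short calculation. The sketch below uses hypothetical numbers (none are from the study) to show that an identical frequency within c-Kit+ cells corresponds to different frequencies within live bone marrow cells whenever the c-Kit+ fraction itself differs between groups.

```python
def frequency_of_live(freq_within_ckit: float, ckit_fraction_of_live: float) -> float:
    """Convert a frequency within c-Kit+ cells to a frequency within live cells.

    Both arguments are fractions between 0 and 1. The values used below are
    hypothetical and serve only to illustrate the arithmetic.
    """
    for value in (freq_within_ckit, ckit_fraction_of_live):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    return freq_within_ckit * ckit_fraction_of_live


# Same CMP frequency within c-Kit+ cells in both hypothetical groups...
cmp_within_ckit = 0.10
# ...but a different c-Kit+ fraction of live bone marrow cells.
young = frequency_of_live(cmp_within_ckit, ckit_fraction_of_live=0.05)
aged = frequency_of_live(cmp_within_ckit, ckit_fraction_of_live=0.08)
print(f"young: {young:.4f} of live cells, aged: {aged:.4f} of live cells")
```

      Reporting both normalizations, as done in the response, lets readers check whether a shift in the c-Kit+ fraction is masking or creating a difference.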

      Comment #3-6: Regarding Figure 1B, the authors argue that if myeloid-biased HSC clones increase with age, they should see increased frequency of all components of the myeloid differentiation pathway (CMP, GMP, MEP). This would imply that their results (no changes or reduction in these myeloid subpopulations) suggest the absence of myeloid-biased HSC clones expansion with age. This reviewer believes that differentiation dynamics within the hematopoietic hierarchy can be more complex than a cascade of sequential and compartmentalized events (e.g., accelerated differentiation at the CMP level could cause exhaustion of this compartment and explain its reduction with age and why GMP and MEP are unchanged) and these conclusions should be considered more carefully.

      Response #3-6: We wish to thank the reviewer for this comment. We agree that the differentiation pathway may not be a cascade of sequential events but could be influenced by various factors such as extrinsic factors.

      In Figure 1B, we hypothesized that there may be other mechanisms causing myeloid-biased hematopoiesis besides the age-related increase in myeloid-biased HSCs, given that the percentage of myeloid progenitor cells in the bone marrow did not change with age. However, we do not discuss the presence or absence of myeloid-biased HSCs based on the data in Figure 1B.

      Our newly proposed theories—that the differentiation capacity of LT-HSCs remains unchanged with age and that age-related myeloid-biased hematopoiesis is due to changes in the ratio of LT-HSCs to ST-HSCs—are based on functional experiment results. As the reviewer pointed out, to discuss the presence or absence of myeloid-biased HSCs based on the data in Figure 1B, it is necessary to apply a system that can track HSC differentiation at single-cell level. The technology would clarify changes in the self-renewal capacity of individual HSCs and their differentiation into progenitor cells and peripheral blood cells. The authors believe that those single-cell technologies will be beneficial in understanding the differentiation of HSCs. Based on the above, the following statement has been added to the text.

      [P19, L451] In contrast, our findings should be considered in light of some limitations. In this report, we primarily performed ten to twenty cell transplantation assays. Therefore, the current theory should be revalidated using single-cell technology with a lineage tracing system (1, 2). This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells.

      References

      (1) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      (2) Rodriguez-Fraticelli AE, Weinreb C, Wang SW, Migueles RP, Jankovic M, Usart M, et al. Single-cell lineage tracing unveils a role for TCF15 in haematopoiesis. Nature [Internet]. 2020;583(7817):585–9. Available from: http://dx.doi.org/10.1038/s41586-020-2503-6

      Comment #3-7: Within the few recipients showing good donor engraftment in Figure 2C, there is a big proportion of T cells that are "amplified" upon secondary transplantation (Figure 2D). Is this expected?

      Response #3-7: We wish to express our deep appreciation to the reviewer for the insightful comment on this point. As the reviewer pointed out, in Figure 2D, a few recipients show a very high percentage of T cells. The authors had the same question and considered this phenomenon as follows:

      (1) One reason for the very high percentage of T cells is that we used 1 × 10^7 whole bone marrow cells in the secondary transplantation. Consequently, the donor cells in the secondary transplantation contained more T-cell progenitor cells, leading to a greater increase in T cells compared to the primary transplantation.

      (2) We also consider that this phenomenon may be influenced by the reduced self-renewal capacity of aged LT-HSCs, resulting in decreased sustained production of myeloid cells in the secondary recipient mice. As a result, long-lived memory-type lymphocytes may preferentially remain in the peripheral blood, increasing the percentage of T cells in the secondary recipient mice.

      We have discussed our hypothesis regarding this interesting phenomenon. To further clarify the characteristics of the increased T-cell count in the secondary recipient mice, we will analyze TCR clonality and diversity in the future.

      Comment #3-8: Do the authors have any explanation for the high level of variability within the recipients of Hoxb5+ cells in Figure 2C?

      Response #3-8: We appreciate the reviewer's comment on this point. As noted in our previous report, transplantation of a sufficient number of HSCs results in stable donor chimerism, whereas a small number of HSCs leads to increased variability in donor chimerism (1). Additionally, other studies have observed high variability when fewer than 10 HSCs are transplanted (2, 3). Based on this evidence, we consider that the transplantation of a small number of cells (10 cells) is the primary cause of the high level of variability observed.

      References

      (1) Nishi K, Sakamaki T, Sadaoka K, Fujii M, Takaori-Kondo A, Chen JY, et al. Identification of the minimum requirements for successful haematopoietic stem cell transplantation. Br J Haematol. 2022;196(3):711–23. 

      (2) Dykstra B, Olthof S, Schreuder J, Ritsema M, Haan G De. Clonal analysis reveals multiple functional defects of aged murine hematopoietic stem cells. J Exp Med. 2011 Dec 19;208(13):2691–703. 

      (3) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      Comment #3-9: Can the results from Figure 2E be interpreted as Hoxb5+ cells having a myeloid bias? (differences are more obvious/significant in neutrophils and monocytes).

      Response #3-9: Thank you for your insightful comments. Firstly, we have not obtained any data so far indicating that young LT-HSCs are myeloid-biased HSCs. Therefore, we classify young LT-HSCs as balanced HSCs (1). Secondly, our current data demonstrate no significant difference in differentiation capacity between young and aged LT-HSCs (see Figure 3 in this paper). Based on these findings, we interpret that aged LT-HSCs are balanced HSCs, similar to young LT-HSCs.

      Reference

      (1)  Chen JY, Miyanishi M, Wang SK, Yamazaki S, Sinha R, Kao KS, et al. Hoxb5 marks long-term haematopoietic stem cells and reveals a homogenous perivascular niche. Nature. 2016 Feb 10;530(7589):223–7. 

      Comment #3-10: Is Figure 2G considering all primary recipients or only the ones that were used for secondary transplants? The second option would be a fairer comparison.

      Response #3-10: We appreciate the reviewer's comment on this point. We considered all primary recipients in Figure 2G to ensure a fair comparison, given the influence of various factors such as the radiosensitivity of individual recipient mice (1). Comparing only the primary recipients used in the secondary transplantation would result in n = 3 (primary recipient) vs. n = 12 (secondary recipient). Including all primary recipients yields n = 11 vs. n = 12, providing a more balanced comparison. Therefore, we analyzed all primary recipient mice to ensure the reliability of our results.

      Reference

      (1) Duran-Struuck R, Dysko RC. Principles of bone marrow transplantation (BMT): providing optimal veterinary and husbandry care to irradiated mice in BMT studies. J Am Assoc Lab Anim Sci. 2009; 48:11–22

      Comment #3-11: When discussing the transcriptional profile of young and aged HSCs, the authors claim that genes linked to myeloid differentiation remain unchanged in the LT-HSC fraction while there are significant changes in the ST-HSCs. However, 2 out of the 4 genes shown in Figure S4B show ratios higher than 1 in LT-HSCs.

      Response #3-11: Thank you for highlighting this important point. As the reviewer pointed out, when we analyze the expression of myeloid-related genes, some genes are elevated in aged LT-HSCs compared to young LT-HSCs. However, the GSEA analysis using myeloid-related gene sets, which include several hundred genes, shows no significant difference between young and aged LT-HSCs (see Figure S4C in this paper). Furthermore, functional experiments using the co-transplantation system show no difference in differentiation capacity between young and aged LT-HSCs (see Figure 3 in this paper). Based on these results, we conclude that LT-HSCs do not exhibit any change in differentiation capacity with aging.

      Comment #3-12: When determining the lymphoid bias in ST-HSCs, the authors focus on the T-cell subtype, not considering any other lymphoid population. Could the authors explain this?

      Response #3-12: We thank the reviewer for this comment. We conducted the experiments in Figure 5 to demonstrate that the hematopoiesis observed 16 weeks post-transplantation, when ST-HSCs are believed to lose their self-renewal capacity, is not due to de novo production of T cells from ST-HSCs. Instead, it is attributed to long-lived memory cells, which can persist in the peripheral blood.

      As noted by the reviewer, various memory cell types are present in peripheral blood. Our analysis focused on memory T cells due to the broad consensus on memory T cell markers (1).

      Our findings show that transplanted Hoxb5- HSCs do not continuously produce lymphoid cells, unlike lymphoid-biased HSCs. Rather, the loss of self-renewal capacity in Hoxb5- HSCs makes the presence of long-lived memory cells in the peripheral blood more apparent.

      Reference

      (1)  Yenyuwadee S, Sanchez-Trincado Lopez JL, Shah R, Rosato PC, Boussiotis VA. The evolving role of tissue-resident memory T cells in infections and cancer. Sci Adv. 2022;8(33). 

      Comment #3-13: Based on the reduced frequency of donor cells in the spleen and thymus, the authors conclude "the process of lymphoid lineage differentiation was impaired in the spleens and thymi of aged mice compared to young mice". An alternative explanation could be that differentiated cells do not successfully migrate from the bone marrow to these secondary lymphoid organs. Please consider this possibility when discussing the data.

      Response #3-13: We strongly appreciate the reviewer's comment on this point. In accordance with the reviewer's comment, we have incorporated this suggestion into our manuscript.

      [P15, L343] These results indicate that the process of lymphoid lineage differentiation is impaired in the spleens and thymi of aged mice compared to young mice, or that differentiating cells in the bone marrow do not successfully migrate into these secondary lymphoid organs. These factors contribute to the enhanced myeloid-biased hematopoiesis in peripheral blood due to a decrease in de novo lymphocyte production.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Recommendation #2-1: To support their conclusions the authors need to provide higher n-numbers and provide a detailed power analysis of the transplants in the methods section.

      Response to Recommendation #2-1: Thank you for your important remarks. The power analysis for this experiment shows that power = 0.319, suggesting that a larger sample size may be needed. On the other hand, our method for determining the sample size in Figure 3 is as follows:

      (1) First, we checked whether a myeloid-biased change is detected in the bulk-HSC fraction (Figure S3). The results showed that the difference in myeloid output at 16 weeks after transplantation was statistically significant (young vs. aged = 7.2 ± 8.9 vs. 42.1 ± 35.5%, p = 0.01), even though n = 10.

      (2) Next, myeloid-biased HSCs have been reported to be a fraction with high self-renewal ability (2004, Blood). If myeloid-biased HSCs increase with aging, the increase in myeloid-biased HSCs in the LT-HSC fraction would be detected with higher sensitivity than in the bulk-HSC fraction used in Figure S3.

      (3) However, in Figure 3 (n = 8), not only was the difference not statistically significant, but the means themselves were nearly identical (young vs. aged = 51.4 ± 31.5% vs. 47.4 ± 39.0%, p = 0.82). Since the means themselves barely differ, it is highly likely that no difference would be detected even if n were increased further.

      Regarding Figures S3, 5, 6, S6, and 7, we obtained statistically significant differences and consider the sample sizes to be sufficient.
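
      To make the sample-size argument concrete, the following is a minimal sketch of a two-group power calculation based on the summary statistics quoted in point (3). It is not the authors' calculation: it assumes equal group sizes, a two-sided test at alpha = 0.05, and a normal approximation to the two-sample t-test, so it need not reproduce the reported power of 0.319, whose underlying effect-size assumptions are not stated. All function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cohens_d(mean1: float, sd1: float, mean2: float, sd2: float) -> float:
    """Standardized mean difference using the pooled SD for equal group sizes."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return abs(mean1 - mean2) / pooled_sd


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected test statistic under the alternative
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def n_per_group_needed(d: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Approximate per-group sample size needed to reach the requested power."""
    z = NormalDist()
    return ceil(2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2)


# Summary statistics quoted for Figure 3 (young vs. aged LT-HSCs, n = 8 per group).
d = cohens_d(51.4, 31.5, 47.4, 39.0)
print(f"observed effect size d = {d:.3f}")
print(f"approximate power at n = 8: {approx_power(d, 8):.3f}")
print(f"approximate n per group for 80% power: {n_per_group_needed(d)}")
```

      Under these assumptions the observed effect size is small, which is consistent with the authors' point that adding animals is unlikely to reveal a difference of the observed magnitude. It does not, however, address the reviewer's separate concern that a larger true difference could be obscured by the high variance.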

      Recommendation #2-2: As the authors attempt to challenge the current model of the age-associated expansion of myeloid-biased HSCs (which has been observed and reproduced by many different groups), ideally additional strong evidence in the form of single-cell transplants is provided.

      Response to Recommendation #2-2: Thank you for the comments. As the reviewer pointed out, we hope to reconfirm our results using single-cell level technology in the future.

      On the other hand, we have reported that the ratio of myeloid to lymphoid cells in the peripheral blood changes when the number of HSCs transplanted, or the number of supporting cells transplanted with HSCs, is varied (1, 2). Therefore, single-cell transplant data need to be interpreted very carefully to determine differentiation potential.

      From this viewpoint, future experiments will combine the Hoxb5 reporter system with a lineage tracing system that can track HSCs at the single-cell level over time. This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells. We have reflected this comment by adding the following sentences in the manuscript.

      [P19, L451] In contrast, our findings should be considered in light of some limitations. In this report, we primarily performed ten to twenty transplantation assays. Therefore, the current theory should be revalidated using single-cell technology. This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells.

      References

      (1) Nishi K, Sakamaki T, Sadaoka K, Fujii M, Takaori-Kondo A, Chen JY, et al. Identification of the minimum requirements for successful haematopoietic stem cell transplantation. Br J Haematol. 2022;196(3):711–23. 

      (2) Sakamaki T, Kao KS, Nishi K, Chen JY, Sadaoka K, Fujii M, et al. Hoxb5 defines the heterogeneity of self-renewal capacity in the hematopoietic stem cell compartment. Biochem Biophys Res Commun [Internet]. 2021;539:34–41. Available from: https://doi.org/10.1016/j.bbrc.2020.12.077

      Minor points:

      Recommendation #2-3: Figure 1: "Comprehensive analysis of hematopoietic alternations with age shows a discrepancy of age-associated changes between peripheral blood and bone marrow"

      [Comment to the authors]: For clarity, the nature of the discrepancy should be stated clearly.

      Response to Recommendation #2-3: Thank you for this important comment. Following the reviewer’s recommendation, we have revised the manuscript as follows.

      [P7, L139] Our analysis of hematopoietic alterations with age revealed that age-associated transition patterns of immunophenotypically defined HSC and CMP in BM were not paralleled by those of myeloid cells in PB (Fig. 1 C).

      Recommendation #2-4: Figure 1B "(B) Average frequency of immunophenotypically defined HSC and progenitor cells in BM of 2-3-month mice (n = 6), 6-month mice (n = 6), 12-13-month mice (n = 6), ≥ 23-month mice (n = 7)."

      [Comment to the authors]: It should be stated in the figure and legend that the values are normalized to the 2-3-month-old mice.

      Response to Recommendation #2-4: Thank you for this comment. Figure 1B presents the actual measured values of each fraction in c-Kit positive cells in the bone marrow, without any normalization.

      Recommendation #2-5: "We found that the frequency of immunophenotypically defined HSC in BM rapidly increased up to the age of 12 months. After the age, they remained plateaued throughout the observation period (Fig. 1 B)."

      [Comment to the authors]: The evidence for a 'plateau', where HSC numbers don't change after 12 months is weak. It appears that the numbers increase continuously (although less steep) after 12 months. I thus recommend adjusting the wording to better reflect the data.

      Response to Recommendation #2-5: We thank the reviewer for the comments above and have incorporated these suggestions in our revision as follows. 

      [P6, L126] We found that the frequency of immunophenotypically defined HSC in BM rapidly increased up to the age of 12 months. After the age, the rate of increase in their frequency appeared to slow down.

      Recommendation #2-6: Figure 2G: [Comment to the authors]: Please add the required statistics, please check carefully all figures for missing statistical tests.

      Response to Recommendation #2-6: Thank you for these important comments. In response, we have added the results of the significance tests for Figures 1A, 1C, 4C, and S5.

      Recommendation #2-7: "If bulk-HSCs isolated from aged mice are already enriched by myeloid-biased HSC clones, we should see more myeloid-biased phenotypes 16 weeks after primary and the secondary transplantation. However, we found that kinetics of the proportion of myeloid cells in PB were similar across primary and the secondary transplantation and that the proportion of myeloid cells gradually decreased over time (Fig. 2 G). These results suggest the following two possibilities: either myeloid-biased HSCs do not expand in the LT-HSC fraction, or the expansion of myeloid-biased clones in 2-year-old mice has already peaked."

      [Comment to the authors]: Other possible explanations include that the observed reduction in myeloid reconstitution over 16 weeks reflects the time required to return to homeostasis. In other words, it takes time until the blood system approaches a balanced output.

      Response to Recommendation #2-7: We agree with the reviewer's comment. As the reviewer pointed out, the gradual decrease in the proportion of myeloid cells over time is not related to our two hypotheses in this part of the manuscript but rather to the hematopoietic system's process of returning to a homeostatic state after transplantation. Therefore, the original sentence could be misleading, as it is part of the section discussing whether age-associated expansion of myeloid-biased HSCs is observed. Based on the above, we have revised the sentence as follows.

      [P8, L179] However, we found that kinetics of the proportion of myeloid cells in PB were similar across the primary and the secondary transplantation (Fig. 2 G). These results suggest the following two possibilities: either myeloid-biased HSCs do not expand in the LT-HSC fraction, or the expansion of myeloid-biased clones in 2-year-old mice has already peaked.

      Recommendation #2-8: It is also important to consider that the transplant results are highly variable (see large standard deviation), therefore the sensitivity to detect smaller but relevant changes is low in the shown experiments. As the statistical analysis of these experiments is missing and the power seems low these results should be interpreted with caution. For instance, it appears that the secondary transplants on average produce more myeloid cells as expected and predicted by the classical clonal expansion model.

      Regarding "expansion of myeloid-biased clones in 2-year-old mice has already peaked". This is what the author suggested above. It might thus not be surprising that HSCs from 2-year-old mice show little to no increased myeloid expansion.

      Response to Recommendation #2-8: Thank you for providing these insights. The primary findings of our study are based on functional experiments presented in Figures 2, 3, 5, 6, and 7. In Figure 3, there was no significant difference between young and aged LT-HSCs, with mean values of 51.4±31.5% and 47.4±39.0%, respectively (p = 0.82). Given the lack of difference in the mean values, it is unlikely that increasing the sample size would reveal a significant change. For ethical reasons, to minimize the use of additional animals, we conclude that LT-HSCs exhibit no change in lineage output throughout life based on the data in Figure 3. Statistically significant differences observed in Figures 2, 5, 6, and 7 further support our conclusions.

      Additionally, because whole bone marrow cells were transplanted in the secondary transplantation, there may be various confounding factors beyond the differentiation potential of HSCs. Therefore, we consider that caution is necessary when evaluating the differentiation capacity of HSCs in the context of the second transplantation.

      Recommendation #2-9: Figure 7C: [Comment to the authors]: The star * indicates with analyzed BM. As stars are typically used as indicators of significance, this can be confusing for the reader. I thus suggest using another symbol.

      Response to Recommendation #2-9: We thank the reviewer for this comment and have incorporated the suggestion in the revised manuscript. We have decided to use † instead of the star (*).

      Reviewer #3 (Recommendations For The Authors):

      Recommendation #3.1: In Figure 1A, the authors show the frequency of PB lineages (lymphoid vs myeloid) in mice of different ages. It would be great if they could show the same data for each subpopulation including these two main categories individually (granulocytes, monocytes, B cells, T cells...).

      Response to Recommendation #3-1: We thank the reviewer for this suggestion. We provide the frequency of PB lineages (granulocytes, monocytes, B cells, T cells, and NK cells) in mice of different ages.

      Author response image 5.

      Average frequency of neutrophils, monocytes, B cells, T cells, and NK cells in PB analyzed in Figure 1A. Dots show all individual mice. *P < 0.05. **P < 0.01. Data and error bars represent means ± standard deviation. 

      Recommendation #3.2: It would be great if data from young mice could be shown in parallel to the graphs in Figure 2A.

      Response to Recommendation #3-2: We thank the reviewer for the comments above and have incorporated these suggestions in Figure 2A. 

      [P34, L916] (A) Hoxb5 reporter expression in bulk-HSC, MPP, Flk2+, and Lin-Sca1-c-Kit+ populations in the 2-year-old Hoxb5-tri-mCherry mice (Upper panel) and 3-month-old Hoxb5-tri-mCherry mice (Lower panel). Values indicate the percentage of mCherry+ cells ± standard deviation in each fraction (n = 3).

      Recommendation #3.3: Do the authors have any explanation for the high level of variability within the recipients of Hoxb5+ cells in Figure 2C?

      Response to Recommendation #3-3: Thank you for providing these insights. As noted in our previous report, transplantation of a sufficient number of HSCs results in stable donor chimerism, whereas a small number of HSCs leads to increased variability in donor chimerism (1). Additionally, other studies have observed high variability when fewer than 10 HSCs are transplanted (2, 3). Based on this evidence, we consider that the transplantation of a small number of cells (10 cells) is the primary cause of the high level of variability observed.

      References

      (1) Nishi K, Sakamaki T, Sadaoka K, Fujii M, Takaori-Kondo A, Chen JY, et al. Identification of the minimum requirements for successful haematopoietic stem cell transplantation. Br J Haematol. 2022;196(3):711–23. 

      (2) Dykstra B, Olthof S, Schreuder J, Ritsema M, Haan G De. Clonal analysis reveals multiple functional defects of aged murine hematopoietic stem cells. J Exp Med. 2011 Dec 19;208(13):2691–703. 

      (3) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      Recommendation #3.4: Are the differences in Figure 3D statistically significant? If yes, please add statistics. Same for Figure 4C.

      Response to Recommendation #3-4: Thank you for providing these insights. For Figure 3D, we performed an ANOVA analysis for each fraction; however, the results were not statistically significant. In contrast, for Figure 4C, we have added the results of significance tests for comparisons between Young LT-HSC vs. Young Bulk-HSC.

      Recommendation #3.5: As a general comment, although the results in this study are interesting, the use of a Hoxb5 lineage tracing mouse model would be more valuable for this purpose than the Hoxb5 reporter used here. The lineage tracing model would allow for the assessment of lineage bias without the caveats introduced by the transplantation assays.

      Response to Recommendation #3-5: We thank the reviewer for these important comments. Following the reviewer’s recommendation, we have revised the manuscript as follows:

      [P19, L451] In contrast, our findings should be considered in light of some limitations. In this report, we primarily performed ten to twenty transplantation assays. Therefore, the current theory should be revalidated using single-cell technology with a lineage tracing system (1, 2). This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells.

      References

      (1) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      (2) Rodriguez-Fraticelli AE, Weinreb C, Wang SW, Migueles RP, Jankovic M, Usart M, et al. Single-cell lineage tracing unveils a role for TCF15 in haematopoiesis. Nature [Internet]. 2020;583(7817):585–9. Available from: http://dx.doi.org/10.1038/s41586-020-2503-6

    2. eLife Assessment

      The manuscript provides useful findings to explore the heterogeneity of hematopoietic stem cells and age-related myeloid-biased hematopoiesis. The results presented in this study are incomplete and additional data are necessary to bolster the conclusions. Certain aspects of the methods, experimental design, and data analyses remain inadequate and only partially support the central claims.

    3. Reviewer #1 (Public review):

      Summary

      In this study, Nishi et al. claim that the ratio of long-term hematopoietic stem cells (LT-HSCs) to short-term HSCs (ST-HSCs) determines the lineage output of HSCs, and that a reduced ratio of ST-HSCs in aged mice causes myeloid-biased hematopoiesis. The authors used Hoxb5 reporter mice to isolate LT-HSCs and ST-HSCs and performed molecular analyses and transplantation assays to support their arguments. How the hematopoietic system becomes myeloid-biased upon aging is an important question with many implications in disease contexts as well. However, this study needs more definitive data.

      (1) The authors' experimental designs have some caveats that prevent them from definitively supporting their claims. The authors claimed that aged LT-HSCs have no myeloid-biased clone expansion using transplantation assays. In these experiments, the authors used 10 HSCs and young mice as recipients. Given the huge expansion of old HSCs by number and the known heterogeneity in immunophenotypically defined HSC populations, it is questionable how 10 out of so many old HSCs (an average of 300,000 up to 500,000 cells per mouse; Mitchell et al., Nature Cell Biology, 2023) can faithfully represent the old HSC population. The Hoxb5+ old HSC primary and secondary recipient mice data (Fig. 2C and D) support this concern. In addition, they only used young recipients. Considering the importance of the inflammatory aged niche in the myeloid-biased lineage output, transplanting young vs old LT-HSCs into aged mice would complete the picture.

      (2) The authors' molecular data analyses need more rigor and unbiased approaches. They claimed that neither aged LT-HSCs nor aged ST-HSCs exhibited myeloid or lymphoid gene set enrichment, but aged bulk HSCs, which are just the sum of LT-HSCs and ST-HSCs by their gating scheme (Fig. 4A), showed a "tendency" toward enrichment of myeloid-related genes based on the selected gene set (Fig. 4D). Although the proportion of ST-HSCs is reduced in bulk HSCs upon aging, since ST-HSCs do not exhibit lymphoid gene set enrichment based on their data, it is hard to understand how aged bulk HSCs have more myeloid gene set enrichment compared to young bulk HSCs. These bulk HSC data rather suggest that there could be a trend toward certain lineage bias (although not significant) in aged LT-HSCs or ST-HSCs. The authors need to verify the molecular lineage priming of LT-HSCs and ST-HSCs using another comprehensive dataset.

      (3) Although the authors could not find any molecular evidence for myeloid-biased hematopoiesis from old HSCs (either LT or ST), they argued that the ratio between LT-HSCs and ST-HSCs causes myeloid-biased hematopoiesis upon aging based on young HSC experiments (Fig. 6). However, old ST-HSC functional data showed that they barely contribute to blood production, unlike young Hoxb5- HSCs (ST-HSCs), in the transplantation setting (Fig. 2). Is there any evidence that in unperturbed native old hematopoiesis, old Hoxb5- HSCs (ST-HSCs) still contribute to blood production? If so, what is their lineage potential/output? Without this information, it is hard to argue that the different ratio causes myeloid-biased hematopoiesis in the aging context.

    4. Reviewer #2 (Public review):

      Summary:

      Nishi et al. investigate the well-known and previously described phenomenon of age-associated myeloid-biased hematopoiesis. Using a previously established HoxB5mCherry mouse model, they used HoxB5+ and HoxB5- HSCs to discriminate cells with long-term (LT-HSCs) and short-term (ST-HSCs) reconstitution potential and compared these populations to immunophenotypically defined 'bulk HSCs' that consist of a mixture of LT-HSCs and ST-HSCs. They then isolated these HSC populations from young and aged mice to test their function and myeloid bias in non-competitive and competitive transplants into young and aged recipients. Based on quantification of hematopoietic cell frequencies in the bone marrow, peripheral blood, and in some experiments the spleen and thymus, the authors argue against the currently held belief that myeloid-biased HSCs expand with age.

      While aspects of their work are fascinating and might have merit, several issues weaken the overall strength of the arguments and interpretation. Multiple experiments were done with a very low number of recipient mice, showed very large standard deviations, and had no statistically detectable difference between experimental groups. While the authors conclude that these experimental groups are not different, the displayed results seem too variable to conclude anything with certainty. The sensitivity of the performed experiments (e.g. Fig 3; Fig 6C, D) is too low to detect even reasonably strong differences between experimental groups and is thus inadequate to support the authors' claims. This weakness of the study is not acknowledged in the text and is also not discussed. To support their conclusions, the authors need to provide higher n-numbers and provide a detailed power analysis of the transplants in the methods section.

      As the authors attempt to challenge the current model of the age-associated expansion of myeloid-biased HSCs (which has been observed and reproduced by many different groups), ideally additional strong evidence in the form of single-cell transplants is provided.

      It is also unclear why the authors believe that the observed reduction of ST-HSCs relative to LT-HSCs explains the myeloid-biased phenotype observed in the peripheral blood. This point seems counterintuitive and requires further explanation.

      Based on my understanding of the presented data, the authors argue that myeloid-biased HSCs do not exist, as a) they detect no difference between young/aged HSCs after transplant (mind low n-numbers and large std!!!); b) myeloid progenitors downstream of HSCs only show minor or no changes in frequency; and c) aged LT-HSCs do not outperform young LT-HSCs in myeloid output in competitive transplants (mind low n-numbers and large std!!!).

      However, given the low n-numbers and high variance of the results, the argument seems weak and the presented data do not support the claims sufficiently. That the number of downstream progenitors does not change could be explained by other mechanisms, for instance, the frequently reported differentiation short-cuts of HSCs and/or changes in the microenvironment.

      Strengths:

      The authors present an interesting observation and offer an alternative explanation of the origins of aged-associated myeloid-biased hematopoiesis. Their data regarding the role of the microenvironment in the spleen and thymus appears to be convincing.

      Weaknesses:

      "Then, we found that the myeloid lineage proportions from young and aged LT-HSCs were nearly comparable during the observation period after transplantation (Fig. 3, B and C)."

      [Comment to the authors]: Given the large standard deviation and low n-numbers, the power of the analysis to detect differences between experimental groups is very low. Experimental groups with too large standard deviations (as displayed here) are difficult to interpret and might be inconclusive. The absence of clearly detectable differences between young and aged transplanted HSCs could thus simply be a false-negative result. The shown experimental results hence do not provide strong evidence for the authors' interpretation of the data. The authors should add additional transplants and include a detailed power analysis to be able to detect differences between experimental groups with reasonable sensitivity.

      Line 293: "Based on these findings, we concluded that myeloid-biased hematopoiesis observed following transplantation of aged HSCs was caused by a relative decrease in ST-HSC in the bulk-HSC compartment in aged mice rather than the selective expansion of myeloid-biased HSC clones."

      Couldn't that also be explained by an increase in myeloid-biased HSCs, as repeatedly reported and seen in the expansion of CD150+ HSCs? It is not intuitively clear why a reduction of ST-HSC clones would lead to a myeloid bias. The authors should try to explain more clearly where they believe the increased number of myeloid cells comes from. What is the source of myeloid cells if the authors believe they are not derived from the expanded population of myeloid-biased HSCs?

    5. Reviewer #3 (Public review):

      In this manuscript, Nishi et al. propose a new model to explain the previously reported myeloid-biased hematopoiesis associated with aging. Traditionally, this phenotype has been explained by the expansion of myeloid-biased hematopoietic stem cell (HSC) clones during aging. Here, the authors question this idea, show how their Hoxb5 reporter model can discriminate long-term (LT) and short-term (ST) HSCs, and characterize their lineage output after transplant. From these analyses, the authors conclude that changes during aging in the LT/ST HSC proportion explain the myeloid bias observed.

      Although the topic is appropriate and the new model provides a new way to think about lineage-biased output observed in multiple hematopoietic contexts, some of the experimental design choices, as well as some of the conclusions drawn from the results could be substantially improved. Also, they do not propose any potential mechanism to explain this process, which reduces the potential impact and novelty of the study.

      The authors have satisfactorily replied to some of my comments. However, there are multiple key aspects that still remain unresolved.

    1. eLife Assessment

      This valuable study investigates both online responses to, and offline replay of, visual motion sequences. Sophisticated EEG analyses provide solid evidence for both feature-specific and non-specific sequence representations, though the explanation of the statistical methods used is currently incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The study identifies two types of activation: one that is cue-triggered and non-specific to motion directions, and another that is specific to the exposed motion directions but occurs in a reversed manner. The finding that activity in the medial temporal lobe (MTL) preceded that in the visual cortex suggests that the visual cortex may serve as a platform for the manifestation of replay events, which potentially enhance visual sequence learning.

      Strengths:

      Identifying the two types of activation after exposure to a sequence of motion directions is very interesting. The experimental design, procedures, and analyses are solid. The findings are interesting and novel.

      Weaknesses:

      It was not immediately clear to me why the second type of activation was suggested to occur spontaneously. The procedural differences in the analyses that distinguished between the two types of activation need to be a little better clarified.

    3. Reviewer #2 (Public review):

      This paper shows and analyzes an interesting phenomenon. It shows that when people are exposed to sequences of moving dots (that is, dots moving in one direction, followed by another direction, etc.), showing either the starting movement direction or ending movement direction causes a coarse-grained brain response that is similar to that elicited by the complete sequence of 4 directions. However, they show by decoding the sensor responses that this brain activity actually does not carry information about the actual sequence and the motion directions, at least not on the time scale of the initial sequence. They also show a reverse replay on a highly compressed time scale, which is elicited during the period of elevated activity, and activated by the first and last elements of the sequence, but not others. Additionally, these replays seem to occur during periods of cortical ripples, similar to what is found in animal studies.

      These results are intriguing. They are based on MEG recordings in humans, and finding such replays in humans is novel. Also, this is based on what seems to be sophisticated statistical analysis. However, this is the main problem with this paper. The statistical analysis is not explained well at all, and therefore its validity is hard to evaluate. I am not at all saying it is incorrect; what I am saying is that given how it is explained, it cannot be evaluated.

    1. eLife Assessment

      This important study advances our understanding of the role of dopamine in modulating pair bonding in mandarin voles by examining dopamine signaling within the nucleus accumbens across various social stimuli using state-of-the-art causal perturbations. The evidence supporting the findings is compelling, particularly cutting-edge approaches for measuring dopamine release as well as the activity of dopamine receptor populations during social bonding. However, statistical analyses were found to lack rigor and clarity, and the lack of complementary experiments in females was noted as a weakness. Additionally, the manuscript would be strengthened by placing findings within a broader framework, such as by highlighting similarities and/or differences between mandarin and prairie voles.

    2. Reviewer #1 (Public review):

      These experiments are some of the first to assess the role of dopamine release and the activity of D1 and D2 MSNs in pair bond formation in Mandarin voles. This is a novel and comprehensive study that presents exciting data about how the dopamine system is involved in pair bonding. The authors provide very detailed methods and clearly presented results. Here they show dopamine release in the NAc shell is enhanced when male voles encounter their pair bonded partner 7 days after co-habitation. In addition, D2 MSN activity decreases whereas D1 MSN activity increases when sniffing the pair-bonded partner.

      The authors do not provide justification for why they only use males in the current study. Without discussing sex as a biological variable, these data can only inform readers about one sex (whereas pair-bonded animals by definition comprise two sexes). In addition, the authors do not use an isosbestic control wavelength in photometry experiments. Although they do use EGFP control mice, which show no effects of these interventions, a within-subject control such as an isosbestic excitation wavelength could give more confidence in these data and rule out motion artefacts within subjects.

      There is an existing literature (cited in this manuscript) from Aragona et al. (particularly Aragona et al., 2006) which has highlighted key differences in the roles of rostral versus caudal NAc shell dopamine in pair bond formation and maintenance. Specifically, they report that dopamine transmission promoting pair bonding only occurs in the rostral shell and not the caudal shell or core regions. Given that the authors have targeted more caudally, a discussion of how these results fit with previous work and why there may be differences in these areas is warranted.

      The authors could discuss the differences between pair bond formation and pair bond maintenance more deeply.

      The authors have successfully characterised the involvement of dopamine release, changes in D1 and D2 MSNs, and projections to the VP in pair bonding voles. Their conclusions are supported by their data and they make a number of very reasonable discussion points acknowledging various limitations.

    3. Reviewer #2 (Public review):

      Summary:

      Using in vivo fiber-photometry the authors first establish that DA release when contacting their partner mouse increases with days of cohabitation while this increase is not observed when contacting a stranger mouse. Similar effects are found in D1-MSNs and D2-MSNs with the D1-MSN responses increasing and D2-MSN responses decreasing with days of cohabitation. They then use slice physiology to identify underlying plasticity/adaptation mechanisms that could contribute to the changes in D1/D2-MSN responses. Last, to address causality the authors use chemogenetic tools to selectively inhibit or activate NAc shell D1 or D2 neurons that project to the ventral pallidum. They found that D2 inhibition facilitates bond formation while D2 excitation inhibits bond formation. In contrast, both D1-MSN activation and inhibition inhibit bond formation.

      Strengths:

      The strength of the manuscript lies in combining in vivo physiology to demonstrate circuit engagement and chemogenetic manipulation studies to address circuit involvement in pair bond formation in a monogamous vole.

      Weaknesses:

      Weaknesses include that a large set of experiments within the manuscript are dependent on using short promoters for D1 and D2 receptors in viral vectors. As the authors acknowledge this approach can lead to ectopic expression and the presented immunohistochemistry supports this notion. It seems to me that the presented quantification underestimates the degree of ectopic expression that is observed by eye when looking at the presented immunohistochemistry. However, given that Cre transgenic animals are not available for Microtus mandarinus and given the distinct physiological and behavioral outcomes when imaging and manipulating both viral-targeted populations this concern is minor.

      The slice physiology experiments provide some interesting outcomes but it is unclear how they can be linked to the in vivo physiological outcomes and some of the outcomes don't match intuitively (e.g. cohabitation enhances excitatory/inhibitory balance in D2-MSNs but the degree of contact-induced inhibition is enhanced in D2-MSN).

      One interesting finding is that the relationship between D2-MSN and pair bond formation is quite clear (inhibition facilitates while excitation inhibits pair bond formation). In contrast, the role of D1-MSNs is more complicated since both excitation and inhibition disrupt pair bond formation. This is not convincingly discussed.

      It seemed a missed opportunity that physiological readout is limited to males. I understand though that adding females may be beyond the scope of this manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript is evaluating changes in dopamine signaling in the nucleus accumbens following pair bonding and exposure to various stimuli in mandarin voles. In addition, the authors present chemogenetic data that demonstrate excitation and inhibition of D1 and D2 MSN affect pair bond formation.

      Strengths:

      The experimental designs are strong. The approaches are innovative and use cutting-edge methods. The manuscript is well written.

      Weaknesses:

      The statistical results are not presented, and not all statistical analyses are appropriate. Additionally, some details of methods are absent.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Overall, this work is quite comprehensive and is logically and rigorously designed. The phenotypic and functional data on 2C are strong.

      Thank you for your positive feedback on our findings!

      (1) Comment from Reviewer 1 suggesting the mechanistic insights of 2C are primarily derived from transcriptomic and genomic datasets without experimental verification. 

      Thank you for emphasizing the importance of experimental validation to support our transcriptomic and genomic findings. We acknowledge the gap in direct experimental evidence for the mechanistic insights of 2C and recognize the value of such validation in strengthening our conclusions. However, our current dataset lacks the comprehensive preliminary results necessary for inclusion in the supplemental material. We believe that the mechanistic insights presented offer a substantial foundation for future research, where we aim to explore these aspects in depth with targeted experimental approaches.

      Reviewer 2

      Together their data may suggest a regenerative effect of 2C both in vitro and in vivo settings. If confirmed, this study might unlock therapeutic strategy for cardiac regeneration.

      Thank you for your positive comment on the significance of our findings and the valuable therapeutic potential of 2C in cardiac regeneration!

      (1) Comment from Reviewer 2 pointing out that the main hypothesis (line 50) that Isl1 cells have regenerative properties is not extremely novel.

      We agree with the reviewer that Isl1-positive cells possess regenerative properties. Following the reviewer’s suggestion, we have revised the original wording (line 46 in the revised manuscript).

      (2) Comment from Reviewer 2 asking for a rationale for this 20x reduction of A-485 concentration and suggesting that a titration of this compound for the effects tested would be useful.

      As suggested by the reviewer, we have added the titration results of A-485 in Figure 1—figure supplement 1F-G.

      (3) Comment from Reviewer 2 noting that it is difficult to understand what proportion of CMs dedifferentiate to become RCCs. The lineage tracing data suggest only 0.6%-1.5% of cells undergo this transition. It is difficult to understand how such a small fraction can have wide effects in their different experimental settings. This is specifically true when the authors quantified nuclear and cytosolic area on brightfield pictures: would the same effect on nuclear/cytosolic area be observed in Isl1 KO cells?

      We appreciate the reviewer's insightful observation on the proportion of CMs undergoing dedifferentiation into RCCs and the potential impact of this subset on our experimental outcomes. The lineage tracing data indicating that only 0.6%-1.5% of CMs transition to RCCs indeed reflects a modest proportion. This observation raises valid questions regarding the broader implications of such a limited fraction in the context of cardiac regeneration and the experimental effects reported. It is important to note that while the proportion of CMs dedifferentiating into RCCs is small, the biological significance and potential impact of these RCCs could be disproportionately large. Emerging evidence suggests that even a minimal number of stem or progenitor cells can exert significant effects on tissue repair and regeneration, possibly through paracrine mechanisms or by acting as key signaling centers within the tissue microenvironment (Fernandes et al., 2015). Regarding the specific question about 2C’s effects on nuclear/cytosolic area in Isl1 knockout (KO) cells, we appreciate the suggestion and consider that such comparative studies would provide valuable insights for comprehensively understanding the impact of 2C-induced RCCs in future research. In addition, ISL1 KO cells are described in detail in the article published in eLife in 2018 by Quaranta et al.

      (4) Comment from Reviewer 2 asking for the effect of CHIR + I-BET-762 alone. 

      As suggested by the reviewer, we have added the results of CHIR + I-BET-762 in Figure 1—figure supplement 1H.

      (5) Comment from Reviewer 2 suggesting a transparent explanation of the effects of A-485 on acetylation status.

      We thank the reviewer for highlighting the confusion regarding the effects of A-485 on the acetylation status of H3K27Ac and H3K9Ac. Upon re-examination of our data and statements, we recognize the need for clarity in our explanation and the inconsistency it may have caused (lines 223-231 on page 8).

      Initially, our observations suggested a selective effect of A-485 on H3K27Ac based on early experimental results (Figure 7—figure supplement 1). This conclusion was drawn from preliminary analyses that focused predominantly on this specific histone mark. However, upon further comprehensive examination of our data, including additional replicates and more sensitive detection methods, we observed that A-485 also impacts H3K9Ac levels (Figure 7—figure supplement 1F). This latter finding emerged from expanded datasets that were not initially considered in our preliminary conclusions.

      The "further analyses" mentioned referred to these subsequent experimental investigations, which included chromatin immunoprecipitation (ChIP) assays and extended sample sizes, providing a more robust dataset for evaluating the effects of A-485. We understand the importance of transparency and rigor in scientific communication. To address this, we have revised the manuscript to clearly delineate the progression of our analyses and the evidential basis for our revised understanding of A-485's effects. This includes a detailed description of the methodologies employed in our follow-up experiments (line 537 on page 27), the statistical approaches for data analysis (lines 226-227 in supporting information), and how these led to the updated interpretation regarding A-485's impact on histone acetylation (lines 232-269).

      (6) Comment from Reviewer 2 asking for the difference in the ChIP peaks representation of the y-axis on the ChIP traces.

      Thank you for raising this question. We did not normalise for sequencing depth, and the y-axis represents the number of counts (line 537 on page 27 and lines 226-227 in supporting information).

      (7) Comment from Reviewer 2 suggesting testing this 2C protocol on mESCs to see whether similar changes occur, and asking how these mouse RCCs differ transcriptionally from Isl1+ progenitor cells isolated from neonatal mice (P1-P5).
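
      For readers comparing ChIP traces drawn from raw counts, the following minimal sketch shows one common way to put tracks on a comparable scale, namely counts per million mapped reads (CPM). It is not the procedure used in the manuscript, which reports raw counts. The function name, bin counts, and library sizes are hypothetical values chosen for illustration.

```python
def counts_per_million(bin_counts, total_mapped_reads):
    """Scale raw per-bin read counts to counts per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in bin_counts]


# Two hypothetical tracks with identical raw counts but different depths.
raw_counts = [100, 250, 40]
track_shallow = counts_per_million(raw_counts, total_mapped_reads=20_000_000)
track_deep = counts_per_million(raw_counts, total_mapped_reads=40_000_000)

for shallow, deep in zip(track_shallow, track_deep):
    print(f"shallow library: {shallow:.2f} CPM, deep library: {deep:.2f} CPM")
```

      With identical raw counts, the track from the library sequenced twice as deeply has half the normalised signal, which is why y-axes showing raw counts are only directly comparable between samples of similar depth.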

      Thank you for your insightful questions. Testing the 2C protocol on mouse embryonic stem cells (mESCs) to observe if similar changes occur presents an excellent opportunity to further validate the versatility and applicability of our findings across different stem cell models. We agree that such experiments would not only strengthen the current study but also provide valuable insights into the conservation of mechanisms across species. We are currently in the process of setting up experiments to address this very question and anticipate that the results will significantly contribute to our understanding of cardiomyocyte differentiation processes. Regarding the transcriptional comparison between mouse regenerative cardiac cells (RCCs) induced by our 2C protocol and Isl1+ progenitors isolated from neonatal mice (P1-P5), this comparison is indeed crucial for delineating the specific identity and developmental potential of the RCCs generated. However, a comprehensive side-by-side transcriptomic analysis is required to systematically identify these differences and understand their biological implications. We plan to undertake this analysis as part of our future studies, which will include detailed RNA sequencing and comparative gene expression profiling to elucidate the transcriptional similarities and differences between these cell populations. These future directions will enhance our current findings, provide a deeper mechanistic understanding, and confirm the potential of the 2C protocol in regenerative medicine applications. We appreciate the reviewer's suggestions and acknowledge the importance of these experiments in advancing the field.

      (8) Comment from Reviewer 2 with a suggestion to have a precise clarification of statistics & data acquisition.

      As suggested by the reviewer, we have revised the descriptions of statistics and data acquisition to make them clearer (lines 228-233 in supporting information and a precise description in each paragraph involving statistical analyses).

      Reviewer 3

      The findings may have a translation potential. The idea of promoting the regenerative capacity of the heart by reprogramming CMs into RCCs is interesting.

      Thank you for your appreciation of the significance and translational potential of our findings!

      (1) Comment from Reviewer 3 suggesting that the mechanism involved in the 2C-mediated generation of RCCs is unclear and that the leads found in the RNA-seq and ChIP-seq analyses are not experimentally validated.

      We acknowledge the reviewer's concern regarding the lack of experimental validation for the mechanisms identified through RNA-seq and ChIP-seq analyses in the generation of RCCs from the 2C state. We understand the importance of substantiating these molecular leads with empirical data to strengthen our conclusions. Currently, our findings are based on in-depth bioinformatic analyses, which have provided us with valuable insights and a strong basis for hypothesis generation. Moving forward, we plan to prioritize experimental validation of key pathways and targets identified in our study. This will include designing targeted experiments to elucidate the functional roles of these mechanisms in the 2C-mediated generation of RCCs. We appreciate the opportunity to clarify our approach and future directions, and we are committed to addressing this gap in subsequent work.

      (2) Comment from Reviewer 3 considering that the very low number of RCCs generated (0.6%-1.5% of cells) cannot protect the heart from MI, and asking whether 2C affects the survival or metabolism of existing CMs under hypoxic conditions and what percentage of cells are regenerated by 2C treatment post-MI.

      We appreciate the reviewer's insightful queries regarding the protective effects of 2C treatment against myocardial infarction (MI) given the low percentage of RCCs generated. It is our hypothesis that the benefits of 2C treatment extend beyond mere cell numbers. We propose that 2C may enhance the survival and metabolic resilience of existing CMs under hypoxic conditions, thereby contributing to cardiac protection post-MI. Our future investigations will aim to quantify the precise percentage of cells regenerated by 2C treatment post-MI and explore its broader impacts on cardiac tissue survival and repair mechanisms.

      (3) Comment from Reviewer 3 suggesting the administration of 2C in mice, as well as whether 2C affects cardiac function under basal conditions and any physiology in mice, and the need to examine cardiac structural and functional parameters after administration of 2C.

      We appreciate the reviewer's interest in the potential effects of 2C administration on cardiac function and overall physiology in mice. While we observed a decrease in body weight at P5 compared to controls, our immunofluorescence staining did not indicate any changes in cardiac structure (Figure 4— figure supplement 1E). This suggests that while 2C administration impacts neonatal rat physiology, it does not adversely affect cardiac structure under basal conditions. Further investigations are planned to assess the functional parameters of the heart post-2C administration to comprehensively understand its effects.

      (4) Comment from Reviewer 3 suggesting the potential effects of 2C on other cell types of the heart, including fibroblasts and endothelial cells, in vitro and in vivo.

      We value the reviewer's suggestion to explore the effects of 2C on various cardiac cell types, including fibroblasts and endothelial cells, both in vitro and in vivo. We acknowledge the importance of understanding the broader impact of 2C treatment across different cell populations within the heart, given its potential protective effects. To address this, we are designing a series of experiments to assess 2C's influence on these cell types, aiming to elucidate any changes in their behavior, proliferation, and function following treatment. This comprehensive approach will allow us to better understand the mechanistic basis of 2C's cardioprotective effects.

      (5) Comment from Reviewer 3 suggesting validation of the effect of 2C in a dose-dependent manner.

      As suggested by the reviewer, we have added data on the dose-dependent effect of 2C (Figure 1— figure supplement 1F-G).

      (6) Comment from Reviewer 3 suggesting an explanation of how A-485 affects H3K27Ac and H3K9Ac.

      We appreciate the reviewer pointing out the discrepancy regarding the effects of A-485 on H3K27Ac and H3K9Ac. Upon re-examination of our data, we realize that our initial interpretation may have overlooked the broader impact of A-485 on histone acetylation patterns. It appears that A-485 does indeed influence both H3K27Ac and H3K9Ac, contrary to our initial statement. This oversight will be corrected in our revised manuscript, where we will provide a more detailed analysis and discussion of A-485's impact on these histone marks, alongside an explanation for the observed effects (lines 223-269 on pages 8-9).

      (7) Comment from Reviewer 3 with a correction to use "regeneration" at the screening stage.

      As suggested by the reviewer, we have amended the wording in the text (line 66 on page 3).

      Reviewer 4

      Comment from Reviewer 4 suggesting more information that clarifies and justifies the hypothesis.

      As suggested by the reviewer, we added more information to clarify and justify the hypothesis (lines 39-47 on page 3).

      (1) Comment from Reviewer 4 pointing out the story line is not well developed.

      To address the reviewer’s question, we revised the manuscript to ensure a smooth and coherent logical flow.

      (2) Comment from Reviewer 4 pointing out the purpose in choosing to study ISL1-CMs.

      As raised by the reviewer, we have clarified the rationale for using ISL1 as a marker to define RCCs in revised manuscript (lines 39-47 on page 3).

      (3) Comment from Reviewer 4 pointing out the missing references in row 57-58.

      Thank you for pointing this out, we fixed it.

      (4) Comment from Reviewer 4 suggesting more explanation and a presentation of the results of the compound screen.

      As suggested by the reviewer, we added additional explanations in lines 65-73 and showed the screening results in Figure 1—figure supplement 1F-H.

      (5) Comment from Reviewer 4 suggesting an in-depth discussion of the findings.

      Thank you for the suggestion, we included additional discussion at the end of the article.

      (6) Comment from Reviewer 4 suggesting a conclusion should be included in the main text.

      Thank you for the suggestion, we made a revision.

      (7) Comment from Reviewer 4 pointing out the cell viability under different concentrations of 2C.

      As mentioned by the reviewer, we have added the cell numbers at different doses of 2C treatment (Figure 2F).

      (8) Comment from Reviewer 4 pointing out the missing information in the methods.

      Thank you for the suggestion, we made additions.

      (9) Comment from Reviewer 4 suggesting more explanations in Figure S3A.

      As mentioned by the reviewer, we revised the original Fig. S3A (now Figure 2—figure supplement 1).

      (10) Comment from Reviewer 4 pointing out the high variability of mCherry cells (%) in Figure 3J.

      Thank you. We made a revision.

      (11) Comment from Reviewer 4 suggesting more explanations on the DNA-binding motif of ISL1 in the cells treated with A-485 or 2C.

      Thank you for the suggestion, we added additional explanations (lines 270-274 on page 9).

      (12) Comment from Reviewer 4 pointing out the unclear labeling in Figure S1B and D.

      Thank you for the suggestion; we made a revision (lines 240-245 in supporting information).

      (13) Comment from Reviewer 4 suggesting a relative quantification of the proteins in Figure 1H.

      Thank you for the suggestion. We have quantified the relative expression levels of the proteins in the original Fig. 1H, as shown in Figure 1F.

      (14) Comment from Reviewer 4 suggesting to provide detailed information in the methodology part about the compounds.

      Thank you for the suggestion, we made a revision.

      (15) Comment from Reviewer 4 pointing out the insufficient explanations on figure legends.

      Thank you for the suggestion, we made a revision.

      (16) Comment from Reviewer 4 suggesting more independent experiments to reduce the high variations in “ns” between NC and 2C at 60h+3d shown in Figure 2E and F.

      Thank you for the suggestion, we made a revision in Figure 2F.

      (17) Comment from Reviewer 4 suggesting that limitations should be provided in the text.

      Thank you for the suggestion; we have provided a limitations statement in the revised manuscript (lines 300-311 on page 10).

    2. eLife Assessment

      This manuscript offers valuable information on the combinatory effect of small molecules, CHIR99021 and A-485 (2C), during the reprogramming of mature cardiomyocytes into regenerative cardiac cells on stimulating cardiac cell regeneration. Although the study used several hESC lines and an in vivo model of myocardial injury to demonstrate the regenerative potential of cardiac cells, the manuscript is still incomplete as several concerns remain unanswered, including the lack of validation of the conclusions from scRNA-seq. It is still unclear how a small fraction of dedifferentiating cardiac cells can offer such broad effects on regeneration both in vitro and in vivo. If validated, this study might unlock potential therapeutic strategies for cardiac regeneration.

    3. Reviewer #1 (Public review):

      The present manuscript by Zhou and colleagues investigates the impact of a new combination of compounds termed CHIR99021 and A-485 on stimulating cardiac cell regeneration. This manuscript fits the journal and addresses an important contribution to scientific knowledge.

      Comments on latest version:

      The authors have addressed all of our comments.

    4. Reviewer #2 (Public review):

      Summary:

      This manuscript reports that a combination of two small molecules, 2C (CHIR99021 and A-485), induced the dedifferentiation of hESC-derived cardiomyocytes (CMs) into regenerative cardiac cells (RCCs). These RCCs had disassembled sarcomeric structures and elevated expression of embryonic cardiogenic genes such as ISL1, exhibited proliferative potential, and were able to differentiate into cardiomyocytes, endothelial cells, and smooth muscle cells. Lineage tracing further suggested that RCCs originated from TNNT2+ cells, not pre-existing ISL1+ cells. Furthermore, 2C treatment increased the number of RCCs in neonatal rat and adult mouse hearts, and improved cardiac function post-MI in adult mice. Mechanistically, bulk RNA-seq analysis revealed that 2C led to elevated expression of embryonic cardiogenic genes and down-regulation of CM-specific genes. Single-cell RNA-seq data showed that 2C promoted cardiomyocyte transition into an intermediate state marked by ACTA2 and COL1A1, from which cells subsequently transformed into RCCs. Finally, ChIP-seq analysis demonstrated that CHIR99021 enhanced H3K9Ac and H3K27Ac modifications in embryonic cardiac genes, while A-485 inhibited these modifications in cardiac-specific genes. These combined alterations effectively induced the dedifferentiation of cardiomyocytes into RCCs. Overall, this is an important work, presenting a putative regenerative cardiac cell type that may represent endogenous cardiac regeneration in regenerative animals. With that said, here are suggestions for the authors:

      Strengths:

      Overall, this work is quite comprehensive and is logically and rigorously designed. The phenotypic and functional data on 2C are strong.

      Weaknesses or suggestions:

      (1) In Figure 4, the authors should perform additional experiments analyzing the effect of 2C on cardiomyocytes, endothelial cells, and fibroblasts in adult mouse hearts after myocardial infarction.

      (2) In Figures 5-7, the mechanistic insights of 2C are primarily derived from transcriptomic and genomic datasets without experimental verification.

      (3) The authors should compare transcriptomic profiling of the RCCs with other putative cardiac progenitors from public databases.

    5. Reviewer #3 (Public review):

      Summary:

      The ability of cardiac cells to regenerate has been the object of intense (and sometimes controversial) research in biology. While lower organisms can robustly undergo cardiac regeneration by reactivation of embryonic cardiogenic pathway, this ability is strongly reduced in mice, both temporally and qualitatively. Finding a way to derive precursor cells with regenerative ability from differentiated cells in mammals has been challenging.

      Zhou, He and colleagues hypothesized that ISL-1-positive cells would show regenerative capacity and developed a small molecules screen to dedifferentiate cardiomyocytes (CM) to ISL1-positive precursor cells. Using hESC-derived CM, authors found that the combination of both, WNT activation (CHIR99021) and p300 acetyltransferase inhibition (A-485) (named 2C protocol) induces CM dedifferentiation to regenerative cardiac cells (RCCs). RCCs are proliferative and re-express embryonic cardiogenic genes while decreasing expression of more mature cardiac genes, bringing them towards a more precursor-like state. RCCs were able to differentiate to CM, smooth muscle cells and endothelial cells, highlighting their multipotent property. In vivo administration of 2C in rats and mice had protective effects upon myocardial infarction.

      Mechanistically, authors report that 2C protocol drives CM-specific transcriptional and epigenetic changes.

      Strengths:

      The authors made a great effort to validate their data using orthogonal ways, and several hESC lines. The use of lineage tracing convincingly showed a dedifferentiation from CM. They translate their findings into an in vivo model of myocardial injury, and show functional cardiac regeneration post injury. They also showed that 2C could surprisingly be used as preventive treatment. Together their data may suggest a regenerative effect of 2C both in vitro and in vivo settings. If confirmed, this study might unlock therapeutic strategy for cardiac regeneration.

      Weaknesses:

      Updated General comments:

      Experimental design & Interpretation

      (1) The titration provided by the authors following the first round of revision is puzzling to me. Based on the authors' explanation, the initial screen was performed using 10uM of A-485, allowing the authors to choose CHIR + A-485 as a combination of drugs increasing Isl1-positive cells. However, in the titration provided, the combination of CHIR + 10uM of A-485 (used during the screen) shows *no* increase in the percentage of Isl1-positive cells compared to the DMSO control. How is that possible? Can the authors provide a transparent explanation of the experimental design for their screen? How was A-485 isolated from the 4000+ compounds tested if it does not show any effect in the titration? This titration raises significant concerns about the rationale for following up with this combination of compounds.

      (2) The authors have not really addressed the concern raised earlier. If only ~1% of the cells de-differentiate and become Isl1-positive, how can anybody quantify a nuclear/cytosolic ratio at the level of the global population and show statistical significance when only 1% of the cells should be different?

      (3) The authors now provide a quantification of the effect of I-BET-762 (Supp 1H). While the authors state "[the combination of CHIR + I-BET-762] was less effective than A-485 in combination with CHIR99021", the figure provided does not test that. A side-by-side comparison of the effects of A-485 and I-BET-762 should have been performed on the same graph. I-BET-762 gives a 4-fold increase, while A-485 gives a 5-fold increase, which, based on the variation in their data, is unlikely to be statistically different. The rationale for disregarding the effect of I-BET-762 is therefore weakened.

      (4) Why is NR2F2 statistically significant in one set of experiments (Fig 2 - Fig. supplement 1) and then non-significant in another set (Fig. 1G) using the exact same experimental design (NC vs 2C for 60h) and a similar statistical test?
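      To make the arithmetic behind point (2) concrete, the following is a minimal sketch, assuming a simple two-component population in which a small fraction of cells shifts to a new nuclear/cytosolic ratio and the rest stay at baseline. The ratio values are hypothetical and are not taken from the manuscript; only the ~1% fraction comes from the point above.

```python
def population_mean(fraction_changed, baseline, changed):
    """Mean of a two-component population.

    `fraction_changed` of the cells sit at `changed`; the remainder sit at
    `baseline`. Illustrative model only, not the authors' analysis.
    """
    if not 0.0 <= fraction_changed <= 1.0:
        raise ValueError("fraction_changed must be between 0 and 1")
    return (1.0 - fraction_changed) * baseline + fraction_changed * changed


# Hypothetical values: baseline ratio of 1.0, a three-fold higher ratio in
# the ~1% of cells that de-differentiate.
baseline_ratio = 1.0
changed_ratio = 3.0
mean_ratio = population_mean(0.01, baseline_ratio, changed_ratio)

# Relative shift of the population mean with respect to baseline.
relative_shift = (mean_ratio - baseline_ratio) / baseline_ratio
```

      Under these assumed numbers, even a three-fold change confined to 1% of cells moves the population mean by only about 2%, which is why a population-level difference needs explanation.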

      Statistics & Data Acquisition

      (1) Authors should refrain from deriving statistics from 2 biological repeats (Figure 3G).

      (2) Authors still do not state whether the normality of their data was tested.

      (3) What is the rationale for using a two-way ANOVA for Fig 3G? The authors are only comparing the effect of their treatment for each marker. The same question applies to most panels of Figure 1, Fig 2C, 2F, and throughout the manuscript. This needs clarification/justification, especially because in other experiments they used multiple unpaired t-tests (Fig 2 - Fig. supplement 1).

      Others

      (1) Authors should try to make their manuscript colorblind-friendly: No modification added following this comment.

    1. eLife Assessment

      This valuable article represents a significant body of work that addresses some novel aspects of the biology of lung cancer, the overall influence of CHIP and its impacts on responses to therapy. While a high clonal hematopoiesis (CHIP) burden was previously linked with an inflammatory phenotype in other disease settings, the authors demonstrate with solid evidence that this is also true for lung cancer. CHIP is complex and more data will be required to substantiate more evidence with regard perhaps to specific mutations in certain situations and how this might influence therapy choices.

    2. Reviewer #1 (Public review):

      Summary:

      The study investigates the impact of Clonal Hematopoiesis of Indeterminate Potential (CHIP) on Immune Checkpoint Inhibitor (ICI) therapy outcomes in NSCLC patients, analyzing blood samples from 100 patients pre- and post-ICI therapy for CHIP, and conducting single-cell RNA sequencing (scRNA-seq) of PBMCs in 63 samples, with validation in 180 more patients through whole exome sequencing. Findings show no significant CHIP influence on ICI response, but a higher CHIP prevalence in NSCLC compared to controls and a notable CHIP burden in squamous cell carcinoma. Severely affected CHIP groups showed NF-kB pathway gene enrichment in myeloid clusters.

      Strengths:

      The study is commendable for analyzing a significant cohort of 100 patients for CHIP and utilizing scRNA-seq on 63 samples, showcasing the use of cutting-edge technology.

      The study tackles the vital clinical question of predicting ICI therapy outcomes in NSCLC.

      Weaknesses:

      The study groups, comprising NSCLC patients and healthy controls, exhibit notable differences in sex distribution and smoking status. Given that smoking is a well-established factor influencing CHIP status, this introduces potential confounding variables that may impact the study's conclusions. The authors have appropriately acknowledged these disparities and provided a transparent discussion of their implications.

      Comments on revised submission:

      The authors thoroughly addressed all my concerns. Thank you very much for your additional work.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study investigates the impact of Clonal Hematopoiesis of Indeterminate Potential (CHIP) on Immune Checkpoint Inhibitor (ICI) therapy outcomes in NSCLC patients, analyzing blood samples from 100 patients pre- and post-ICI therapy for CHIP, and conducting single-cell RNA sequencing (scRNA-seq) of PBMCs in 63 samples, with validation in 180 more patients through whole exome sequencing. Findings show no significant CHIP influence on ICI response, but a higher CHIP prevalence in NSCLC compared to controls, and a notable CHIP burden in squamous cell carcinoma. Severely affected CHIP groups showed NF-kB pathway gene enrichment in myeloid clusters.

      Strengths:

      The study is commendable for analyzing a significant cohort of 100 patients for CHIP and utilizing scRNA-seq on 63 samples, showcasing the use of cutting-edge technology. The study tackles the vital clinical question of predicting ICI therapy outcomes in NSCLC.

      Weaknesses:

      The manuscript's comparison of CHIP prevalence between NSCLC patients and healthy controls could be strengthened by providing more detailed information on the control group. Specifically, details such as sex, smoking status, and comorbidities are needed to ensure the differences in CHIP are attributable to lung cancer rather than other factors. Including these details, along with a comparative analysis of demographics and comorbidities between both groups and clarifying how the control group was selected, would enhance the study's credibility and conclusions.

      Reviewer #2 (Public Review):

      Summary:

      The authors used a large cohort of patients with metastatic lung cancer pre- and 1-3 weeks post-immunotherapy. The goal was to investigate whether immunotherapy results in changes in CHIP clones (using targeted sequencing and whole exome sequencing) as well as to investigate whether patients with CHIP changed their response to immunotherapy (single-cell RNA sequencing).

      Strengths:

      This represents a large cohort of patients, and comprehensive assays - including targeted sequencing, whole exome sequencing, and single-cell RNA sequencing.

      Weaknesses:

      Findings are not necessarily unexpected. With regards to clonal dynamics, it would be very unlikely to see any changes within a few weeks' time frame. Longer follow-up to assess clonal dynamics would realistically be necessary.

      We truly appreciate constructive comments by the reviewers and editors. We agree with these comments and did our best to address them to improve the paper. Please see the following pages.

      Reviewer #1 (Recommendations For The Authors):

      Comment 1-1. In Figure 3B, the changes in frequency are challenging to discern. Consider employing connected line plots or another visual representation to enhance clarity and interpretation.

      Thank you for the suggestion. We modified Figure 3B to efficiently visualize the changes in cell proportion. Please note that the proportional changes in cell populations were not statistically significant by treatment, pathology, or clonal hematopoiesis (CH) severity.

      Comment 1-2. On page 13, Figure 3D is mentioned before Figure 3C. Please re-order to follow the correct sequence.

      We corrected the sequence of the figure and revised the text accordingly.

      Comment 1-3. Supplementary Figure 9 reveals an intriguing observation: the hypoxia and TNF signaling pathways appear to be regulated in opposite directions between CHIP-negative subjects and those with a Variant Allele Frequency (VAF) greater than 0.1. It would be insightful if the authors could delve into the potential implications or interpretations of this finding.

      We appreciate the reviewer's insightful comment. In the GSEA results presented in Supplementary Figure 9 and Figure 3C, we specifically focused on TNF signaling in monocytes and cDCs. Our subsequent analysis revealed that the adaptation of inflammatory signals is enriched in the myeloid cells in the CHIP-severe patients (Supplementary Fig. S12). Following the reviewer’s comment, we found that the leading-edge genes were shared between the TNF signaling and hypoxia pathways in most clusters (Supplementary Fig. S15). Suggested core genes, such as FOS, DUSP1, JUN, and PPP1R15A, play critical roles in the inflammatory phenotypes of myeloid lineages. Based on this finding, we added a paragraph in the Discussion section to address the implications of these shared signatures as follows (lines 340-348):

      “Our GSEA results specifically indicated the enrichment of TNF signaling and hypoxia pathways in most clusters of patients with severe CH (Supplementary Fig. S9). The leading-edge genes from GSEA results showed core genes such as FOS, DUSP1, JUN, and PPP1R15A, which are known to play critical roles in the inflammatory phenotypes of immune cells, were shared between the TNF signaling and hypoxia pathways in all significant clusters. (Supplementary Fig. S15). Furthermore, gene regulatory network analysis using SCENIC indicated a higher enrichment of inflammatory signatures in myeloid lineages (Supplementary Fig. S9), highlighting the pronounced inflammatory phenotype of CH clones in these cell lineages.”

      Comment 1-4. The plots in Supplementary Figure 12 would benefit from enlargement to improve legibility and facilitate a better understanding of the data presented.

      We improved resolutions and enlarged Supplementary Figure S12.

      Reviewer #2 (Recommendations For The Authors):

      Comment 2-1. The authors state that CHIP is seen at a higher prevalence in the metastatic lung (44/100) vs controls (5/42), however, no in-depth info other than age is given about the normal cohort (Table S2). It would be important to make sure the cohorts are matched with regards to smoking hx, age range, etc before making the claim that CHIP is more frequent in the metastatic lung cancer group.

      Thank you for the comment. To provide additional information on the control cohort, including current smoking habits and sex, we added columns to Table S2. While we tried to match the age distributions between the control group without a history of solid cancer and the lung cancer cohort, we observed that the lung cancer cohort was slightly older (mean ages, control vs. lung cancer: 58.9 vs. 64.1 years), had a higher prevalence of smoking (current smokers: 11/42 vs. 37/100), and had a higher proportion of males (male/female: 18/24 vs. 91/9). Age and smoking are well-known epidemiological contributors to lung cancer and could influence the prevalence of clonal hematopoiesis (CH).

      However, previous studies have reported similar prevalence rates of CH in NSCLC patients, which aligned with our findings (Bolten et al., 2020 Nat Genet; Hong et al., 2022 Cancer Res). Moreover, our most prevalent CH mutations, including DNMT3A, TET2, and PPM1D, were marginally affected by smoking, and this trend has been consistently observed in healthy populations (Levin et al., 2022 Sci Rep). We have acknowledged these factors as major limitations of our study in the Discussion section as follows (lines 379-390):

      “Also, the distinct characteristics of our cohort can be confounders for our results. Compared to control patients, our cohort was biased toward slightly older ages, higher prevalence of smoking habits, and with a higher proportion of males (mean age: 64.1 vs. 58.9; current smokers: 37/100 vs. 11/42; male/female: 91/9 vs. 18/24 Supplementary Figures S1 and S3). However, previous studies have reported similar prevalence rates of clonal hematopoiesis in NSCLC patients, aligned with our findings (9,51). Moreover, our most prevalent CH mutations, including DNMT3A, TET2, and PPM1D, were marginally affected by smoking, and this trend has been consistently observed in both healthy populations and NSCLC patients (10,51,52).”

      Comment 2-2. Figure 1 - 1A states there were 100 CHIP and CHIP-PD mutations identified, but in 1B, C, and D there are < 100 bars and/or dots shown. How were the mutations in 1A then triaged to be shown in 1B-D?

      It appears that our poor annotation caused this misunderstanding. In Figure 1A, we showed the number of samples in each study group but did not provide detailed information in the legend. We found 67 mutations among the 100 patients and presented the mutational statistics in Figures 1B–D. Accordingly, we have revised the Figure 1 legend by adding the sentence “The numbers indicate sample counts in each group.” (lines 426-427).

      Comment 2-3. Table S4 - would be helpful to have # of variant reads and # of total reads as columns (and also calculate VAF for an additional column).

      Thank you for the comment. We added columns revealing the total number of reads and the number of variant reads in Table S4. Also, we calculated the VAF and included it as a new column as suggested by the reviewer.
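      As an illustration of the added column, the sketch below shows the standard VAF calculation (variant reads divided by total reads at the site). The field names and read counts are hypothetical and do not come from Table S4; the 0.1 cut-off mirrors the VAF threshold mentioned in Comment 1-3.

```python
def variant_allele_frequency(variant_reads, total_reads):
    """Return VAF = variant reads / total reads at a site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must be between 0 and total_reads")
    return variant_reads / total_reads


# Hypothetical rows standing in for entries of a variant table.
rows = [
    {"variant_reads": 25, "total_reads": 1000},
    {"variant_reads": 150, "total_reads": 1200},
]

for row in rows:
    row["vaf"] = variant_allele_frequency(row["variant_reads"], row["total_reads"])
    # Flag variants above the 0.1 VAF threshold.
    row["vaf_above_0_1"] = row["vaf"] > 0.1
```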

    1. eLife Assessment

      This work is of fundamental significance and has an exceptional level of evidence for a new population that protects against obesity-induced hypothalamic inflammation. This topic will attract attention from a broad base of readers, from hypothalamic neuroscientists to immunologists with an interest in metabolism.

    2. Reviewer #1 (Public review):

      Summary:

      The present work from Velloso and collaborators investigated the transcription profiles of resident and recruited hypothalamic microglia. They found sex-dependent differences between males and females and identified the protective role of chemokine receptor CXCR3 against diet-induced obesity.

      Strengths:

      (1) Novelty

      (2) Relevance, since this work provides evidence about a subset of recruited microglia that has a protective effect against DIO. This provides a new concept in hypothalamic inflammation and obesity.

      Comments on revised version:

      All my comments have been addressed.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Mendes et al provides novel key insights into the role of chemotaxis and immune cell recruitment into the hypothalamus in the development of diet-induced obesity. Specifically, the authors first revealed that although transcriptional changes in hypothalamic resident microglia following exposure to high-fat feeding are minor, there are compelling transcriptomic differences between resident microglia and microglia recruited to the hypothalamus, and these are sexually dimorphic. Using independent loss-of-function studies, the authors also demonstrate an important role of CXCR3 and hypothalamic CXCL10 in the hypothalamic recruitment of CCR2+ cells and in metabolism following exposure to high-fat diet-feeding in mice. This manuscript puts forth conceptually novel evidence that inhibition of chemotaxis-mediated immune cell recruitment accelerates body weight gain in high-fat diet-feeding, suggesting that a subset of microglia which express CXCR3 may confer protective, anti-obesogenic effects.

      Strengths:

      The work is exciting and relevant given the prevalence of obesity and the consequences of inflammation in the brain on perturbations of energy metabolism and ensuant metabolic diseases. Hypothalamic inflammation is associated with disrupted energy balance, and activated microglia within the hypothalamus resulting from excessive caloric intake and saturated fatty acids are often thought to be mediators of impairment of hypothalamic regulation of metabolism. The present work reports a novel notion in which immune cells recruited into the hypothalamus which express chemokine receptor CXCR3 may have a protective role against diet-induced obesity. In vivo studies reported herein demonstrate that inhibition of CXCR3 exacerbates high-fat diet-induced body weight gain, increases circulating triglycerides and fasting glucose levels, worsens glucose tolerance, and increases the expression of orexigenic neuropeptides, at least in female mice.

      This work provides a highly interesting and needed overview of preclinical and clinical brain inflammation, which is relevant to readers with an interest in metabolism and immunometabolism in the context of obesity.

      Using flow cytometry, cell sorting, and transcriptomics including RNA-sequencing, the manuscript provides novel insights on transcriptional landscapes of resident and recruited microglia in the hypothalamus. Importantly, sex differences are investigated.

      Overall, the manuscript is perceived to be highly interesting, relevant, and timely. The discussion is thoughtful, well-articulated, and a pleasure to read and felt to be of interest to a broad audience.

      Weaknesses:

      There were no major weaknesses perceived. Some comments for potential textual additions to the results/discussion are provided below.

      Could the authors comment on the choice of peripheral administration of CXCR3 antagonist as opposed to central (e.g. icv) administration? Indeed, systemic inhibition of CXCR3 produced significant alterations in body weight gain and glucose tolerance in female mice given high-fat diet and reduced CCR2 and CXCR3 immunostaining in the hypothalamus. Could changes to peripheral (e.g. WAT, liver) immune responses to the diet underlie the metabolic changes observed?

      Besides hypothalamic mRNA levels of chemokines and chemokine receptors, does systemic CXCR3 antagonism affect other aspects linked to diet-induced impairments of hypothalamic regulation of energy homeostasis, like inflammation, ER stress and/or mitochondrial dynamics/function? It would be interesting to reveal the consequence of reduced CCR2+ microglial migration to the hypothalamus with chronic high-fat diet exposure.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The present work from Velloso and collaborators investigated the transcription profiles of resident and recruited hypothalamic microglia. They found sex-dependent differences between males and females and identified the protective role of chemokine receptor CXCR3 against diet-induced obesity.

      Strengths:

      (1) Novelty;

      (2) Relevance, since this work provides evidence about a subset of recruited microglia that has a protective effect against DIO. This provides a new concept in hypothalamic inflammation and obesity.

      Weaknesses:

      (1) Lack of mechanistic insight into the sex-dependent effects;

      (2) Analysis of indirect calorimetry data requires more depth;

      (3) A deeper analysis of hypothalamic inflammation and ER stress pathways would strengthen the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study by Mendes et al provides novel key insights into the role of chemotaxis and immune cell recruitment into the hypothalamus in the development of diet-induced obesity. Specifically, the authors reveal that although transcriptional changes in hypothalamic resident microglia following exposure to high-fat feeding are minor, there are compelling transcriptomic differences between resident microglia and microglia recruited to the hypothalamus, and these are sexually dimorphic. Using independent loss-of-function studies, the authors also demonstrate an important role of CXCR3 and hypothalamic CXCL10 in the hypothalamic recruitment of CCR2+ positive cells on metabolism following exposure to high-fat diet-feeding in mice. This manuscript puts forth conceptually novel evidence that inhibition of chemotaxis-mediated immune cell recruitment accelerates body weight gain in high-fat diet-feeding, suggesting that a subset of microglia that express CXCR3 may confer protective, anti-obesogenic effects.

      Strengths:

      The work is exciting and relevant given the prevalence of obesity and the consequences of inflammation in the brain on perturbations of energy metabolism and ensuant metabolic diseases. Hypothalamic inflammation is associated with disrupted energy balance, and activated microglia within the hypothalamus resulting from excessive caloric intake and saturated fatty acids are often thought to be mediators of impairment of hypothalamic regulation of metabolism. The present work reports a novel notion in which immune cells recruited into the hypothalamus that express chemokine receptor CXCR3 may have a protective role against diet-induced obesity. In vivo studies reported herein demonstrate that inhibition of CXCR3 exacerbates high-fat diet-induced body weight gain, increases circulating triglycerides and fasting glucose levels, worsens glucose tolerance, and increases the expression of orexigenic neuropeptides, at least in female mice.

      This work provides a highly interesting and needed overview of preclinical and clinical brain inflammation, which is relevant to readers with an interest in metabolism and immunometabolism in the context of obesity.

      Using flow cytometry, cell sorting, and transcriptomics including RNA-sequencing, the manuscript provides novel insights into transcriptional landscapes of resident and recruited microglia in the hypothalamus. Importantly, sex differences are investigated.

      Overall, the manuscript is perceived to be highly interesting, relevant, and timely. The discussion is thoughtful, well-articulated, and a pleasure to read and felt to be of interest to a broad audience.

      Weaknesses:

      There were no major weaknesses perceived. Some comments for potential textual additions to the results/discussion are listed in recommendations to authors.

      Comments from the authors regarding the evaluation of the article: We publicly express our gratitude for the work of both Reviewers. The comments were timely and constructive and guided us toward preparing a new version of the article which contains novel data that strengthened the overall quality of the study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Experiments with ovariectomized female mice with (and without) estrogen replacement would help to address the physiological basis of the observed sex-dependent effects.

      We performed an experiment with female C57BL/6J Unib mice, subdivided into Sham, OVX, and OVX+EST groups, which were exposed to HFD for 4 weeks. We monitored body weight and food intake weekly. At the end of the protocol, the animals were fasted for 4 hours. We then measured fasting blood glucose and estradiol and collected tissues (hypothalamus and WAT). In the hypothalamus samples, we evaluated, by RT-qPCR, the expression of chemokines, chemokine receptors, and some pro-inflammatory cytokines and neuropeptides. We also evaluated WAT weight relative to body mass. The new results are presented in Supplementary Figure 1.

      Indirect calorimetric analysis of energy expenditure will benefit from ANCOVA analysis using body weight as a covariate. Moreover, locomotor activity should be also controlled.

      All statistical analyses of energy expenditure are corrected for body mass; thus, there is no need for ANCOVA. We clarified this in the text. The determination of locomotor activity is now included in Supplementary Figures 2 and 3.
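
      For readers unfamiliar with the analysis requested above, the following is a minimal sketch of what a one-covariate ANCOVA computes: the group difference in energy expenditure after adjusting for body mass with a slope pooled across groups. It is not the authors' analysis. The function name and all numbers are hypothetical, and the data are noiseless synthetic values chosen only so that the arithmetic is easy to follow.

```python
from statistics import mean


def ancova_adjusted_difference(x0, y0, x1, y1):
    """Classical one-covariate ANCOVA with a common (pooled) slope.

    x0, y0: covariate (body mass) and outcome (energy expenditure), group 0.
    x1, y1: the same quantities for group 1.
    Returns (pooled_slope, adjusted_group_difference).
    """
    def within(x, y):
        # Group means plus within-group cross-product and covariate sum of squares.
        mx, my = mean(x), mean(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        return mx, my, sxy, sxx

    mx0, my0, sxy0, sxx0 = within(x0, y0)
    mx1, my1, sxy1, sxx1 = within(x1, y1)

    # Slope of outcome on covariate, pooled over the two groups.
    slope = (sxy0 + sxy1) / (sxx0 + sxx1)
    # Raw difference in means minus the part explained by the covariate.
    adjusted = (my1 - my0) - slope * (mx1 - mx0)
    return slope, adjusted


# Hypothetical, noiseless data: outcome = 2.0 + 0.3 * body_mass + 0.5 * group.
mass_control = [22.0, 24.0, 26.0, 28.0]
mass_treated = [27.0, 29.0, 31.0, 33.0]
ee_control = [2.0 + 0.3 * m for m in mass_control]
ee_treated = [2.0 + 0.3 * m + 0.5 for m in mass_treated]

slope, adjusted = ancova_adjusted_difference(
    mass_control, ee_control, mass_treated, ee_treated
)
print(f"pooled slope: {slope:.3f}, adjusted difference: {adjusted:.3f}")
```

      In this toy case the adjusted difference recovers the group term built into the synthetic data, although the treated group is also heavier. A real analysis would additionally need residual variance, a test statistic, and a check that the slopes are similar in both groups.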

      A deeper analysis of hypothalamic inflammation and ER stress pathways would strengthen the manuscript.

      We performed new experiments to determine the expression of hypothalamic inflammation and ER stress pathways. This is shown in Suppl. Fig. 2 and 3.

      Mechanistic inhibition of CXCR3 was performed by CXCL10 immunoneutralization and CXCR3 antagonism. Those approaches are correct and well-performed, however considering the experience of the group in hypothalamic studies, I miss a virogenetic-based knockdown. Do the authors have any data on that?

      This is indeed a great point. Unfortunately, we did not succeed in obtaining the Cre mouse lines that would be needed for the proposed experiments. We included this as a weakness of the study.

      Reviewer #2 (Recommendations For The Authors):

      There are a few typographical errors for correction:

      -  Page 4, line 157: CCL10 to CXCL10.

      -  Page 6, line 226: makers to markers.

      -  Page 7, lines 283 and 287, Figure 6C: INF to IFN.

      All errors were corrected, as recommended. 

      Parts of the manuscript may be difficult for readers without knowledge of transcriptomics to interpret; thus, further description of several of the figures (e.g. Figure 3 and 4) may be helpful.

      We expanded the text in Results to clarify this issue.

      Could the authors comment on the choice of peripheral administration of CXCR3 antagonist as opposed to central (e.g. icv) administration? Indeed, systemic inhibition of CXCR3 produced significant alterations in body weight gain and glucose tolerance in female mice given high-fat diets and reduced CCR2 and CXCR3 immunostaining in the hypothalamus. Could changes to peripheral (e.g. WAT, liver) immune responses to the diet underlie the metabolic changes observed?

      CXCR3+ cells are present in very small numbers in the hypothalamus under basal conditions. In HFD, these are recruited from the periphery to the CNS, so, we believe ICV treatment with AMG487 would not reduce recruitment to the hypothalamic parenchyma. With the same animals in which we performed the locomotor activity, we performed RT-qPCR of WAT and liver and analyzed the expression of genes involved in lipid and glucose metabolism. This is now in Supplementary Figures 2 and 3. We included a comment in the text to explain our rationale for this approach.

      Besides hypothalamic mRNA levels of chemokines and chemokine receptors, does systemic CXCR3 antagonism affect other aspects linked to diet-induced impairments of hypothalamic regulation of energy homeostasis, like inflammation, ER stress and/or mitochondrial dynamics/function? It would be interesting to reveal the consequence of reduced CCR2+ microglial migration to the hypothalamus with chronic high-fat diet exposure.

      We performed new experiments shown in Supplementary Figures 2 and 3 to deal with these important questions. In the hypothalamus of females there were no changes in the expression of transcripts encoding proteins involved in endoplasmic reticulum homeostasis and mitochondrial turnover, whereas in males there was a reduction of Ddit3 and Mfn1. Moreover, in females the inhibition of CXCR3 promoted no changes in the liver expression of lipidogenic and gluconeogenic genes, and no changes in the white adipose tissue expression of lipidogenic genes. In the liver of males, there was a reduction in the expression of Fasn and an increase in the expression of G6pc3. As for the females, in males, there were no changes in the white adipose tissue expression of lipidogenic genes.

    1. eLife Assessment

      This study presents the cryo-EM structures of two human biotin-dependent mitochondria carboxylases involved in various biological pathways, including the metabolism of certain amino acids, cholesterol, and odd chain fatty acids. The cryo-EM structures offer a valuable addition to the structural description of biotin-dependent carboxylases and provide solid evidence to support the major conclusions of this study. This paper would be of interest to biochemists and structural biologists working on biotin-dependent carboxylases.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Zhou et al offers new high resolution Cryo-EM structures of two human biotin-dependent enzymes: propionyl-CoA carboxylase (PCC) and methylcrotonyl-CoA carboxylase (MCC). While X-ray crystal structures and Cryo-EM structures have previously been reported for bacterial and trypanosomal versions of MCC and for bacterial versions of PCC, this marks one of the first high resolution Cryo-EM structures of the human version of these enzymes. Using the biotin cofactor as an affinity tag, this team purified a group of four different human biotin-dependent carboxylases from cultured human Expi 293F (kidney) cells (PCC, MCC, acetyl-CoA carboxylase (ACC), and pyruvate carboxylase). Following further enrichment by size-exclusion chromatography, they were able to vitrify the sample and pick enough particles of MCC and PCC to separately refine the structures of both enzymes to relatively high average resolutions (the Cryo-EM structure of ACC also appears to have been determined from these same micrographs, though this is the subject of a separate publication). To determine the impact of substrate binding on the structure of these enzymes and to gain insights into substrate selectivity, they also separately incubated with propionyl-CoA and acetyl-CoA and vitrified the samples under active turnover conditions, yielding a set of cryo-EM structures for both MCC and PCC in the presence and absence of substrates and substrate analogues.

      Strengths:

      The manuscript has several strengths. It is clearly written, the figures are clear and the sample preparation methods appear to be well described. This study demonstrates that Cryo-EM is an ideal structural method to investigate the structure of these heterogeneous samples of large biotin-dependent enzymes. As a consequence, many new Cryo-EM structures of biotin-dependent enzymes are emerging, thanks to the natural inclusion of a built-in biotin affinity tag. While the authors report no major differences between the human and bacterial forms of these enzymes, it remains an important finding that they demonstrate whether the structures of the human enzymes are distinct from those of the bacterial enzymes. The MCC structures also provide evidence for a transition for BCCP-biotin from an exo-binding site to an endo-binding site in response to acetyl-CoA binding. This contributes to a growing number of biotin-dependent carboxylase structures that reveal BCCP-biotin binding at locations both inside (endo-) and outside (exo-) of the active site.

      Weaknesses:

      There are some minor weaknesses. Notably, there are not a lot of new insights coming from this paper. The structural comparisons between MCC and PCC have already been described in the literature and there were not a lot of significant changes (outside of the exo- to endo- transition) in the presence vs. absence of substrate analogues. There are sections of this manuscript that do not sufficiently clarify what represents a new insight from the current set of structures (there are few of them), vs. what is largely recapitulating what has been seen in previous structures.

      There is not a great deal of depth of analysis in the discussion. For example, no new insights were gained with respect to the factors contributing to substrate selectivity (the factors contributing to selectivity for propionyl-CoA vs. acetyl-CoA in PCC). The authors acknowledge that they are limited in their interpretations as a consequence of the acyl groups being unresolved in all of the structures. They offer a simple, overarching and not particularly insightful explanation that the longer acyl group in propionyl-CoA may mediate stronger hydrophobic interactions that stabilize the alpha carbon of the acyl group at the proper position. The authors did not take the opportunity to describe the specific interactions that may be responsible for the stronger hydrophobic interaction nor do they offer any plausible explanation for how these might account for an astounding difference in the selectivity for propionyl-CoA vs. acetyl-CoA. Essentially, the authors concede that these cryo-EM structures offer no new insights into the structural basis for substrate selectivity in PCC, confirming that these structures do not yet fully capture the proper conformational states.

      Some of these minor deficiencies aside, the overall aim of contributing new cryo-EM structures of the human MCC and PCC has been achieved. While I am not a cryo-EM expert, I see no flaws in the methodology or approach. While the contributions from these structures are somewhat incremental, it is nevertheless important to have these representative examples of the human enzymes and it is noteworthy to see a new example of the exo-binding site in a biotin-dependent enzyme.

    3. Reviewer #2 (Public review):

      Summary:

      This paper reports the structures of two human biotin-dependent carboxylases. The authors used endogenously purified proteins and solved the structures in high resolutions. Based on the structures, they defined the binding site for acyl-CoA and biotin and reported the potential conformational changes in biotin position.

      Strengths:

      The authors effectively utilized the biotin of the two proteins and obtained homogeneous proteins from human cells. They determined the high-resolution structures of the two enzymes in apo and substrate-bound states.

      Comments and questions to the manuscripts:

      (1) I'm quite impressed with the protein purification and structure determination, but I think some functional characterization of the purified proteins should be included in the manuscript. The activity of enzymes should be the foundation of all structures and other speculations based on structures.

      (2) In Figure 1B, the structure of MCC is shown as two layers of beta units and two layers of alpha units, while there is only one layer of alpha units resolved in the density maps. I suggest the authors show the structures resolved based on the density maps and show the complete structure with the docked layer in the supplementary figure.

      (3) In the introduction, I suggest the author provide more information about the previous studies about the structure and reaction mechanisms of BDCs, what is the knowledge gap, and what problem you will resolve with a higher resolution structure. For example, you mentioned in line 52 that G437 and A438 are catalytic residues, are these residues reported as catalytic residues or this is based on your structures? Has the catalytic mechanism been reported before? Has the role of biotin in catalytic reactions been revealed in previous studies?

      (4) In the discussion, the authors indicate that the movement of biotin could be related to the recognition of acyl-CoA in BDCs, however, they didn't observe a change in the propionyl-CoA bound MCC structure, which is contradictory to their speculation. What could be the explanation for the exception in the MCC structure?

      (5) In the discussion, the authors indicate that the selectivity of PCC to different acyl-CoA is determined by the recognition of the acyl chain. However, there are no figures or descriptions about the recognition of the acyl chain by PCC and MCC. It will be more informative if they can show more details about substrate recognition in Figures 3 and 4.

      (6) How are the solved structures compared with the latest Alphafold3 prediction?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      There are some minor weaknesses.

      Comment 1: Notably, there are not a lot of new insights coming from this paper. The structural comparisons between MCC and PCC have already been described in the literature and there were not a lot of significant changes (outside of the exo- to endo- transition) in the presence vs. absence of substrate analogues.

      We agree that the structures of the human MCC and PCC holoenzymes are similar to their bacterial homologs. That is due to the conserved sequences and functions of MCC and PCC across different species.

      Comment 2: There is not a great deal of depth of analysis in the discussion. For example, no new insights were gained with respect to the factors contributing to substrate selectivity (the factors contributing to selectivity for propionyl-CoA vs. acetyl-CoA in PCC). The authors state that the longer acyl group in propionyl-CoA may mediate stronger hydrophobic interactions that stabilize the alpha carbon of the acyl group at the proper position. This is not a particularly deep analysis and doesn't really require a cryo-EM structure to invoke. The authors did not take the opportunity to describe the specific interactions that may be responsible for the stronger hydrophobic interaction nor do they offer any plausible explanation for how these might account for an astounding difference in the selectivity for propionyl-CoA vs. acetyl-CoA. This suggests, perhaps, that these structures do not yet fully capture the proper conformational states.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We have revised the manuscript and acknowledged this limitation in the second paragraph of the discussion section: 

      “In the cryo-EM maps of the PCC holoenzymes, the acyl groups of acetyl-CoA and propionyl-CoA were not resolved (fig. S6), limiting the analysis of the interactions between the acyl groups and PCC. Nevertheless, the PCC-PCO and PCC-ACO structures determined in our study demonstrate that the conformations of the acyl-CoA binding pockets in the two structures are almost identical (Fig. 3F, fig. S7, B and C). In addition, the well resolved CoA groups of propionyl-CoA and acetyl-CoA bind at the same position in human PCC holoenzyme (Fig. 3F). These findings indicate that propionyl-CoA and acetyl-CoA bind to PCC with a similar binding mode.”

      Comment 3: The authors also need to be careful with their over-interpretation of structure to invoke mechanisms of conformational change. A snapshot of the starting state (apo) and final state (ligand-bound) is insufficient to conclude *how* the enzyme transitioned between conformational states. I am constantly frustrated by structural reports in the biotin-dependent enzymes that invoke "induced conformational changes" with absolutely no experimental evidence to support such statements. Conformational changes that accompany ligand binding may occur through an induced conformational change or through conformational selection and structural snapshots of the starting point and the end point cannot offer any valid insight into which of these mechanisms is at play.

      Point accepted. We have revised our manuscript to use conformational differences instead of conformational changes to describe the differences between the apo and ligand-bound states (see the last paragraph of the introduction section and the third paragraph of the discussion section).

      Reviewer #2 (Public Review):

      Comments and questions to the manuscripts:

      Comment 1: I'm quite impressed with the protein purification and structure determination, but I think some functional characterization of the purified proteins should be included in the manuscript. The activity of enzymes should be the foundation of all structures and other speculations based on structures.

      We appreciate this comment. However, since we purified the endogenous BDCs and the sample we obtained was a mixture of four BDCs, the enzymatic activity of this mixture cannot accurately reflect the catalytic activity of PCC or MCC holoenzyme. We have revised the manuscript and acknowledged this limitation in the first paragraph of the results section: 

      “We did not characterize the enzyme activities of the mixed BDCs because the current methods used to evaluate the carboxylase activities of BDCs, such as measuring the ATP hydrolysis or incorporation of radio-labeled CO2, are unable to differentiate the specific carboxylase activity of each BDC.”

      Comment 2: In Figure 1B, the structure of MCC is shown as two layers of beta units and two layers of alpha units, while there is only one layer of alpha units resolved in the density maps. I suggest the authors show the structures resolved based on the density maps and show the complete structure with the docked layer in the supplementary figure.

      We appreciate this comment. We have shown the cryo-EM maps of the PCC and MCC holoenzymes in fig. S8 to indicate the unresolved regions in these structures. The BC domains in one layer of MCCα in the MCC-apo structure were not resolved. However, we think it would be better to show a complete structure in Fig. 1 to provide an overall view of the MCC holoenzyme. We have revised Fig. 1B and the figure legend to clearly point out which domains were not resolved in the cryo-EM map and were built in the structure through docking. We have also revised the main text to clearly describe which parts of the holoenzymes were not resolved in the cryo-EM maps and how the complete structures were built.

      Comment 3: In the introduction, I suggest the author provide more information about the previous studies about the structure and reaction mechanisms of BDCs, what is the knowledge gap, and what problem you will resolve with a higher resolution structure. For example, you mentioned in line 52 that G437 and A438 are catalytic residues, are these residues reported as catalytic residues or this is based on your structures? Has the catalytic mechanism been reported before? Has the role of biotin in catalytic reactions been revealed in previous studies?

      Point accepted. It was reported that G419 and A420 in Streptomyces coelicolor PCC, corresponding to G437 and A438 in human PCCβ, were the catalytic residues for the second-step carboxylation reaction (PMID: 15518551). The same study also reported the catalytic mechanism of the carboxyl transfer reaction. The role of biotin in the BDC-catalyzed carboxylation reactions has been extensively studied (PMIDs: 22869039, 28683917). We have revised the manuscript to introduce the catalytic mechanisms of BDCs elucidated through the investigation of prokaryotic BDCs in the fourth paragraph of the introduction section.

      Comment 4: In the discussion, the authors indicate that the movement of biotin could be related to the recognition of acyl-CoA in BDCs, however, they didn't observe a change in the propionyl-CoA bound MCC structure, which is contradictory to their speculation. What could be the explanation for the exception in the MCC structure?

      We appreciate this comment. We do not have a good explanation for why we did not observe a change in the propionyl-CoA bound MCC structure. It is noteworthy that neither acetyl-CoA nor propionyl-CoA is the natural substrate of MCC. Recently, a cryo-EM structure of the human MCC holoenzyme in complex with its natural substrate, 3-methylcrotonyl-CoA, has been resolved (PDB code: 8J4Z). In this structure, the binding site of biotin and the conformation of the CT domain closely resemble those in our acetyl-CoA-bound MCC structure. Therefore, the movement of biotin induced by acetyl-CoA binding mimics that induced by the binding of MCC's natural substrate, 3-methylcrotonyl-CoA, indicating that in comparison with propionyl-CoA, acetyl-CoA is closer to 3-methylcrotonyl-CoA regarding its ability to bind to MCC. We have discussed this possibility in the last paragraph of the discussion section. We have also added a supplementary figure (fig. S11) to compare the structures of human MCC holoenzyme in complex with acetyl-CoA and 3-methylcrotonyl-CoA.

      Comment 5: In the discussion, the authors indicate that the selectivity of PCC to different acyl-CoA is determined by the recognition of the acyl chain. However, there are no figures or descriptions about the recognition of the acyl chain by PCC and MCC. It will be more informative if they can show more details about substrate recognition in Figures 3 and 4.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We have revised the manuscript and acknowledged this limitation in the second paragraph of the discussion section: 

      “In the cryo-EM maps of the PCC holoenzymes, the acyl groups of acetyl-CoA and propionyl-CoA were not resolved (fig. S6), limiting the analysis of the interactions between the acyl groups and PCC. Nevertheless, the PCC-PCO and PCC-ACO structures determined in our study demonstrate that the conformations of the acyl-CoA binding pockets in the two structures are almost identical (Fig. 3F, fig. S7, B and C). In addition, the well resolved CoA groups of propionyl-CoA and acetyl-CoA bind at the same position in human PCC holoenzyme (Fig. 3F). These findings indicate that propionyl-CoA and acetyl-CoA bind to PCC with a similar binding mode.”

      Comment 6: How are the solved structures compared with the latest Alphafold3 prediction?

      Since AlphaFold3 was not released when our manuscript was submitted, we did not compare the solved structures with the AlphaFold3 predictions. We have now carried out the predictions using AlphaFold3. Due to the token limitation of the AlphaFold3 server, we can only include two α and six β subunits of human PCC or MCC in the prediction. The overall assembly patterns of the AlphaFold3-predicted structures are similar to those of the cryo-EM structures. The RMSDs between PCCα, PCCβ, MCCα, and MCCβ in the apo cryo-EM structures and those in the AlphaFold3-predicted structures are 7.490 Å, 0.857 Å, 7.869 Å, and 1.845 Å, respectively. The PCCα and MCCα subunits adopt an open conformation in the cryo-EM structures but adopt a closed conformation in the AlphaFold3-predicted structures, resulting in large RMSDs.
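
      The response does not say how the RMSD values above were obtained. As a hedged illustration only, the sketch below shows one common way to compute an RMSD between two matched sets of atom coordinates after optimal rigid-body superposition (the Kabsch algorithm), using NumPy. The function name and the toy coordinates are hypothetical, and the sketch assumes the atoms have already been paired one-to-one, for example Cα atoms of equivalent residues.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between paired coordinate sets P and Q (N x 3) after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the remaining deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(((P_rot - Qc) ** 2).sum() / len(P)))


# Toy coordinates (hypothetical): Q is P rotated 90 degrees about z and shifted.
P = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 1.5]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])

rmsd_same = kabsch_rmsd(P, Q)
print(f"RMSD after superposition: {rmsd_same:.6f}")
```

      Because a whole-chain RMSD of this kind averages over all paired atoms, a rigid movement of one domain relative to the rest of the subunit can inflate the value even when each domain is individually well predicted, which is consistent with the open versus closed explanation given above.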

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      DMS-MaP is a sequencing-based method for assessing RNA folding by detecting methyl adducts on unpaired A and C residues created by treatment with dimethylsulfate (DMS). DMS also creates methyl adducts on the N7 position of G, which could be sensitive to tertiary interactions with that atom, but N7-methyl adducts cannot be detected directly by sequencing. In this work, the authors adopt a previously developed method for converting N7-methyl-G to an abasic site to make it detectable by sequencing and then show that the ability of DMS to form an N7-methyl-G adduct is sensitive to RNA structural context. In particular, they look at the G-quadruplex structure motif, which is dense with N7-G interactions, is biologically important, and lacks conclusive methods for in-cell structural analysis. 

      Strengths: 

      - The authors clearly show that established methods for detecting N7-methyl-G adducts can be used to detect those adducts from DMS and that the formation of those adducts is sensitive to structural context, particularly G-quadruplexes. 

      - The authors assess the N7-methyl-G signal through a wide range of useful probing analyses, including standard folding, adduct correlations, mutate-and-map, and single-read clustering. 

      - The authors show encouraging preliminary results toward the detection of G-quadruplexes in cells using their method. Reliable detection of RNA G-quadruplexes in cells is a major limitation for the field and this result could lead to a significant advance. 

      - Overall, the work shows convincingly that N7-methyl-G adducts from DMS provide valuable structural information and that established data analyses can be adapted to incorporate the information. 

      We thank the reviewer for their time and appreciate the reviewer for their positive assessment as well as for their suggestions which we have addressed below.

      Weaknesses: 

      - Most of the validation work is done on the spinach aptamer and it is the only RNA tested that has a known 3D structure. Although it is a useful model for validating this method, it does not provide a comprehensive view of what results to expect across varied RNA structures. 

      Thank you for your insightful comments. We agree that a more comprehensive view of BASH MaP involves probing a larger variety of RNAs with known 3D-structures beyond Spinach and the poly-UG RNA. Although outside the scope of this publication, more work is needed to reveal the determinants of N7G reactivity to DMS.

      - It's not clear from this work what the predictive power of BASH-MaP would be when trying to identify G-quadruplexes in RNA sequences of unknown structure. Although clusters of G's with low reactivity and correlated mutations seem to be a strong signal for G-quadruplexes, no effort was made to test a range of G-rich sequences that are known to form G-quadruplexes or not. Having this information would be critical for assessing the ability of BASH-MaP to identify G-quadruplexes in cells. 

      - Although the authors present interesting results from various types of analysis, they do not appear to have developed a mature analysis pipeline for the community to use. I would be inclined to develop my own pipeline if I were to use this method. 

      Thank you for your suggestion. We have more clearly annotated the python scripts and GitHub repository which contain all custom scripts used for analyzing BASH MaP data. These changes will enable researchers to more easily utilize our developed pipelines.

      - There are various aspects of the DAGGER analysis that don't make sense to me:

      (1) Folding of the RNA based on individual reads does not represent single-molecule folding since each read contains only a small fraction of the possible adducts that could have formed on that molecule. As a result, each fold will largely be driven by the naive folding algorithm. I recommend a method like DREEM that clusters reads into profiles representing different conformations. 

      (2) How reliable is it to force open clusters of low-reactivity G's across RNA's that don't already have known G-quadruplexes? 

      (3) By forcing a G-quadruplex open it will be treated as a loop by the folding algorithm, so the energetics won't be accurate. 

      (4) It's not clear how signals on "normal" G's are treated. In Figure 5C some are wiped to 0 but others are kept as 1. 

      Thank you for your keen observations regarding the conceptual frameworks utilized in DAGGER. We have included a complementary analysis to DAGGER utilizing Spinach BASH MaP data with DANCE, an algorithm which shares an underlying architecture with DREEM, and found that DANCE analysis gave similar results to those found with DAGGER. However, we have not benchmarked DAGGER’s performance on a range of RNAs and compared the results with expectation-maximization algorithms like DREEM and DANCE.

      To minimize the effects of artificially creating loops with tertiary folding constraints, we utilized the RNA folding algorithm CONTRAfold which relies less on direct energetic calculations than other commonly used RNA folding algorithms such as RNAstructure.

      We have updated the main text to more clearly indicate how DAGGER handles signals at G’s in a range of conditions. The main text now better clarifies the specific logic used for determining which G’s contain either a 0 or a 1 in the bitvector encoding used in DAGGER analysis.

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript introduces BASH MaP and DAGGER, innovative tools for analyzing RNA tertiary structures, specifically focusing on the G-quadruplexes. Traditional methods have struggled to detect and analyze these structures due to their reliance on interactions on the Hoogsteen face of guanine, which are not readily observable through conventional probing that targets Watson-Crick interactions. BASH MaP employs dimethyl sulfate and potassium borohydride to enhance the detection of N7-methylguanosine by converting it into an abasic site, thereby enabling its identification through misincorporation during reverse transcription. This method provides higher precision in identifying G-quadruplexes and offers deeper insights into RNA's structural dynamics and alternative conformations in both in vitro and cellular contexts. Overall, the study is well-executed, demonstrating robust signal detection of N7-Gs with some compelling positive controls, thorough analysis, and beautifully presented figures. 

      Strengths: 

      The manuscript introduces a new method to detect G-quadruplexes (G-qs) that simplifies and potentially enhances the robustness and quantification compared to previous methods relying on reverse transcription truncations. The authors provide a strong positive control, demonstrating a 70% misincorporation at endogenous N7-G within the 18S rRNA, which illustrates BASH MaP's high signal-to-noise ratio. The data concerning the detection of positive control G-qs is particularly compelling. 

      Weaknesses: 

      Figure 3E shows considerable variability in the correlations among guanosines, suggesting that the methods may struggle with specificity in determining guanosine participation within and between different quadruplexes. There is no estimation of the method's false positive discovery rate.

      Thank you for your positive assessment and for taking the time to suggest improvements to this publication. We have addressed your specific comments in the “Recommendations For The Authors” section below.

      Reviewer #3 (Public Review): 

      Summary: 

      In this study, the authors aim to develop an experimental/computational pipeline to assess the modification status of an RNA following treatment with dimethylsulfate (DMS). Building upon the more common DMS MaP method, which predominantly assesses the modification status of the Watson-Crick-Franklin face of A's and C's, the authors insert a chemical processing step in the workflow prior to deep sequencing that enables detection of methylation at the N7 position of guanosine residues. This approach, termed BASH MaP, provides a more complete assessment of the true modification status of an RNA following DMS treatment and this new information provides a powerful set of constraints for assessing the secondary structure and conformational state of an RNA. In developing this work, the authors use Spinach as a model RNA. Spinach is a fluorogenic RNA that binds and activates the fluorescence of a small molecule ligand. Crystal structures of this RNA with ligand bound show that it contains a G-quadruplex motif. In applying BASH MaP to Spinach, the authors also perform the more standard DMS MaP for comparison. They show that the BASH MaP workflow appears to retain the information yielded by DMS MaP while providing new information about guanosine modifications. In Spinach, the G-quadruplex G's have the least reactive N7 positions, consistent with the engagement of N7 in hydrogen bonding interactions at G's involved in quadruplex formation. Moreover, because the inclusion of data corresponding to G increases the number of misincorporations per transcript, BASH MaP is more amenable to analysis of co-occurring misincorporations through statistical analysis, especially in combination with site-specific mutations. These co-occurring misincorporations provide information regarding what nucleotides are structurally coupled within an RNA conformation. 
      By deploying a likelihood-ratio statistical test on BASH MaP data, the authors can identify Gs in G-quadruplexes, deconvolute G-G correlation networks, base-triple interactions and even stacking interactions. Further, the authors develop a pipeline to use the BASH MaP-derived G-modification data to assist in the prediction of RNA secondary structure and identify alternative conformations adopted by a particular RNA. This seems to help with the prediction of secondary structure for Spinach RNA. 

      Strengths: 

      The BASH MaP procedure and downstream data analysis pipeline more fully identify the complement of methylations to be identified from the DMS treatment of RNA, thereby enriching the information content. This in turn allows for more robust computational/statistical analysis, which likely will lead to more accurate structure predictions. This seems to be the case for the Spinach RNA. 

      Weaknesses: 

      The authors demonstrate that their method can detect G-quadruplexes in Spinach and some other RNAs both in vitro and in cells. However, the performance of BASH MaP and associated computational analysis in the context of other RNAs remains to be determined. 

      We thank the reviewer for their time spent analyzing this manuscript, for their positive assessment and for their suggestions on improving this publication. We have addressed your specific comments in the “Recommendations For The Authors” section below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Although the text is clear and coherent, the overall flow of the manuscript comes across as "here's a bunch of stuff I tried." Maybe you're looking to get this out quickly, but it would have been much more impactful (and enjoyable to read) as a description of a more polished final product. 

      Thank you for highlighting the strengths and weaknesses of this manuscript. We have changed parts of the main text to enhance the overall flow of the manuscript and make it more enjoyable to read.

      Reviewer #2 (Recommendations For The Authors): 

      I have only a few comments: 

      Major: 

      (1) Analysis of Guanosine Correlations in Figure 3E: In Figure 3E, there is a lot of variability in the correlations among guanosines. For example, G46 shows a strong correlation with G93 (within the same quadruplex) but also correlates with G91, G95 (in different quadruplexes), and G97 (not part of any quadruplex as per the model in Figure 3C). Contrarily, G86 exhibits weak correlations, and G50 along with G89 shows no significant correlations. These findings imply that BASH MaP followed by RING MaP analysis struggles to accurately distinguish between guanosines within the same or different quadruplexes in Spinach. Perhaps there are some opportunities to enhance the specificity in determining guanosine participation within quadruplexes, a great point for the authors to discuss. 

      Thank you for your comments and careful analysis of the pattern of correlations produced by BASH MaP. We agree that BASH MaP followed by RING MaP analysis is unable to unambiguously distinguish between guanosines within the same or different quadruplex layers. This finding was a surprise as we initially assumed that quadruplex layers would behave in a manner like Watson-Crick base pairs and produce specific signals in the corresponding RING MaP heatmaps.  We suspect that this may be due to mutations in specific G’s being associated with altered conformations which allow other G’s to form different interactions that affect DMS reactivity.  This may be unique to the highly complex structure in Spinach.  However, we think BASH-MaP clearly provides signals that point to key residues within the G-quadruplex, even if it does not clearly identify all of them.

      This idea is supported by experiments described in Figure 4, which show that mutation of a single guanosine residue causes a complete breakdown of the hydrogen-bonding network throughout all quadruplex layers. Additionally, DMS methylation of an N7G in a quadruplex is likely to disrupt base stacking interactions in and around the quadruplex. The compounding effects of a dynamic G-quadruplex and DMS-induced changes to local base stacking properties explain both the strong correlations with G97, which is base-stacked with the quadruplex, and the inability to specifically identify the guanosines which comprise specific quadruplex quartets. We have further emphasized this point in an updated discussion section.

      (2) Potential Consolidation of Figures 3 and 4: Figure 4 appears quite similar to Figure 3 but employs M2-seq instead of relying on spontaneous mutations. It might be beneficial to merge these figures to demonstrate that M2-seq can more effectively identify correlations between guanosines in quadruplexes. 

      We agree that Figures 3 and 4 appear quite similar but there is an important distinction to be made between RING MaP and M2-seq analysis. We suspect that the mechanism causing correlations between guanosines in quadruplexes for RING MaP is “RNA breathing”, in contrast to the spontaneous T7 RNA polymerase-induced mutation model proposed in Cheng et al. PNAS 2017, https://doi.org/10.1073/pnas.1619897114. To determine whether correlations between guanosines in Spinach BASH MaP experiments rely on spontaneous mutations, we compared the fraction of reads containing misincorporations at pairs of quadruplex guanosines over a range of DMS concentrations. The spontaneous mutation model predicts a linear dependence between quadruplex guanosine signals and DMS dose while an “RNA breathing” or double-DMS hit model predicts a quadratic dependence on DMS dose (Cheng et al. PNAS 2017, https://doi.org/10.1073/pnas.1619897114). Our data may support a quadratic dependence on DMS dose for multiple pairs of G-quadruplex guanosines, while they demonstrate a linear dependence between helical G’s (Supplementary Data Fig. 9). Together, these data suggest that BASH MaP followed by RING MaP analysis detects double-DMS modification events for pairs of quadruplex guanosines. Therefore, BASH MaP and RING MaP analysis provide a complementary approach to M2 BASH MaP and reveal guanosine correlations in contexts where pre-installed mutations are incompatible, such as the study of endogenously expressed RNAs.
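
      The linear-versus-quadratic argument above can be illustrated with a minimal sketch. It assumes the paired-misincorporation signal follows a power law in DMS dose, so that the slope of log(signal) against log(dose) is close to 1 for a linear dependence and close to 2 for a quadratic one. The function name `loglog_slope` and all numbers are hypothetical and synthetic; they are not data or code from the study.

```python
import math


def loglog_slope(doses, signals):
    """Least-squares slope of log(signal) versus log(dose).

    A slope near 1 suggests a linear dose dependence; a slope near 2
    suggests a quadratic (double-hit) dependence.
    """
    xs = [math.log(d) for d in doses]
    ys = [math.log(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic doses in arbitrary units (illustration only, not study data).
doses = [1.0, 2.0, 4.0, 8.0]
linear_signal = [0.001 * d for d in doses]          # single-hit-like behaviour
quadratic_signal = [0.0001 * d ** 2 for d in doses]  # double-hit-like behaviour

linear_slope = loglog_slope(doses, linear_signal)
quadratic_slope = loglog_slope(doses, quadratic_signal)
print(round(linear_slope, 3), round(quadratic_slope, 3))
```

      With real data the fitted slope would fall between these idealized values, and background misincorporation would need to be subtracted before taking logarithms.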

      (3) Estimation of False Positive Rates: An estimation of the false positive rate for G-quadruplex identification would be invaluable. Since identification currently depends on the absence of DMS modification, it's important to consider how other factors like solvent inaccessibility or library generation might affect the detection and be misinterpreted as G-quadruplexes. Although this could be a subject of future work, some discussion by the authors would enhance the manuscript. 

      We have added a table summarizing sensitivity, positive predictive value, and false positive rate for different G-quadruplex identification schemes.  See Supplementary Table 1.
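
      For readers unfamiliar with the three quantities named above, the following minimal sketch shows how they are conventionally computed from a set of predicted and a set of known G-quadruplex guanosines. The function name `gq_identification_metrics` and the positions used are hypothetical; this is not the procedure used to build Supplementary Table 1.

```python
def gq_identification_metrics(predicted, known, candidates):
    """Sensitivity, positive predictive value and false positive rate.

    predicted:  positions called as G-quadruplex guanosines
    known:      positions known to be in a G-quadruplex
    candidates: all guanosine positions that were evaluated
    """
    predicted, known, candidates = set(predicted), set(known), set(candidates)
    tp = len(predicted & known)               # called and truly quadruplex
    fp = len(predicted - known)               # called but not quadruplex
    fn = len(known - predicted)               # quadruplex but missed
    tn = len(candidates - predicted - known)  # correctly left uncalled

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "fpr": ratio(fp, fp + tn),
    }


# Hypothetical positions for illustration only.
candidates = range(1, 11)
known = {2, 4, 6, 8}
predicted = {2, 4, 6, 9}
metrics = gq_identification_metrics(predicted, known, candidates)
print(metrics)
```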

      Minor: 

      (4) Line 273 Reference Correction: Please adjust the reference in line 273 to accurately reflect that the G-quadruplex experiments compare potassium with lithium, not sodium. 

      In cellulo G-quadruplex reverse transcriptase (RT) stop assays as described by Guo and Bartel (https://www.science.org/doi/10.1126/science.aaf537) compared RT stops between DMS treated mRNA refolded in potassium and sodium buffers. We have clarified in the text that traditionally, G-quadruplex RT stop assays compare potassium with lithium.

      (5) Consistency in Figure 1 (Panels F and G): Aligning BASH MaP (170 mM DMS) as the y-axis in both panels F and G would visually align the data points and enhance the graphical coherence across these panels. 

      Thank you for noticing the subtleties in our data presentation and for the suggestion on how to improve our graphical coherence across panels. We specifically chose not to align BASH MaP (170 mM DMS) as the y-axis for panels F and G because we did not want the reader to mistakenly assume that the data for BASH MaP (170 mM DMS) presented in panels F and G is the same data. In panel F, BASH MaP was performed under standard DMS probing buffer conditions which utilized a pH 7.5 bicine buffer. The purpose of panel F is to show the reproducibility of BASH MaP under various DMS concentrations. In panel G, BASH MaP was performed under DMS probing buffer conditions which promote the formation of m3U using a pH 8.3 bicine buffer. The purpose of panel G is to show that the borohydride treatment and depurination steps in BASH MaP do not react with DMS-derived m1A, m3C, and m3U in a manner which prevents their measurement through cDNA misincorporation. Together, these experimental differences cause the data points for BASH MaP (170 mM DMS) to vary between panels F and G which would lead to more confusion for the reader and detract from the intended message we are trying to convey through panels F and G. 

      (6) Statistical Detail in Figure 1E: Incorporating a confidence interval or a P-value in Figure 1E would enrich the statistical depth and provide readers with a clearer understanding of the data's significance. 

      Thank you for the suggestion of including a p-value in Figure 1E to provide the readers with a clearer understanding of the data’s significance. The effect of combining DMS treatment and borohydride reduction on the misincorporation rate of G’s in Spinach is so dramatic that the raw data sufficiently provides the readers a clear understanding of its significance.

      (7) Reevaluation of Figure 2B: Considering the small number of Gs in single-stranded regions and base triples, it might be more informative to move Figure 2B to supplementary information. Focusing on Figure 2C, which consolidates non-quadruplex categories, could provide more impactful insights. 

      Thank you for your suggestion. It is important to initially provide an overall characterization of N7G DMS reactivity for G’s in a variety of structural contexts before more specifically looking at G-quadruplexes. Panel B is an important part of figure 2 for the following two reasons:

      First, a reader’s first question upon seeing the N7G chemical reactivity for Spinach as shown in Figure 2A is likely to be whether base-paired G’s and single-stranded G’s have similar or different DMS reactivities. Figure 2, panel B shows that generally, single-stranded G’s appear to have higher DMS reactivity than base-paired G’s except for 2 G’s which display hyper-reactivity. The basis for this hyper-reactivity is addressed in Figure 4.

      Second, panel B highlights the wide range in N7G DMS reactivities. Since the G-quadruplex G’s display a dramatically lower DMS reactivity as compared to single-stranded G’s and hyper-reactive base-paired G’s, the dynamic range of DMS reactivities was difficult to capture in a single panel. Panel C does not convey these dynamics appropriately as a stand-alone figure.

      (8) Enhancements to Figure 2G: Improving the visibility of mutation rates in this figure would help. Suggestions include coloring bars by nucleotide type for intuitive visual comparison and adjusting the y-axis to a logarithmic scale to better represent near-zero mutation rates. Additionally, employing histograms or box plots could directly compare DMS reactivities and provide a clearer analysis. 

      Thank you for your suggestions on enhancing the presentation of BASH MaP applied to an mRNA. The main purpose of figure 2G was to validate whether BASH MaP could detect G’s engaged in a G-quadruplex in a cell. In-cell G-quadruplex folding measurements as performed by Guo and Bartel (https://www.science.org/doi/10.1126/science.aaf537) only identified a few G-quadruplexes which were folded and only the 3’ end of the G-quadruplex was detected. We therefore reasoned that the 3’ most G’s of this select set of G-quadruplexes were the only validated G’s engaged in a G-quadruplex in cells. In the instance of the AKT2 mRNA, Guo and Bartel found that 4 G’s appeared to be folded in a G-quadruplex in cells (Supplementary figure 2E). These G’s are indicated at the bottom of the plot with black bars and the label “In-cell G-quadruplex guanosines”. Therefore, we hypothesized that these G’s would display low DMS reactivity with BASH MaP while other G’s in the AKT2 mRNA would display higher chemical reactivities. We followed a standard convention in displaying chemical reactivities used extensively in the field where black bars indicate low reactivity, yellow bars indicate moderate reactivity, and red bars indicate high reactivity. The data in Fig 2G directly supports Guo and Bartel’s prediction of an in-cell folded G-quadruplex in the AKT2 mRNA because the 4 G’s predicted to be engaged in a G-quadruplex all displayed near zero DMS reactivities.

      We agree that adjusting the y-axis to a logarithmic scale would better represent near-zero mutation rates. However, the purpose of figure 2G is not to compare all positions with near-zero mutation rates. Instead, our use of standard conventions in displaying chemical reactivities is sufficient for the purpose of displaying BASH MaP’s ability to validate in-cell G-quadruplex G’s.

      Later in the paper, we go a step further and create a better criterion than simple N7G DMS reactivity for identifying G’s engaged in a G-quadruplex. For further analysis of G’s with near zero DMS reactivities, see Figure 3 and Supplementary figure 4 which utilizes RING Mapper to identify lowly-reactive G’s which produce co-occurring misincorporations.

      (9) Scale Consistency in Figure 3: Ensuring that the correlation scales are uniform across Panels A, B, D, and E would facilitate easier comparison of the data, enhancing the overall coherence of the findings. Using raw correlation values could also improve clarity and interpretation. 

      Thank you for the suggestions to facilitate easier comparisons of data in Figure 3. We have ensured the correlation scales are uniform across panels A, B, D, and E to enhance the coherence of these findings. We initially visualized the data in Figure 3 by plotting raw correlation values, but we found these values differed between DMS MaP and BASH MaP datasets, likely because of the low-level background mutations introduced by the borohydride reduction step of BASH (see Supplementary figure 3A). However, performing a global normalization of correlation strength values computed by RING mapper enabled clear comparisons between DMS MaP and BASH MaP RING heatmaps and revealed structural domains consistent with the crystal structure of Spinach.

      (10) Correction on Line 506: Please update the reference to M2 BASH MaP for accuracy. 

      Thank you. We have updated the main text to incorporate this comment.

      Reviewer #3 (Recommendations For The Authors): 

      The paper describes multiple applications and multiple methods of analysis of the BASH MaP data, which collectively make the manuscript more difficult to follow. The manuscript would become more readable and user-friendly if there were some overview figures to describe the sequencing pipeline and the various computational workflows that the BASH MaP data are fed into (e.g. RING Mapper, DAGGER, M2 BASH MaP, Co-occurring Misincorporations, Secondary Structure Prediction). One or more summary schemes that provide an overview would strongly assist with the clarity and overall content of the paper. 

      Thank you for your suggestions. We have incorporated a summary scheme of the various computational workflows and their use cases in Fig 7.

      Line 165. Here, misincorporation rates for all four nucleotides are discussed, but m3U is not mentioned until the following paragraph. It would be appropriate and clearer to mention this sooner. 

      Thank you for your suggestion. We have restructured this section to introduce the DMS modification m3U in an earlier paragraph to increase clarity for readers.

      Line 506: spelling of DAGGER. 

      Thank you. We have updated the main text to incorporate this comment.

      Line 645: I found this paragraph difficult to follow, especially the line starting 649. I thought the logic was to exclude G's involved in tertiary interactions from base-pairing in the secondary structure prediction. Some clarification would be helpful. 

      Thank you for your comments. We have restructured the paragraph to emphasize that DAGGER only applies tertiary folding constraints to sequencing reads without misincorporations at G’s engaged in tertiary interactions. We reasoned that sequencing reads with a misincorporation at a G engaged in a tertiary interaction likely come from an RNA molecule which is in an alternative tertiary conformational state. In this specific circumstance, a tertiary folding constraint may impose incorrect restrictions on the folding of RNA molecules due to distinct tertiary conformations.
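
      The per-read rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the DAGGER implementation: it assumes each read is a bitvector with 1 marking a misincorporation, that misincorporated positions are forced unpaired during folding, and that the tertiary constraint (forcing the tertiary G's unpaired) is applied only when the read has no misincorporation at any tertiary G. The function name `forced_unpaired_positions` and the example reads are hypothetical.

```python
def forced_unpaired_positions(read_bits, tertiary_g_positions):
    """Return the 0-based positions to force unpaired for a single read.

    read_bits:            list of 0/1, where 1 marks a misincorporation
    tertiary_g_positions: positions of G's engaged in tertiary interactions
    """
    tertiary = set(tertiary_g_positions)
    # Positions carrying a misincorporation in this read.
    modified = {i for i, bit in enumerate(read_bits) if bit == 1}
    # A hit at a tertiary G suggests an alternative tertiary conformation,
    # so the tertiary constraint is skipped for that read.
    if modified & tertiary:
        return modified
    return modified | tertiary


# Hypothetical six-nucleotide reads with tertiary G's at positions 2 and 4.
read_without_tertiary_hit = [0, 1, 0, 0, 0, 0]
read_with_tertiary_hit = [0, 0, 1, 0, 0, 0]

constrained = forced_unpaired_positions(read_without_tertiary_hit, {2, 4})
unconstrained = forced_unpaired_positions(read_with_tertiary_hit, {2, 4})
print(sorted(constrained), sorted(unconstrained))
```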

      Line 817. "Ability to". 

      Thank you. We have updated the main text to incorporate this comment.

      Figure 6F. Mistake in the axis description. 

      Thank you. We have updated the main text to incorporate this comment.

      Consider combining the paragraphs at lines 850 and 903. 

      Thank you for the suggestion. We rearranged paragraphs in the discussion to improve clarity.

      Line 1546. The final conc of DMS would be nice to see here.

      Thank you. We have updated the main text to incorporate this comment.

    2. eLife Assessment

      This important work substantially advances our understanding of RNA structure analysis by introducing an innovative method that extends DMS probing to include guanosine residues, thereby enhancing our ability to detect complex tertiary interactions. The evidence supporting the conclusions is compelling, with detailed analyses demonstrating the method's capacity to differentiate structural contexts and improve RNA structure predictions. This work will be of broad interest to RNA structural biology, biochemistry, and biophysics researchers.

    3. Reviewer #1 (Public review):

      Summary:

      DMS-MaP is a sequencing-based method for assessing RNA folding by detecting methyl adducts on unpaired A and C residues created by treatment with dimethylsulfate (DMS). DMS also creates methyl adducts on the N7 position of G, which could be sensitive to tertiary interactions with that atom, but N7-methyl adducts cannot be detected directly by sequencing. In this work, the authors adopt a previously developed method for converting N7-methyl-G to an abasic site to make it detectable by sequencing and then show that the ability of DMS to form an N7-methyl-G adduct is sensitive to RNA structural context. In particular, they look at the G-quadruplex structure motif, which is dense with N7-G interactions, is biologically important, and lacks conclusive methods for in-cell structural analysis.

      Strengths:

      - The authors clearly show that established methods for detecting N7-methyl-G adducts can be used to detect those adducts from DMS and that the formation of those adducts is sensitive to structural context, particularly G-quadruplexes.

      - The authors assess the N7-methyl-G signal through a wide range of useful probing analyses, including standard folding, adduct correlations, mutate-and-map, and single-read clustering.

      - The authors show encouraging preliminary results toward the detection of G-quadruplexes in cells using their method. Reliable detection of RNA G-quadruplexes in cells is a major limitation for the field and this result could lead to a significant advance.

      - Overall, the work shows convincingly that N7-methyl-G adducts from DMS provide valuable structural information and that established data analyses can be adapted to incorporate the information.

      Weaknesses:

      - Most of the validation work is done on the spinach aptamer and it and polyUG RNA are the only RNAs tested that have a known 3D structure. Although it is a useful model for validating this method, it does not provide a comprehensive view of what results to expect across varied RNA structures.

      - It's not clear from this work what the predictive power of BASH-MaP would be when trying to identify G-quadruplexes in RNA sequences of unknown structure. Although clusters of G's with low reactivity and correlated mutations seem to be a strong signal for G-quadruplexes, no effort was made to test a range of G-rich sequences that are known to form G-quadruplexes or not. Having this information would be critical for assessing the ability of BASH-MaP to identify G-quadruplexes in cells.

      - Although the authors present interesting results from various types of analysis, the code currently available on Github lacks the documentation and examples necessary to be useful to the broader community.

      - There are aspects of the DAGGER analysis that could limit its robustness or utility for different RNAs:

      (1) Folding of the RNA based on individual reads does not represent single-molecule folding since each read contains only a small fraction of the possible adducts that could have formed on that molecule. As a result, each fold will largely be driven by the naive folding algorithm. The DANCE-MaP algorithm that was also used by the authors addresses this concern.

      (2) G residues in a loop will have a different impact on RNA folding than those in a G-quadruplex. This difference could reduce the accuracy of CONTRAfold predictions when forcing G-quadruplex residues to be unpaired. That said, predicting secondary structure around G-quadruplexes is a challenge for folding algorithms.

      (3) Incorporation of the G mutations requires prior knowledge of the RNA 3D structure, limiting the utility of the method to predicting alternative conformations in structures that are already well characterized.

    4. Reviewer #3 (Public review):

      Summary:

      In this study the authors aim to develop an experimental/computational pipeline to assess the modification status of an RNA following treatment with dimethylsulfate (DMS). Building upon the more common DMS MaP method, which predominantly assesses the modification status of the Watson-Crick-Franklin face of A's and C's, the authors insert a chemical processing step in the workflow prior to deep sequencing that enables detection of methylation at the N7 position of guanosine residues. This approach, termed BASH MaP, provides a more complete assessment of the true modification status of an RNA following DMS treatment, and this new information provides a powerful set of constraints for assessing the secondary structure and conformational state of an RNA. In developing this work, the authors use Spinach as a model RNA. Spinach is a fluorogenic RNA that binds and activates the fluorescence of a small molecule ligand. Crystal structures of this RNA with ligand bound show that it contains a G-quadruplex motif. In applying BASH MaP to Spinach, the authors also perform the more standard DMS MaP for comparison. They show that the BASH MaP workflow appears to retain the information yielded by DMS MaP while providing new information about guanosine modifications. In Spinach, the G-quadruplex G's have the least reactive N7 positions, consistent with the engagement of N7 in hydrogen bonding interactions at G's involved in quadruplex formation. Moreover, because the inclusion of data corresponding to G increases the number of misincorporations per transcript, BASH MaP is more amenable to analysis of co-occurring misincorporations through statistical analysis, especially in combination with site-specific mutations. These co-occurring misincorporations provide information regarding what nucleotides are structurally coupled within an RNA conformation. 
      By deploying a likelihood-ratio statistical test on BASH MaP data, the authors can identify Gs in G-quadruplexes, deconvolute G-G correlation networks, base-triple interactions and even stacking interactions. Further, the authors develop a pipeline to use the BASH MaP-derived G-modification data to assist in the prediction of RNA secondary structure and identify alternative conformations adopted by a particular RNA. This seems to help with the prediction of secondary structure for Spinach RNA.

      Strengths:

      The BASH MaP procedure and downstream data analysis pipeline more fully identifies the complement of methylations to be identified from DMS treatment of RNA, thereby enriching the information content. This in turn allows for more robust computational/statistical analysis, which likely will lead to more accurate structure predictions. This seems to be the case for the Spinach RNA.

      Weaknesses:

      The authors demonstrate that their method can detect G-quadruplexes in Spinach and some other RNAs both in vitro and in cells. While application to other RNAs is beyond the scope of the current manuscript, the performance of BASH MaP and associated computational analysis in the context of other RNAs remains to be determined.

    1. eLife Assessment

      This study by Graca et al. explores ethanol metabolism pathways in mycobacteria. The enzyme, MftG, a flavoprotein dehydrogenase, is shown to act as an electron shuttle between an uncommon redox cofactor and the electron transport chain, thereby regenerating mycofactocin. Whilst this study was conducted in Mycobacterium smegmatis, the findings are important and have general implications for elucidating broader mycobacterial metabolism. Overall, the data presented are convincing and supported by well-designed experiments.

    2. Reviewer #1 (Public review):

      Using a knock-out mutant strain, the authors tried to decipher the role of the last gene in the mycofactocin operon, mftG. They found that MftG was essential for growth in the presence of ethanol as the sole carbon source, but not for the metabolism of ethanol, evidenced by the equal production of acetaldehyde in the mutant and wild type strains when grown with ethanol (Fig 3). The phenotypic characterization of ΔmftG cells revealed a growth-arrest phenotype in ethanol, reminiscent of starvation conditions (Fig 4). Investigation of cofactor metabolism revealed that MftG was not required to maintain redox balance via NADH/NAD+, but was important for energy production (ATP) in ethanol. Since mycobacteria cannot grow via substrate-level phosphorylation alone, this pointed to a role of MftG in respiration during ethanol metabolism. The accumulation of reduced mycofactocin points to impaired cofactor cycling in the absence of MftG, which would impact the availability of reducing equivalents to feed into the electron transport chain for respiration (Fig 5). This was confirmed when looking at oxygen consumption in membrane preparations from the mutant and wild type strains with reduced mycofactocin electron donors (Fig 7). The transcriptional analysis supported the starvation phenotype, as well as perturbations in energy metabolism, and may be beneficial if described prior to respiratory activity data.

      The data and conclusions support the role of MftG in ethanol metabolism.

    3. Reviewer #3 (Public review):

      Summary:

      The work by Graca et al. describes a GMC flavoprotein dehydrogenase (MftG) in the ethanol metabolism of mycobacteria and provides evidence that it shuttles electrons from the mycofactocin redox cofactor to the electron transport chain.

      Strengths:

      Overall, this study is compelling, exceptionally well designed and thoroughly conducted. An impressively diverse set of different experimental approaches is combined to pin down the role of this enzyme and scrutinize the effects of its presence or absence in mycobacteria cells growing on ethanol and other substrates. Other strengths of this work are the clear writing style and stellar data presentation in the figures, which makes it easy also for non-experts to follow the logic of the paper. Overall, this work therefore closes an important gap in our understanding of ethanol oxidation in mycobacteria, with possible implications for the future treatment of bacterial infections.

      Weaknesses:

      I see no major weaknesses of this work, which in my opinion leaves no doubt about the role of MftG.

    4. Reviewer #4 (Public review):

      Summary:

      The manuscript by Graça et al. explores the role of MftG in the ethanol metabolism of mycobacteria. The authors hypothesise that MftG functions as a mycofactocin dehydrogenase, regenerating mycofactocin by shuttling electrons to the respiratory chain of mycobacteria. Although the study primarily uses M. smegmatis as a model microorganism, the findings have more general implications for understanding mycobacterial metabolism. Identifying the specific partner to which MftG transfers its electrons within the respiratory chain of mycobacteria would be an important next step, as pointed out by the authors.

      Strengths:

      The authors have used a wide range of tools to support their hypothesis, including co-occurrence analyses, gene knockout and complementation experiments, as well as biochemical assays and transcriptomics studies.

      An interesting observation is that the mftG deletion mutant grown on ethanol as the sole carbon source exhibited a growth defect resembling a starvation phenotype.

      MftG was shown to catalyse the electron transfer from mycofactocinol to components of the respiratory chain, highlighting the flexibility and complexity of mycobacterial redox metabolism.

      Weaknesses:

      Could the authors elaborate more on the differences between the WT strains in Fig. 3C and 3E? In Fig. 3C, the ethanol concentration for the WT strain is similar to that of WT-mftG and ∆mftG-mftG, whereas the acetate concentration in the WT strain differs significantly from the other two strains. How does this observation relate to ethanol oxidation, as indicated on page 12?

      The authors conclude from their functional assays that MftG catalyses single-turnover reactions, likely using FAD present in the active site as an electron acceptor. While this is plausible, the current experimental setup doesn't fully support this conclusion, and the language around this claim should be softened.

      The authors suggest in the manuscript that the quinone pool (page 24) may act as the electron acceptor from mycofactocinol, but later in the discussion section (page 30) they propose cytochromes as the potential recipients. If the authors consider both possibilities valid, I suggest discussing both options in the manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Using a knock-out mutant strain, the authors tried to decipher the role of the last gene in the mycofactocin operon, mftG. They found that MftG was essential for growth in the presence of ethanol as the sole carbon source, but not for the metabolism of ethanol, evidenced by the equal production of acetaldehyde in the mutant and wild type strains when grown with ethanol (Fig 3). The phenotypic characterization of ΔmftG cells revealed a growth-arrest phenotype in ethanol, reminiscent of starvation conditions (Fig 4). Investigation of cofactor metabolism revealed that MftG was not required to maintain redox balance via NADH/NAD+, but was important for energy production (ATP) in ethanol. Since mycobacteria cannot grow via substrate-level phosphorylation alone, this pointed to a role of MftG in respiration during ethanol metabolism. The accumulation of reduced mycofactocin points to impaired cofactor cycling in the absence of MftG, which would impact the availability of reducing equivalents to feed into the electron transport chain for respiration (Fig 5). This was confirmed when looking at oxygen consumption in membrane preparations from the mutant and wild type strains with reduced mycofactocin electron donors (Fig 7). The transcriptional analysis supported the starvation phenotype, as well as perturbations in energy metabolism, and may be beneficial if described prior to respiratory activity data.

      We thank the reviewer for their thorough evaluation of our work. We carefully considered whether transcriptional data should be presented before the respirometry data. However, this would disrupt other transitions and the flow of thoughts between sections, so that we prefer to keep the order of sections as is.

      While the data and conclusions do support the role of MftG in ethanol metabolism, the title of the publication may be misleading as the mutant was able to grow in the presence of other alcohols (Supp Fig S2).

      We agree that ethanol metabolism was the focus of this work and that phenotypes connected to other alcohols were less striking. We, therefore, changed “alcohol” to “ethanol” in the title of the manuscript.

      Furthermore, the authors propose that MftG could not be involved in acetate assimilation based on the detection of acetate in the supernatant and the ability to grow in the presence of acetate. The minimal amount of acetate detected in the supernatant but a comparative amount of acetaldehyde could point to disruption of an aldehyde dehydrogenase.

      We do not agree that MftG might be involved in acetaldehyde oxidation. According to our hypothesis, the disruption of an acetaldehyde dehydrogenase would lead to the accumulation of acetaldehyde. However, we observed an equal amount of acetaldehyde in cultures of M. smegmatis WT and ∆mftG grown on ethanol as well as on ethanol + glucose. Furthermore, the amount of acetate detected in the supernatants is not “minimal” as the reviewer points out but higher than or comparable to the acetaldehyde concentration (Figure 3 E and F; note that acetate concentrations are indicated in g/L, acetaldehyde concentrations in µM). Finally, the accumulation of mycofactocinols in ∆mftG mutants grown on ethanol is not in agreement with the idea of MftG being an aldehyde dehydrogenase but very well supports our hypothesis that MftG is involved in cofactor reoxidation.

      The link between mycofactocin oxidation and respiration is shown, however the mutant has an intact respiratory chain in the presence of ethanol (oxygen consumption with NADH and succinate in Fig 7C) and the NADH/NAD+ ratios are comparable to growth in glucose. Could the lack of growth of the mutant in ethanol be linked to factors other than respiration?

      Indeed, by using NADH and succinate as electron donors we show that the respiratory chain is largely intact in WT and ∆mftG grown on ethanol. Also, when mycofactocinols were used as an electron donor, we observed that respiration was comparable to succinate respiration in the WT. However, respiration was severely hampered in membranes of ∆mftG when mycofactocinols were offered as reducing agent. These findings strongly support our hypothesis that MftG is necessary to shuttle electrons from mycofactocin to the respiratory chain, while the rest of the respiratory chain stayed intact. The fact that NADH/NAD+ ratios are comparable between ethanol and glucose conditions is interesting but indirectly supports our hypothesis that mycofactocin and not NAD is the major cofactor in ethanol metabolism. Therefore, we do not see any evidence that the lack of growth of the mutant in ethanol is linked to factors other than respiration.

      To this end, bioinformatic investigation or other evidence to identify the membrane-bound respiratory partner would strengthen the conclusions.

      We generally agree that it is an important next step to identify the direct interaction partners of MftG. However, we are convinced that experimental evidence using several orthogonal approaches is required to unequivocally identify interaction partners of MftG. Nevertheless, we agree that a preliminary bioinformatics study could guide follow-up studies. We therefore attempted to predict interaction partners of MftG using D-SCRIPT and AlphaFold 2. However, our approach did not reveal any meaningful results. Thus, we prefer not to integrate this approach into the manuscript but briefly summarize our methodology here: To predict potential interaction partners of M. smegmatis mc2 155 MftG (MSMEG_1428), D-SCRIPT (Sledzieski et al. 2021, https://doi.org/10.1016/j.cels.2021.08.010) with the Topsy-Turvy model version 1 (Singh et al. 2022, https://doi.org/10.1093/bioinformatics/btac258) was employed to screen every combination of the MSMEG_1428 amino acid sequence with the amino acid sequence of every potential interaction partner from the M. smegmatis mc2 155 predicted total proteome (total 6602 combinations, UniProt UP000000757, Genome Accession CP000480). Predictions failed for eight potential interaction partners due to size constraints (MSMEG_0019, MSMEG_0400, MSMEG_0402, MSMEG_0408, MSMEG_1252, MSMEG_3715, MSMEG_4727, MSMEG_4757; all amino acid sequences ≥ 2000 AA). Afterward, the top 100 predicted interaction partners, ranked by D-SCRIPT protein-protein-interaction score, were subjected to an AlphaFold 2 multimer prediction using ColabFold batch version 1.5.5 (AlphaFold 2 with MMseqs2, Mirdita et al. 2022, https://doi.org/10.1038/s41592-022-01488-1) on a Google Colab T4 GPU with a Python 3 environment and the following parameters (msa_mode: MMseqs2 (UniRef+Environmental), num_models = 1, num_recycles = 3, stop_at_score = 100, num_relax = 0, relax_max_iterations = 200, use_templates = False).
As input, the MSMEG_1428 amino acid sequence was used as protein 1 and the amino acid sequence of the potential interaction partner was used as protein 2. In addition, proteins of the electron transport chain and the dormancy regulon (dos regulon) were included as potential interaction partners. In total, 222 unique potential MftG interactions were predicted. The AlphaFold 2 model interface predicted template modelling (ipTM) score peaked at 0.45 for MftG-MftA. This score, however, lies below the threshold of 0.75, which indicates a likely false prediction of interaction (Yin et al. 2022, https://doi.org/10.1002/pro.4379). Nonetheless, the models with the highest ipTM scores (MftG with MftA, MSMEG_3233, MSMEG_4260, MSMEG_0419, MSMEG_5139, MSMEG_5140) were inspected manually using ChimeraX version 1.8 (Meng et al. 2023, https://doi.org/10.1002/pro.4792). However, no reasonable interaction was found.
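
      The two-stage screen described above (rank candidates by a sequence-based interaction score, keep the top N, then accept only structure models whose ipTM reaches the cutoff) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, and all scores are placeholders except the ipTM of 0.45 for MftG-MftA and the 0.75 threshold, which are quoted in the response.

```python
from typing import Dict, List, Tuple

# Cutoff quoted in the response (Yin et al. 2022): models below it are
# treated as likely false predictions of interaction.
IPTM_THRESHOLD = 0.75


def top_candidates(ppi_scores: Dict[str, float], n: int = 100) -> List[str]:
    """Return the n partner IDs with the highest interaction score."""
    ranked = sorted(ppi_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [protein for protein, _ in ranked[:n]]


def confident_hits(
    iptm: Dict[str, float], threshold: float = IPTM_THRESHOLD
) -> List[Tuple[str, float]]:
    """Keep only models whose ipTM reaches the threshold, best first."""
    hits = [(protein, score) for protein, score in iptm.items() if score >= threshold]
    return sorted(hits, key=lambda kv: kv[1], reverse=True)


# Placeholder stage-1 scores (hypothetical values, not from the study).
ppi_scores = {
    "MftA": 0.91,
    "MSMEG_3233": 0.62,
    "MSMEG_4260": 0.58,
    "other_protein": 0.05,
}
shortlist = top_candidates(ppi_scores, n=3)

# Stage-2 ipTM values: 0.45 for MftA is from the response, the rest are placeholders.
iptm = {"MftA": 0.45, "MSMEG_3233": 0.41, "MSMEG_4260": 0.38}
hits = confident_hits({protein: iptm[protein] for protein in shortlist})

print(shortlist)
print(hits)
```

      With a best ipTM of 0.45, no candidate passes the 0.75 cutoff, which matches the authors' statement that the screen produced no reliable interaction partner.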

      Reviewer #2 (Public Review):

      Summary

      Patrícia Graça et al. examined the role of the putative oxidoreductase MftG in regeneration of redox cofactors from the mycofactocin family in Mycolicibacterium smegmatis. The authors show that mftG is often co-encoded with genes from the mycofactocin synthesis pathway in M. smegmatis genomes. Using a mftG deletion mutant, the authors show that mftG is critical for growth when ethanol is the only available carbon source, and this phenotype can be complemented in trans. The authors demonstrate that the ethanol-associated growth defect is not due to ethanol-induced cell death, but is likely a result of carbon starvation, which was supported by multiple lines of evidence (imaging, transcriptomics, ATP/ADP measurement and respirometry using whole cells and cell membranes). The authors next used LC-MS to show that the mftG deletion mutant has much lower oxidised mycofactocin (MMFT-8 vs MMFT-8H2) compared to WT, suggesting an impaired ability to regenerate mycofactocin redox cofactors during ethanol metabolism. These striking results were further supported by mycofactocin oxidation assays after over-expression of MftG in the native host, but also with recombinantly produced partially purified MftG from E. coli. The results showed that MftG is able to partially oxidise mycofactocin species. Finally, respirometry measurements with M. smegmatis membrane preparations from WT and mftG mutant cells show that the activity of MftG is indispensable for coupling of mycofactocin electron transfer to the respiratory chain. Overall, I find this study to be comprehensive and the conclusions of the paper are well supported by multiple complementary lines of evidence that are clearly presented.

      Strengths

      The major strengths of the paper are that it is clearly written and presented and contains multiple, complementary lines of experimental evidence that support the hypothesis that MftG is involved in the regeneration of mycofactocin cofactors, and assists with coupling of electrons derived from ethanol metabolism to the aerobic respiratory chain. The data appear to support the authors' hypotheses.

      We thank the reviewer for their thorough evaluation of our work.

      Weaknesses

      No major weaknesses were identified, only minor weaknesses mostly surrounding presentation of data in some figures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Fig 6 C and D, would it not be expected that MMFT-2H2 would be decreasing over time as MMFT-2 is increasing?

      This is true. MMFT-2H2 is indeed decreasing while MMFT-2 is increasing. However, since the y-axis is drawn on a logarithmic scale, the visible difference is not proportional to the actual changes. The increase of MMFT-2 against a very low starting point is more clearly visible than the decrease of MMFT-2H2, which was added in high quantities.
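
      The effect of the logarithmic axis can be illustrated with a small numerical sketch. The amounts below are hypothetical (arbitrary units, not values from Figure 6); they only show that the same absolute change of 100 units looks very different depending on the starting level.

```python
import math


def log10_shift(start: float, end: float) -> float:
    """Vertical displacement on a log10 axis when a value moves from start to end."""
    return math.log10(end) - math.log10(start)


# Hypothetical amounts in arbitrary units (assumed for illustration only).
substrate_start, substrate_end = 1000.0, 900.0  # MMFT-2H2, added in excess
product_start, product_end = 1.0, 101.0         # MMFT-2, starting near zero

substrate_shift = log10_shift(substrate_start, substrate_end)
product_shift = log10_shift(product_start, product_end)

print(f"MMFT-2H2 shift on log axis: {substrate_shift:.3f}")
print(f"MMFT-2 shift on log axis:   {product_shift:.3f}")
```

      Under these assumed numbers the substrate curve moves by less than 0.05 log units while the product curve moves by about two log units, even though both changed by 100 units.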

      (2) It would be beneficial to include rationale regarding the electron acceptors tested and why FAD was not included.

      FAD is a prosthetic group of the enzyme and was always a component of the assay. The other electron acceptors were chosen as potential external electron acceptors.

      (3) Bioinformatic analysis to capture possible interacting partners of MftG

      See our response to the previous review.

      Reviewer #2 (Recommendations For The Authors):

      Questions:

      (1) The co-occurrence analysis showed that one genome encoded mftG, but not mftC - do the authors think that this is a mftG mis-annotation?

      This is a good question. We have investigated this case more closely and conclude that this particular mftG is not a misannotation. Instead, it appears that the mftC gene underwent gene loss in this organism. We added on page 8, line 15: “Only one genome (Herbiconiux sp. L3-i23) encoding a bona fide MftG did not harbor any MftC homolog. However, close inspection revealed the presence of mftD, mftF, and a potential mftA gene but a loss of mftB,C and E in this organism.”

      (2) Figure 3A - the complemented mutant strain shows enhanced growth on ethanol when compared to the WT strain with the same mftG complementation vector, suggesting that dysregulation from the expression plasmid may not be responsible for this phenotype. Have the authors conducted whole genome sequencing on the mutant/complement isolate to rule out secondary mutations?

      This is an interesting point. We have not conducted further investigations into the complement mutant. However, we can confidently state that the complementation was successful in that it restored growth of the ∆mftG mutant on ethanol, thus confirming that the growth arrest of the mutant was due to the lack of mftG activity and not due to any secondary mutation. We also observed that both the complement strain and the overexpression strain, both of which are based on the same overexpression plasmid, exhibited shorter lag phases, faster growth and higher final cell densities compared to the wild type. We interpret these data to mean that overexpression of mftG might relieve a growth-limiting step. Notably, this is only an interpretation; we do not make this claim. What we cannot explain at the moment is the observation that the complement mutant grew to a higher OD than the overexpression strain. This is indeed interesting, and it might be due to an artefact or due to complex regulatory effects, which are hard to study without an in-depth characterization of the different strains involved. While this goes beyond the scope of this study, we are convinced that our main conclusions are not challenged by this phenomenon.

      (3) Figure 4C - could the yellow fluorescence that suggests growth arrest be quantified in these images similar to the size and septa/replication sites?

      In principle, this is a good suggestion. However, the amount of yellow fluorescence only differed in the starvation condition between genotypes. Since this condition was not a focus of this study, we preferred not to discuss these differences further.

      (4) Figure 4E - the complemented mutant strain has very high error, why is that? Could this phenotype not be complemented?

      It is true that the standard deviation (SD) is relatively high in this experiment. This is due to the fact that single-cell analyses based on microscopic images were conducted here - not bulk measurements of the average fluorescence. This means that the high variance partially reflects phenotypic heterogeneity of the population, rather than inefficient complementation. While it is interesting that not all cells behaved equally, a finding that deserves further investigations in the future, we conclude that the mean value is a good representative for the efficiency of the complementation.

      (5) While the whole cell extract experiment presented in Figure 6A is very clear, could the authors include SDS-PAGE or MS results of their partially purified MftG preparations used for Figure 6B-F in the supplementary data to rule out any confounding factors that may be oxidising mycofactocin species in these preparations?

      We did not include SDS-PAGE or MS results since the enzyme preparations obtained were not pure. This is why we refer to the preparation as “partially purified fraction”. Since we were aware of the risk of confounding factors being potentially present in the preparation, we used two different expression hosts (M. smegmatis ∆mftG and E. coli) and included negative controls, i.e., a reaction using protein preparations from the same host that underwent the exact same purification steps but lacked the mftG gene. For instance, Figure 6A shows the negative control (M. smegmatis ∆mftG) and the verum (M. smegmatis ∆mftG-mftG_His6). Although this control is not shown in panels B, C and D for clarity, we can assure that the proposed activity of MftG has never been detected in any extract of M. smegmatis ∆mftG. Concerning MftG preparations obtained from heterologous expression in E. coli, we also performed empty vector controls and inactivated protein controls. We added a new Supplementary Figure S4 to show one example control. Taken together, the usage of two different expression hosts along with corresponding background controls clearly demonstrates that mycofactocinol oxidation only occurred in protein extracts of bacterial strains that contained the mftG gene. These data indicate that the observed mycofactocinol dehydrogenase activity is connected to MftG and not to any background activity.

      Recommendations:

      • A suggestion - revise sub-titles in the results section to be more 'results-oriented' e.g. rather than 'the role of MftG in growth and metabolism of mycobacteria' consider instead 'MftG is critical for M. smegmatis capacity to utilise ethanol as a sole carbon source for growth' or something similar.

      In principle this is a good idea for many manuscripts. However, we have the impression that this approach does not reflect the complexity and additive aspect of the sections of our manuscript.

      • For clarity, revise all figures to include p-values in the figure legend rather than above the figures (use asterisks to indicate significance).

      We are not sure whether the deletion of p-values in the figures would enhance clarity. We would prefer to leave them within figures.

      • Figure 5B - revise colour legend, it is unclear which bar on the graph corresponds to which strain.

      The figure legend was enlarged to enhance readability.

      • Page 8 - MftG and MftC should be lowercase and italicised as the authors are writing about the co-occurrence of genes encoded in genomes, not proteins.

      Good point, we changed some instances of MftG / MftC to mftG / mftC, to more specifically refer to the gene level. However, in some cases, the protein level is more appropriate, for instance, the phylogenies are based on protein sequences. That is why we used the spelling MftG / MftC in these cases.

      • Page 9 - for clarity move Figure 3 after first in text citation.

      We moved Figure 3.

      • Page 17 - for clarity move Figure 5 after first in text citation.

      We moved Figure 5. We furthermore reformatted the figure legend to fit onto the same page as the figure.

      • Page 20, line 17 - 'was attempted' change to 'was performed'. The authors did more than attempt purification, they succeeded!

      Since purification of MftG was not successful, we prefer the term “attempted” here. However, activity assays indeed indicate successful production of MftG.

      • Page 20, line 19-21 - data showing that the MftG-HIS6 complements ∆mftG could be included in supplementary information.

      Complementation was obvious by growth on media containing ethanol as a sole carbon source.

      • Page 26 line 25 - 'we also we' delete duplicated we.

      Thank you for the hint, we deleted the second instance of “we” in the manuscript.

      • Page 26 Line 26 - 'mycofactocinols were oxidised to mycofactocinols', should this read mycofactocinols were oxidised to mycofactocinones?

      Correct. We changed “mycofactocinols” to “mycofactocinones”.

      • Page 28 line 17, huc hydrogenase operon

      We added (“huc operon”).

      • Page 38 line 24, 'Two' not 'to'.

      This is a misunderstanding. “To” is correct.

    1. eLife Assessment

      This important, clearly written, and timely manuscript links the timing of ART with the kinetics of total and intact proviral HIV DNA. The conclusions are interesting and somewhat novel, and the importance of the work is high because the focus is on African women and clade C virus, both of which are understudied in the HIV reservoir field. The strength of the evidence is convincing. Overall, this work will be of very high interest to scientists and clinicians in the HIV cure/persistence fields.

    2. Reviewer #1 (Public review):

      The authors sought to determine the impact of early antiretroviral treatment on the size, composition, and decay of the HIV latent reservoir. This reservoir represents the source of viral rebound upon treatment interruption and therefore constitutes the greatest challenge to achieving an HIV cure. A particular strength of this study is that it reports on reservoir characteristics in African women, a significantly understudied population, of whom some have initiated treatment within days of acute HIV diagnosis. With the use of highly sensitive and current technologies, including digital droplet PCR and near full-length genome next-generation sequencing, the authors generated a valuable dataset for investigation of proviral dynamics in women initiating early treatment compared to those initiating treatment in chronic infection. The authors confirm previous reports that early antiretroviral treatment restricts reservoir size, but further show that this restriction extends to defective viral genomes, where late treatment initiation was associated with a greater frequency of defective genomes. Furthermore, an additional strength of this study is the longitudinal comparison of viral dynamics post-treatment, wherein early treatment was shown to be associated with a more rapid rate of decay in proviral genomes, regardless of intactness, over a period of one year post-treatment. While it is indicated that intact genomes were not detected after one year following early treatment initiation, sampling depth is noted as a limitation of the study by the authors, and caution should thus be taken with interpretation where sequence numbers are low. Defective genomes are more abundant than intact genomes and are therefore more likely to be sampled. Early treatment was also associated with reduced proviral diversity and fewer instances of polymorphisms associated with cytotoxic T-lymphocyte immune selection. 
This is expected given that rapid evolution and extensive immune selection are synonymous with HIV infection in the absence of treatment, yet points to an additional benefit of early treatment in the context of immune therapies to restrict the reservoir.

      This is one of the first studies to report the mapping of longitudinal intactness of proviral genomes in the globally dominant subtype C. The data and findings from this study therefore represent a much-needed resource in furthering our understanding of HIV persistence and informing broadly impactful cure strategies. The analysis on clonal expansion of proviral genomes may be limited by higher sequence homogeneity in hyperacute infection i.e., cells with different proviral integration sites may have a higher likelihood of containing identical genomes compared to chronic infection.

      Overall, these data demonstrate the distinct benefits of early treatment initiation at reducing the barrier to a functional cure for HIV, not only by restricting viral abundance and diversity but also potentially through the preservation of immune function and limiting immune escape. It therefore provides clues to curative strategies even in settings where early diagnosis and treatment may be unlikely.

    3. Reviewer #2 (Public review):

      HIV infection is characterized by viral integration into permissive host cells - an event that occurs very early in viral-host encounter. This constitutes the HIV proviral reservoir and is a feature of HIV infection that provides the greatest challenge for eradicating HIV-1 infection once an individual is infected.

      This study looks at how starting HIV treatment very early after infection, which substantially reduces the peak viral load detectable (compared to untreated infection), affects the amount and characteristics of the viral reservoir. The authors studied 35 women in South Africa who were at high risk of getting HIV. Some of these women started HIV treatment very soon after getting infected, while others started later. This study is well designed and has as its focus a very well characterized cohort. Comparison groups are appropriately selected to address proviral DNA characterization and dynamics in the context of acute and chronic treated HIV-1. The amount of HIV and various characteristics of the genetic makeup of the virus (intact/defective proviral genome) were evaluated over one year of treatment. Methods employed for proviral DNA characterization are state of the art and provide in-depth insights into the reservoir in peripheral blood.

      While starting treatment early didn't reduce the amount of HIV DNA at the outset, it did lead to a gradual decrease in total HIV DNA quantity over time. In contrast, those who started treatment later didn't see much change in this parameter. Starting treatment early led to a faster decrease in intact provirus (a measure of replication-competence), compared to starting treatment later. Additionally, early treatment reduced genetic diversity of the viral DNA and resulted in fewer immune escape variants within intact genomes. This suggests that collectively having a smaller intact replication-competent reservoir, less viral variability, and less opportunity for virus to evade the immune system - are all features that are likely to facilitate more effective clearance of viral reservoir, especially when combined with other intervention strategies.

      Major strengths of the study include the cohort of very early treated persons with HIV and the depth of study. These are important findings, particularly as the study was conducted in HIV-1 subtype C infected women (more cure studies have focussed on men with subtype B infection) and in populations most affected by HIV and in need of HIV cure interventions. This is highly relevant because it cannot be assumed that any interventions employed for reducing/clearing the HIV reservoir would perform similarly in men and women or across different populations. Other factors also deserve consideration and include age and environment (e.g. other comorbidities and coinfections).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      (1) Given that this is one of the first studies to report the mapping of longitudinal intactness of proviral genomes in the globally dominant subtype C, the manuscript would benefit from placing these findings in the context of what has been reported in other populations, for example, how decay rates of intact and defective genomes compare with that of other subtypes where known.  

      Most published studies are from men living with HIV-1 subtype B and are not from the hyperacute infection phase; therefore, a direct head-to-head comparison with the FRESH study is difficult. However, we can highlight and contrast our study with a few examples from acute infection studies, as follows.

      a. Peluso et al., JCI, 2020, showed that in Caucasian men (SCOPE study) with subtype B infection who initiated ART during chronic infection, intact viral genomes decayed at a rate of 15.7% per year, while defective genomes decayed at a rate of 4% per year. In our study, we showed that in chronic treated participants, genomes decreased at a rate of 25% (intact) and 3% (defective) per month for the first 6 months of treatment.

      b. White et al., PNAS, 2021, demonstrated that in a cohort of African, white and mixed-race American men treated during acute infection, the rate of decay of intact viral genomes in the first phase of decay was <0.3 log copies in the first 2-3 weeks following ART initiation. In the FRESH cohort, our data from acute treated participants show a comparable decay rate of 0.31 log copies per month for intact viral genomes.

      c. A study in Thailand (Leyre et al., 2020, Science Translational Medicine) of predominantly HIV-1 CRF01-AE subtype compared HIV reservoir levels in participants starting ART at the earliest stages of acute HIV infection (in the RV254/SEARCH 010 cohort) and participants initiating ART during chronic infection (in the SEARCH 011 and RV304/SEARCH 013 cohorts). In keeping with our study, they showed that the frequency of infected cells with integrated HIV DNA remained stable in participants who initiated ART during chronic infection, while there was a sharp decay in these infected cells in all acutely treated individuals during the first 12 weeks of therapy. Rates of decay were not provided, and therefore a direct comparison with our data from the FRESH cohort is not possible.

      d. A study by Bruner et al., Nat. Med. 2016, described the composition of proviral populations in an acute treated (within 100 days) and chronic treated (>180 days), predominantly male subtype B cohort. In comparison to the FRESH chronic treated group, they showed that in chronic treated infection 98% (87% in FRESH) of viral genomes were defective, 80% (60% in FRESH) had large internal deletions and 14% (31% in FRESH) were hypermutated. In the acute treated group, 93% (48% in FRESH) were defective and 35% (7% in FRESH) were hypermutated. The differences in the frequency of hypermutation could be explained by the differences in timing of infection, specifically in the acute treated groups, where FRESH participants initiated ART at a median of 1 day after infection. It is also possible that sex- or race-based differences in immunological factors that impact the reservoir may play a role.

      This study also showed that large deletions are non-random and occur at hotspots in the HIV-1 genome. The design of the subtype B IPDA assay (Bruner et al., Nature, 2019) is based on optimal discrimination between intact and deleted sequences, obtained with a 5′ amplicon in the Ψ region and a 3′ amplicon in Envelope. This suggests that Envelope is a hotspot for large deletions, while Ψ is the site of frequent small deletions and is included in larger 5′ deletions. In the FRESH cohort of HIV-1 subtype C, genome deletions were most frequently observed between Integrase and Envelope relative to Gag (p<0.0001–0.001).

      e. In 2017, Heiner et al., in Cell Rep, also described genetic characteristics of the latent HIV-1 reservoir in 3 acute treated and 3 chronic treated male study participants with subtype B HIV. Their data were similar to those of Bruner et al. above, showing proportions of intact proviruses in participants who initiated therapy during acute/early infection at 6% (94% defective) and during chronic infection at 3% (97% defective). In contrast, the frequencies in FRESH were 52% intact and 48% defective in acute treated participants and 13% intact and 87% defective in chronic infection. These differences could be attributed to the timing of treatment initiation; in the aforementioned study, early treatment ranged from 0.6-3.4 months after infection.
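      The decay rates quoted in items a and b are reported on different scales (percent per year, percent per month, and log10 copies per month). A minimal sketch of how such figures could be placed on a common scale is given below. It assumes constant single-phase exponential decay over the interval in question, which none of the cited studies is claimed to establish, and the function names are hypothetical. The conversions are arithmetic only and do not account for differences in follow-up window or assay.

```python
import math

def percent_per_year_to_month(pct_per_year):
    """Equivalent percent decline per month for a given percent decline per year.

    Assumes the same fractional loss every month (exponential decay).
    """
    remaining_per_year = 1.0 - pct_per_year / 100.0
    remaining_per_month = remaining_per_year ** (1.0 / 12.0)
    return (1.0 - remaining_per_month) * 100.0

def log10_drop_to_percent(log10_drop):
    """Percent decline corresponding to a drop expressed in log10 units."""
    return (1.0 - 10.0 ** (-log10_drop)) * 100.0

def percent_to_log10_drop(pct):
    """Drop in log10 units corresponding to a percent decline."""
    return -math.log10(1.0 - pct / 100.0)

if __name__ == "__main__":
    # Figures quoted in the text above, converted for illustration only.
    print(round(percent_per_year_to_month(15.7), 2))  # intact, Peluso et al., per month
    print(round(log10_drop_to_percent(0.31), 1))      # FRESH acute treated, percent per month
    print(round(percent_to_log10_drop(25.0), 3))      # FRESH chronic treated, log10 per month
```

      Any such comparison remains indirect, because the cited rates were measured over different periods after ART initiation.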

      (2) Indeed, in the abstract, the authors indicate that treatment was initiated before the peak. The use of the term 'peak' viremia in the hyperacute-treated group could perhaps be replaced with 'highest recorded viral load'. The statistical comparison of this measure in the two groups is perhaps more relevant with regards to viral burden over time or area under the curve viral load as these are previously reported as correlates of reservoir size. 

      We have edited the manuscript text to describe the term peak viraemia in hyperacute treated participants more clearly (lines 443-444). We have now performed an analysis of area under the curve to compare viral burden in the two study groups and found associations with proviral DNA levels after one year. This has been added to the results section (lines 162-163).
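      The response does not state how the area under the curve was computed. One common approach, sketched here purely as an illustration, is the trapezoidal rule applied to log10 viral load over time. The function name and the example values are hypothetical and are not taken from the study.

```python
def viral_load_auc(days, log10_viral_load):
    """Trapezoidal area under a log10 viral load curve.

    days: sampling times in days, strictly increasing.
    log10_viral_load: log10 copies/ml measured at each time.
    Returns the area in (log10 copies/ml) x days.
    """
    if len(days) != len(log10_viral_load) or len(days) < 2:
        raise ValueError("need at least two matched time points")
    auc = 0.0
    for i in range(1, len(days)):
        width = days[i] - days[i - 1]
        if width <= 0:
            raise ValueError("time points must be strictly increasing")
        # Area of one trapezoid: interval width times the mean of its two heights.
        auc += width * (log10_viral_load[i] + log10_viral_load[i - 1]) / 2.0
    return auc

if __name__ == "__main__":
    # Hypothetical participant sampled on days 0, 10 and 20.
    print(viral_load_auc([0, 10, 20], [4.0, 6.0, 2.0]))
```

      A per-participant value computed in this way could then be compared between groups or correlated with proviral DNA levels.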

      Reviewer #2 (Public reviews):

      (1) Other factors also deserve consideration and include age, and environment (e.g. other comorbidities and coinfections.)

      We agree that these factors could play a role; however, participants in this study were of similar age (18-23), and information on co-morbidities and coinfections is not known.

      Reviewer #3 (Public reviews):

      (1) The word reservoir should not be used to describe proviral DNA soon after ART initiation. It is generally agreed upon that there is still HIV DNA from actively infected cells (phase 1 & 2 decay of RNA) during the first 6-12 months of ART. Only after a full year of uninterrupted ART is it really safe to label intact proviral HIV DNA as an approximation of the reservoir. This should be amended throughout.

      We agree and where appropriate have amended the use of the word reservoir to only refer to the proviral load after full viral suppression, i.e., undetectable viral load.

      (2) All raw, individualized data should be made available for modelers and statisticians. It would be very nice to see the RNA and DNA data presented in a supplementary figure by an individual to get a better grasp of intra-host kinetics.

      We will make all relevant data available and accessible to interested parties on request. We have now added a section on data availability (lines 489-491).

      (3) The legend of Supplementary Figure 2 should list when samples were taken.

      The data in this figure represents an overall analysis of all sequences available for each participant at all time points.  This has now been explained more clearly in the figure legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) It is recommended that the introduction includes information to set the scene regarding what is currently reported on the composition of the reservoir for those not in the immediate field of study i.e., the reported percentage of defective genomes and in which settings/populations genome intactness has been mapped, as this remains an area of limited information.

      We have now included a summary of other reported findings in the field in the introduction (lines 89-92, 94-98) and discussion (lines 345-350).  A more detailed overview has been provided in the response to public reviews.

      (2) It may be beneficial to state in the main text of the paper what the purpose of the Raltegravir was and that it was only administered post-suppression. Looking at Table 1, only the hyperacute treatment group received Raltegravir and this could be seen as a confounder as it is an integrase inhibitor. Therefore, this should be explained.

      Once Raltegravir became available in South Africa, all new acute infections in the study cohort had an intensified 4-drug regimen that included Raltegravir.  A more detailed explanation has now been included in the methods section (lines 435-437).

      (3) Can the authors explain why the viral measures at 6 months post-ART are not shown for chronic-treated individuals in Figure 1 or reported on in the text?

      The 6 months post-ART time point has been added to Figure 1.

      (4) Can the authors indicate in the discussion, how the breakdown of proviral composition compares to subtype B as reported in the literature, for example, are the common sites of deletion similar, or is the frequency of hypermutation similar?

      Added to discussion (lines 345-350).

      (5) Do the numbers above the bars in Figure 3 represent the number of sampled genomes? If so, this should be stated.

      Yes, the numbers above the bars represent the number of sampled genomes. This has been added to the Figure 3 legend.

      (6) In the section starting on line 141, the introduction implies a comparison with immunological features, yet what is being compared are markers of clinical disease progression rather than immune responses. This should be clarified/corrected.

      This has been corrected (line 153).

      (7) Line 170 uses the term 'immediately' following infection, however, was this not 1 -3 days after?

      We have changed the word “immediately” to “1-3 days post-detection” (line 181).

      (8) Can the sampling time-points for the two groups be given for the longitudinal sequencing analysis?

      The sequencing time points for each group are depicted in Figure 2.

      (9) Line 183 indicates that intact genomes contributed 65% of the total sequence pool, yet it's given as 35% in the paragraph above. Should this be defective genomes?

      Yes, this was a typographical error.  Now corrected to read “defective genomes” (line 193).

      (10) The section on decay kinetics of intact and defective genomes seems to overlap with the section above and would flow better if merged.

      Well noted, however we choose to keep these sections separate.

      (11) Some references in the text are given in writing instead of numbering.

      This has been corrected.

      (12) In the clonal expansion results section, can it be indicated between which two time-points expansion was measured?

      This analysis was performed with all sequences available for each participant at all time points.  We have added this explanation to the respective Figure legend.

      Reviewer #2 (Recommendations for The Authors):

      (1) The statement on line 384 "Our data showed that early ART...preserves innate immune factors" - what innate immune factors are being referred to?

      We have removed this statement.

      (2) HLA genotyping methods are not included in the Methods section

      Now included and referenced (lines 481-483).

      (3) Are CD4:CD8 ratios available for the cohorts? This could be another informative clinical parameter to analyse in relation to HIV-1 proviral load after 1 year of ART – as done for the other variables (peak VL, and the CD4 measures).

      Yes, CD4:CD8 ratios are available. We performed the recommended analysis but found no associations with HIV-1 proviral load after 1 year of ART. We have added this to the results section (lines 163-164).

      (4) Reference formatting: Paragraph starting at line 247 (Contribution of clonal expansion...) - the two references in this paragraph are not cited according to the numbering system as for the rest of the manuscript. The Lui et al, 2020 reference is missing from the reference list - so will change all the numbering throughout.

      This has been corrected.

      Reviewer #3 (Recommendations for The Authors):

      (1) To allow comparison to past work. I suggest changing decay using % to half-life. I would also mention the multiple studies looking at total and intact HIV DNA decay rates in the intro.

      We do not have enough data points to get a good estimate of the half-life and therefore report decay as percentage per month for the first 6 months.
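      For orientation only, a percentage decline per month can be related to a half-life if one assumes single-phase exponential decay. That assumption is made here for illustration and is not supported by a fitted model in this response. The function name is hypothetical.

```python
import math

def half_life_months(pct_decline_per_month):
    """Half-life implied by a constant percent decline per month.

    Under exponential decay the fraction remaining after t months is
    (1 - r) ** t, so the half-life is ln(0.5) / ln(1 - r).
    """
    if not 0.0 < pct_decline_per_month < 100.0:
        raise ValueError("percent decline must be between 0 and 100 (exclusive)")
    remaining = 1.0 - pct_decline_per_month / 100.0
    return math.log(0.5) / math.log(remaining)

if __name__ == "__main__":
    # Rates quoted earlier in this response for chronic treated participants.
    print(round(half_life_months(25.0), 2))  # intact genomes
    print(round(half_life_months(3.0), 2))   # defective genomes
```

      Values obtained this way inherit the uncertainty of the underlying percentages and should not be read as fitted half-life estimates.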

      (2) Line 73: variability is the wrong word as inter-individual variability is remarkably low. I think the authors mean "difference" between intact and total.

      We have changed the word variability to difference as suggested.

      (3) Line 297: I am personally not convinced that there is data that definitively shows total HIV DNA impacting the pathophysiology of infection. All of this work is deeply confounded by the impact of past viremia. The authors should talk about this in more detail or eliminate this sentence.

      We have reworded the statement to read “Total HIV-1 DNA is an important biomarker of clinical outcomes.” (Lines 308-309).

      (4) Line 317; There is no target cell limitation for reservoir cells. The vast majority of CD4+ T cells during suppressive ART are uninfected. The mechanism listing the number of reservoir cells is necessarily not target cell limitation.

      We agree. The statement this refers to has been reworded as follows: “Considering, that the majority of CD4 T cells remain uninfected it is likely that this does not represent a higher number of target cells, and this warrants further investigation.” (lines 325-326).

      (5) Line 322: Some people in the field bristle at the concept of total HIV DNA being part of the reservoir as defective viruses do not contribute to viremia. Please consider rephrasing. 

      We acknowledge that there are differing opinions regarding total HIV DNA being part of the reservoir, as defective viruses do not contribute to viremia; however, defective HIV proviruses may contribute to persistent immune dysfunction and T cell exhaustion, which are associated with comorbidities and adverse clinical outcomes in people living with HIV.  We have explained in the text that total HIV-DNA does not distinguish between replication-competent and -defective viruses that contribute to the viral reservoir.

      (6) Line 339: The under-sampling statement is an understatement. The degree of under-sampling is massive and biases estimates of clonality and sensitivity for intact HIV. Please see and consider citing work by Dan Reeves on this subject.

      We agree and have cited work by Dan Reeves (line 358).

      (7) Line 351: This is not a head-to-head comparison of biphasic decay as the Siliciano group's work (and others) does not start to consider HIV decay until one year after ART. I think it is important to not consider what happens during the first year of ART to be reservoir decay necessarily.

      Well noted.

      (8) Line 366-371: This section is underwritten. In nearly all PWH studies to date, observed reservoirs are highly clonal.

      We agree that observed reservoirs are highly clonal but have not added anything further to this section.

      (9) It would be nice to have some background in the intro & discussion about whether there is any a priori reason that clade C reservoirs, or reservoirs in South African women, might differ (or not) from clade B reservoirs observed in different study participants.

      We have now added this to the introduction (lines 94-103).

      (10) Line 248: This sentence is likely not accurate. It is probable that most of the reservoir is sustained by the proliferation of infected CD4+ T cells. 50% is a low estimate due to under-sampling leading to false singleton samples. Moreover, singletons can also be part of former clones that have contracted, which is a natural outcome for CD4+ T cells responding to antigens &/or exhibiting homeostasis. The data as reported is fine but more complex ecologic methods are needed to truly probe the clonal structure of the reservoir given severe under sampling.

      Well noted.

    1. eLife Assessment

      This study reports an important finding on the mechanism underlying the enhancement of anti-viral immune responses by febrile temperatures, especially the role of the conserved heat-shock factor, HSF-1. The data provide compelling support for the authors' model wherein increased temperature in the shrimp Litopenaeus vannamei activates HSF1, which in turn enhances anti-viral response via up-regulation of the nSWD protein and antibacterial peptides. The work, which will be of interest to virologists, immunologists, and cell biologists, would benefit from more discussion of the function and roles of HSF-1 at 25°C vs. 32°C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The present study's main aim is to investigate the mechanism of how VirR controls the magnitude of MEV release in Mtb. The authors used various techniques, including genetics, transcriptomics, proteomics, and ultrastructural and biochemical methods. Several observations were made to link VirR-mediated vesiculogenesis with PG metabolism, lipid metabolism, and cell wall permeability. Finally, the authors presented evidence of a direct physical interaction of VirR with the LCP proteins involved in linking PG with AG, providing clues that VirR might act as a scaffold for LCP proteins and remodel the cell wall of Mtb. Since the Mtb cell wall provides a formidable anatomical barrier for the entry of antibiotics, targeting VirR might weaken the permeability of the pathogen along with the stimulation of the immune system due to enhanced vesiculogenesis. Therefore, VirR could be an excellent drug target. Overall, the study is an essential area of TB biology.

      We thank the reviewer for the kind assessment of our paper.  

      Strengths: 

      The authors have done a commendable job of comprehensively examining the phenotypes associated with the VirR mutant using various techniques. Application of Cryo-EM technology confirmed increased thickness and altered arrangement of CM-L1 layer. The authors also confirmed that increased vesicle release in the mutant was not due to cell lysis, which contrasts with studies in other bacterial species. 

      Another strength of the manuscript is that biochemical experiments show altered permeability and PG turnover in the mutant, which fits with later experiments where authors provide evidence of a direct physical interaction of VirR with LCP proteins. 

      Transcriptomics and proteomics data were helpful in making connections with lipid metabolism, which the authors confirmed by analyzing the lipids and metabolites of the mutant. 

      Lastly, using three approaches, the authors confirm that VirR interacts with LCP proteins in Mtb via the LytR_C terminal domain. 

      Altogether, the work is comprehensive, experiments are designed well, and conclusions are made based on the data generated after verification using multiple complementary approaches.

      We are glad that this reviewer finds our study of interest and well designed.   

      Weaknesses: 

      (1) The major weakness is that the mechanism of VirR-mediated EV release remains enigmatic. Most of the findings are observational and only associate enhanced vesiculogenesis observed in the VirR mutant with cell wall permeability and PG metabolism. The authors suggest that EV release occurs during cell division when PG is most fragile. However, this has yet to be tested in the manuscript - the AFM of the VirR mutant, which produces thicker PG with more pore density, displays enhanced vesiculogenesis. No evidence was presented to show that the PG of the mutant is fragile, and there are differences in cell division to explain increased vesiculogenesis. These observations, counterintuitive to the authors' hypothesis, need detailed experimental verification.

      We concur with the reviewer that we do not have direct evidence showing a more fragile PG in the virR mutant and our statement is supported by a compendium of different results. However, this statement is framed in the discussion section as a possible scenario, acknowledging that more experiments are needed to make such connection. Nevertheless, we provide additional data on the molecular characterization of virRmut PG using MS to show a significant increase in the abundance of deacetylated muropeptides, a feature that has been linked to altered lysozyme sensitivity in other unrelated Gram-positive bacteria (Fig 8 G,H).

      (2.1) Transcriptomic data add little of substance. Transcriptomic data do not correlate with the proteomics data. It remains unclear how VirR deregulates transcription. 

      We concur with the reviewer that information provided by transcriptomics and proteomics is a bit fragmented and, taking into consideration the low correlation between both datasets, it does not help to explain the phenotype observed in the mutant. This issue has also been raised by another reviewer so, we have paid special attention to that. 

      To refine the biological interpretation of the transcriptomic data, we have integrated the complemented strain (virRmut-Comp) in our analyses. This led us to narrow down the virR-dependent transcriptomics signature to the sets of genes that appear simultaneously deregulated in virRmut with respect to both WT and complemented strain in either direction. Furthermore, to identify the transcription factors whose regulatory activity appears disrupted in the mutant strain, we have resorted to an external dataset (Minch et al. 2015) and found a set of 10 transcriptional regulators whose regulons appear significantly impacted in the virRmut strain. While admittedly these improvements do not fully address the question raised by the reviewer, we found that they contribute to a more precise characterization of the VirR-dependent transcriptional signatures, as well as the regulons, in the genome-wide transcriptional regulatory network of the pathogen that appear altered because of virR disruption. We acknowledge that the lack of correlation between whole-cell lysate proteomics and transcriptomic data is intriguing, albeit not uncommon in Mycobacterium tuberculosis. However, differences in the protein cargo of the vesicles from different strains share key pathways in common with the transcriptomic analyses, such as the enrichments in cell wall biogenesis and peptidoglycan biosynthesis that are observed among the genes and proteins downregulated in virRmut in both cases.

      (2.2) TLCs of lipids are not quantitative. For example, the TLC image of PDIM is poor; quantitative estimation needs metabolic labeling of lipids with radioactive precursors. Further, change in PDIMs is likely to affect other lipids (SL-1, PAT/DAT) that share a common precursor (propionyl- CoA).

      We also agree with the reviewer that TLC, as it is, is not quantitative. However, we do not have access to radioactive procedures. In the new version of the manuscript, we have run TLCs on all the strains tested to resolve SLs and PAT/DATs (Fig S8). Our results show a reduction in the pool of SL and DATs in the mutant, indicating that part of the methylmalonyl pool is diverted to the synthesis of PDIMs. 

      (3) The connection of cholesterol with cell wall permeability is tenuous. Cholesterol will serve as a carbon source and contribute to the biosynthesis of methyl-branched lipids such as PDIM, SL-1, and PAT/DAT. Carbon sources also affect other aspects of physiology (redox, respiration, ATP), which can directly affect permeability and import/export of drugs. Authors should investigate whether restoration of the normal level of permeability and EV release is not due to the maintenance of cell wall lipid balance upon cholesterol exposure of the VirR mutant.

      We concur with the reviewer that cholesterol as a sole carbon source introduces many changes in Mtb cells besides permeability. Consequently, we investigated the virRmut lipid profile upon exposure to either cholesterol or TRZ (Fig S8). Both WT and virRmut-Comp strains were included in the analysis. Polar lipid analysis revealed that either cholesterol or TRZ exposure induced a marked reduction in PIMs and cardiolipin (DPG) levels in virRmut relative to WT or complemented strains (Fig S8A). Analysis of apolar lipids indicated that, relative to glycerol MM, virRmut cultured in the presence of cholesterol or TRZ showed reduced levels of TDM and DATs compared to WT and virRmut-Comp strains (Fig S8B). These results suggest a lack of correlation between modulation of cell permeability by cholesterol and TRZ and lipid levels in the absence of VirR.

      Furthermore, regarding this section, we would like to mention that we have modified the reference used for the annotation of the DosR regulon, moving from the definition of the regulon used in the previous submission (from Rustad et al., "The enduring hypoxic response of Mycobacterium tuberculosis", PLoS One 3(1), e1502 (2008)) to the more recent characterization of the regulon based on ChIP-seq data, reported in Minch et al. 2015. This was done to ensure coherence with the transcriptomics analyses in the new figure 4.

      (4) Finally, protein interaction data is based on experiments done once without statistical analysis. If the interaction between VirR and LCP protein is expected on the mycobacterial membrane, how the SPLIT_GFP system expressed in the cytoplasm is physiologically relevant. No explanation was provided as to why VirR interacts with the truncated version of LCP proteins and not with the full-length proteins.

      We have repeated the experiments and applied statistics (Figure 9). As stated in the manuscript this assay has successfully been applied to interrogate interactions of domains of proteins embedded in the membrane of mycobacteria. Therefore, we believe that this assay is valid to interrogate interactions between Lcp proteins.

      Reviewer #2 (Public Review): 

      Summary: 

      In this work, Vivian Salgueiro et al. have comprehensively investigated the role of VirR in the vesicle production process in Mtb using state-of-the-art omics, imaging, and several biochemical assays. From the present study, authors have drawn a positive correlation between cell membrane permeability and vesiculogenesis and implicated VirR in affecting membrane permeability, thereby impacting vesiculogenesis. 

      Strengths: 

      The authors have discovered a critical factor (i.e. membrane permeability) that affects vesicle production and release in Mycobacteria, which can broadly be applied to other bacteria and may be of significant interest to other scientists in the field. Through omics and multiple targeted assays such as targeted metabolomics, PG isolation, analysis of Diaminopimelic acid and glycosyl composition of the cell wall, and, importantly, molecular interactions with PG-AG ligating canonical LCP proteins, the authors have established that VirR is a central scaffold at the cell envelope remodelling process which is critical for MEV production. 

      We thank the reviewer for the kind assessment of the paper.

      Weaknesses: 

      Throughout the study, the authors have utilized a CRISPR knockout of VirR. VirR is a non-essential gene for the growth of Mtb; a null mutant of VirR would have been a better choice for the study. 

      According to Tn mutant databases and CRISPR databases, virR is a non-essential gene. However, we have tried many times to interrupt this gene using the phage-mediated allelic exchange substitution approach, with no success. So far there is no precedent of a clean KO mutant in this gene. White et al. generated a virR mutant consisting of a deletion of a large fragment of the C-terminal part of the protein, largely replicating the effect of the Tn insertion site in the virR Tn mutant. These precedents led us to switch to CRISPR technology.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) The authors monitored cell lysis by measuring the release of a cytoplasmic iron-responsive protein (IdeR). Since EV release is regulated by iron starvation, which is directly sensed by IdeR, another control (unrelated to iron) is needed. A much better approach would be to use hydrophobic/hydrophilic probes to measure changes in the cell wall envelope.

      Does the VirR complemented strain have a faint IdeR band in the supernatant? The authors need to clarify. Also, it's unclear whether the complementation restored normal VirR levels or not. 

      We thank the reviewer for this recommendation. Consequently, we have complemented these studies with an alternative approach based on serially diluted cultures spotted on solid medium. These results align very well with those of the western blot using IdeR levels in the supernatant as a surrogate of cell lysis.

      We also noticed the presence of a faint IdeR band in the supernatant of the complemented strain, suggestive of possible cell lysis. However, as shown in another section, this did not translate into increased levels of vesiculation. As shown in a previous paper describing VirR as a genetic determinant of vesiculogenesis, VirR levels in the complemented strain are not just restored but increased considerably. This overexpression could explain the potential artifact of a leaky phenotype in the complemented strain. In addition to that previous study, the proteomic data included in this paper clearly show a restoration of VirR levels relative to the WT strain.

      (2) Figure 2C: The data are weak; I don't see any difference in incorporating FDAAs in MM media. Even in the 7H9 medium, differences appear only at the last time point (20 h). What happens at the time point after 20 h (e.g., 48 h)? How do we differentiate between defective permeability or anabolism leading to altered PG? No statistical analysis was performed.

      We apologize for the incomplete assessment of the results in this figure. First, this figure just shows differential incorporation of FDAAs in the different strains in different media. As per previous studies (Kuru et al (2017) Nat. Protocols), these probes can freely enter cells and may be incorporated into PG by at least three different mechanisms, depending on the species: through the cytoplasmic steps of PG biosynthesis and via two distinct transpeptidation reactions taking place in the periplasm. Consequently, the differential labeling observed in virRmut relative to the WT strain may be a consequence of the enlarged PG observed in the mutant. We have repeated the experiment and generated new data. First, we cultured strains with a blue FDAA (HADA) for 48 h to ensure full labeling. Then, we washed the cells and cultured them in the presence of a second FDAA, this time green (FDL), for 5 h. The differential incorporation of FDL relative to HADA was then measured under the fluorescence microscope. This experiment showed that virRmut incorporates more FDL than the other strains, suggesting an altered PG remodeling. We have also modified the figure to make the early and late time points of the time-course clearer and applied statistics.

      (3) Many genes (~ 1700) were deregulated in the mutant. Since these transcriptional changes do not correlate at the protein level in WCL, it's important to determine VirR-specificity. RNA-Seq of VirR complemented strain is important.

      We think this was an extremely important point, and we thank the reviewer for pointing this out. Following their suggestion, we have analyzed and integrated data from the complemented strain, which we have added to the GEO submission, to conclude that, in fact, differences in expression between the complemented strain and either the WT or virRmut are also common and highly significant. Although this is not completely unexpected, given the nature of our mutants and the fact that the complemented strains show significantly higher levels of expression of VirR, both at the RNA and protein levels, than the WT, it motivated us to narrow down our definition of VirR-dependent genes and adopt a combined criterion that integrated the complemented strain. Following this approach, we considered the set of genes upregulated (downregulated) in virRmut as those whose expression in that strain is, at the same time, significantly higher (lower) than in WT as well as in virRmut-Comp. Working with this integrated definition, the genes considered (399 upregulated and 502 downregulated genes) are those whose observed expression changes are more likely to be genuinely VirR-dependent rather than any non-specific consequence of the mutagenesis protocols. Despite the lower number of genes in these sets, the repetition of all our functional enrichment analyses based on this combined criterion leads us to conclusions that are largely compatible with those presented in the first version of the paper.
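      The combined criterion described above amounts to intersecting two differential-expression calls per direction. A minimal sketch follows, assuming each comparison has already been reduced to a set of significant gene identifiers. The function name and the gene labels in the example are hypothetical placeholders, not results from the study.

```python
def virr_dependent_genes(up_vs_wt, up_vs_comp, down_vs_wt, down_vs_comp):
    """Apply the combined criterion to four sets of significant genes.

    up_vs_wt / down_vs_wt: genes significantly higher / lower in virRmut than in WT.
    up_vs_comp / down_vs_comp: the same calls against the complemented strain.
    A gene is retained only if it changes in the same direction in both comparisons.
    """
    upregulated = set(up_vs_wt) & set(up_vs_comp)
    downregulated = set(down_vs_wt) & set(down_vs_comp)
    return upregulated, downregulated

if __name__ == "__main__":
    # Placeholder gene labels for illustration only.
    up, down = virr_dependent_genes(
        up_vs_wt={"geneA", "geneB", "geneC"},
        up_vs_comp={"geneB", "geneC", "geneD"},
        down_vs_wt={"geneX", "geneY"},
        down_vs_comp={"geneY", "geneZ"},
    )
    print(sorted(up), sorted(down))
```

      Genes that differ from only one of the two reference strains are excluded, which is what makes the resulting sets smaller than those from a comparison against WT alone.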

      (4) Transcriptome data provide no clues about how VirR could mediate expression deregulation. Is there an overlap with the regulations/regulons of any Mtb transcription factors? One clue is DosR; however, DosR only regulates 50-60 genes in Mtb. 

      Again, we would like to thank the reviewer for this recommendation, which we have followed to generate a new section in the results named “VirR-dependent genes intersect the regulons of key transcriptional regulators of the responses to stress, dormancy, and cell wall remodeling”. As we explain in this new section, we resorted to the regulon annotations reported in (Minch et al. 2015), where ChIP-seq data were collected genome-wide on binding events between a panel of 143 transcription factors (TFs) and DNA. The dataset includes 7248 binding events between regulators and DNA motifs in the vicinity of targets’ promoters. After completing enrichment analyses with the resulting regulons, we identified 10 transcription factors whose intersections with the sets of up- and downregulated genes in virRmut were larger than expected by chance (one-tailed Fisher exact test, OR>2, FDR<0.1). Those regulators (which, as guessed by the referee, included DevR) control key pathways related to cell wall remodeling, stress responses, and transition to dormancy.
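
      The enrichment test named above (one-tailed Fisher exact test, OR>2, FDR<0.1) can be sketched with the standard library alone. This is a hedged illustration under stated assumptions: the function names are hypothetical, the FDR is assumed to be a Benjamini-Hochberg adjustment (the response does not say which procedure was used), and the odds ratio is the simple sample odds ratio of the 2x2 table.

```python
from math import comb
from typing import List


def fisher_one_tailed(k: int, n_set: int, n_regulon: int, n_genome: int) -> float:
    """P(overlap >= k) under the hypergeometric null (one-tailed Fisher exact test)."""
    upper = min(n_set, n_regulon)
    total = comb(n_genome, n_set)
    tail = sum(
        comb(n_regulon, i) * comb(n_genome - n_regulon, n_set - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def odds_ratio(k: int, n_set: int, n_regulon: int, n_genome: int) -> float:
    """Sample odds ratio of the 2x2 table (in set / not in set) x (in regulon / not)."""
    a, b = k, n_set - k
    c = n_regulon - k
    d = n_genome - n_set - n_regulon + k
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_enriched(or_value: float, fdr: float) -> bool:
    """Thresholds as quoted in the response: OR > 2 and FDR < 0.1."""
    return or_value > 2 and fdr < 0.1
```

      In use, one p-value and odds ratio would be computed per transcription factor regulon against the up- or downregulated gene set, the p-values adjusted together, and `is_enriched` applied to each regulon.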

      (5) How many proteins that are enriched or depleted in the EVs of the VirR mutant also affected transcriptionally in the mutant? How does VirR regulate the abundance and transport of protein in EVs? 

      While the intersection between genes and proteins that appear upregulated in the virRmut strain at both the transcriptional and vesicular protein levels (N=21) was larger than expected by chance (OR=2.0, p=7.0E-3), downregulated genes and proteins in virRmut (N=14) were not enriched in each other. These results indicated, at most, a weak correlation between RNA and protein levels (a phenomenon nonetheless previously observed in Mycobacterium tuberculosis, among other organisms; see Cortés et al. 2013). Admittedly, the compilation of these omics data is insufficient, by itself, to pinpoint the specific regulatory mechanisms through which the absence of VirR impacts protein abundance in EVs. For the sake of transparency, this has been acknowledged in the discussion section of the resubmitted version of the manuscript.

      (6) The assumption that a depleted pool of methylmalonyl CoA is due to increased utilization for PDIM biosynthesis is problematic. Without flux-based measurement, we don't know if MMCoA is consumed more or produced less, more so because Acc is repressed in the VirR mutant EVs. Further, MMCoA feeds into the TCA cycle and other methyl-branched lipids. Without data on other lipids and metabolism, the depletion of MMCoA is difficult to explain.

      The differential expression statistics compiled suggest that both effects may be at play, since we observed, at the same time, a downregulation of enzymes controlling methylmalonyl synthesis from propionyl-CoA (i.e. Acc, at the protein level), as well as an upregulation of enzymes related to its incorporation into DIM/PDIMs (i.e. pps genes). Both effects, combined, would favor a decreased rate of methylmalonyl production and a faster depletion rate, thus contributing to the lower levels observed. We concur with the reviewer, however, that fluxomics analyses will help to shed light on this question in a more decisive manner, and we have acknowledged this in the discussion section too.

      (7) Figure 5: Deregulation of rubredoxins and copper indicates impaired redox balance and respiration in the mutant. The data is complex to connect with permeability as TRZ is mycobactericidal and also known to affect the respiratory chain. The authors need to investigate if, in addition to permeability, the presence of VirR is essential for maintaining bioenergetics.

      The data related to rubredoxins and copper have been modified after reanalyzing the transcriptomic data including the complemented strain. Nevertheless, we found that some features of the response to stresses may be impaired in the mutant, including the response to oxidative stress. In this regard, we found enhanced sensitivity of the mutant to H2O2 relative to the WT and complemented strains. This piece of data is now included as Fig S3 in the new version of the manuscript.

      (8) Differential regulation of DoS regulon and cholesterol growth could also be linked to differences in metabolism, redox, and respiration. What is the phenotype of VirR mutants in terms of growth and respiration in the presence of cholesterol/TRZ? 

      We thank the reviewer for this suggestion. Consequently, we have added a new section to Results that suggests that other aspects of mycobacterial physiology may be affected in the virR mutant when cultured in the presence of cholesterol or TRZ: 

      “Modulation of EV levels and permeability in virRmut by cholesterol and TRZ. We next wondered what effect culturing virRmut with either cholesterol or TRZ could have on cell growth, permeability and EV production. Cholesterol has also been shown to affect other aspects of physiology (redox, respiration, ATP), which can directly affect permeability (Lu et al., 2017). We monitored the growth of virRmut cultured in MM supplemented with glycerol, cholesterol as a sole carbon source, or TRZ at 3 ug ml-1 for 20 days. While cholesterol significantly enhanced the growth of virRmut after 5 days relative to glycerol medium, supplementation of glycerol medium with TRZ restricted growth during the whole time-course (Fig S5A). The study of cell permeability in the same conditions indicated that the enhanced cell permeability observed in glycerol MM was reduced when virRmut was cultured with cholesterol as sole carbon source. Conversely, the presence of TRZ increased cell permeability relative to the medium containing solely glycerol (Fig S5C). As we have previously observed for the WT strain, either condition (Chol or TRZ) also modified vesiculation levels in the mutant accordingly (Fig S5B). These results strongly indicate that other aspects of mycobacterial physiology besides permeability are also affected in the virR mutant and may contribute to the observed enhanced vesiculation.”

      (9) PDIM TLC is not evident; both DimA and DimB should be clearly shown. It will also be necessary to show other methyl-branched lipids, such as SL-1 and PAT/DAT, because the increase in PDIM can take away methylmalonyl-CoA from the biosynthesis of SL-1 and PAT/DAT. Studies have shown that SL-1, PAT/DAT, and PDIM are tightly regulated, where an increase in one lipid pool can affect the abundance of other lipids. Quantitative assays using 14C acetate/propionate are most appropriate for these experiments. 

      We apologize that the TLC analysis was not performed with radioactive labeling; we do not have access to this approach. To address the reviewer's question of whether other methyl-branched lipids may explain the altered flux of methylmalonyl-CoA, we have run TLCs on all the strains tested to resolve SLs and PAT/DATs (Fig S8). Notably, we observed a reduction in the level of these lipids (SL-1 or PAT/DAT) in virRmut cultured in glycerol relative to the WT and complemented strains, suggesting that the excess of PDIM synthesis can take away methylmalonyl-CoA from the biosynthesis of SL-1 and PAT/DAT in the absence of VirR (Fig S8B).

      (10) Figure 8: Interaction between VirR and Lcp proteins. Since these interactions are happening in the membrane, using a split GFP system where proteins are expressed in the cytoplasm is unlikely to be relevant.

      Also, experiments on Figure 8C are performed once, and representation needs to be clarified; split GFP needs a positive control, and negative control (CtpC) is not indicated in the figure.

      We have repeated the experiments and applied statistics (Figure 9). As stated in the manuscript, this assay has been successfully applied to interrogate interactions between domains of proteins embedded in the membrane of mycobacteria. Therefore, we believe that this assay is valid for interrogating interactions between VirR and Lcp proteins.

      Reviewer #2 (Recommendations For The Authors):  

      (1) Authors should consider making more effort to mine the omics data and integrate them. Given the amount of data that is generated with the omics, they need to be looked at together to find out threads that connect all of them. 

      In the resubmitted version of the paper, we have followed the reviewer's recommendation by incorporating new analyses that integrate the virRmut-C strain, and we have tried to place the differences found in the context of broader transcriptional regulatory networks (new figure 4), as well as in the context of metabolic pathways related to PDIM biosynthesis from methylmalonyl (figure 6I, already present in the first submission). We consider that these additions contribute to a deeper interpretation of the omics data, in line with what the reviewer suggested.

      (2) The interpretation given by authors in lines 387-390 is an interpretation that does not have sufficient support and, hence should be moved into discussion. 

      We thank the reviewer for this recommendation. We believe that these new analyses and integration studies now support the above statement.

    2. eLife Assessment

      In this important study, the authors set out to investigate the biogenesis of extracellular vesicles in mycobacteria and provide several observations to link VirR with vesiculogenesis, PG metabolism, lipid metabolism, and cell wall permeability. Whilst some of the evidence provided is convincing, there are still some shortcomings in the revised manuscript where the data to support the proposed mechanism remain incomplete. The work will be of interest to bacteriologists.

    3. Reviewer #1 (Public review):

      Summary:

      The present study's main aim is to investigate the mechanism of how VirR controls the magnitude of MEV release in Mtb. The authors used various techniques, including genetics, transcriptomics, proteomics, and ultrastructural and biochemical methods. Several observations were made to link VirR-mediated vesiculogenesis with PG metabolism, lipid metabolism, and cell wall permeability. Finally, the authors presented evidence of a direct physical interaction of VirR with the LCP proteins involved in linking PG with AG, providing clues that VirR might act as a scaffold for LCP proteins and remodel the cell wall of Mtb. Since the Mtb cell wall provides a formidable anatomical barrier for the entry of antibiotics, targeting VirR might weaken the permeability of the pathogen along with the stimulation of the immune system due to enhanced vesiculogenesis. Therefore, VirR could be an excellent drug target. Overall, the study is an essential area of TB biology.

      Strengths:

      The authors have done a commendable job of comprehensively examining the phenotypes associated with the VirR mutant using various techniques. Application of Cryo-EM technology confirmed increased thickness and altered arrangement of CM-L1 layer. The authors also confirmed that increased vesicle release in the mutant was not due to cell lysis, which contrasts with studies in other bacterial species.

      Another strength of the manuscript is that biochemical experiments show altered permeability and PG turnover in the mutant, which fits with later experiments where authors provide evidence of a direct physical interaction of VirR with LCP proteins.

      Transcriptomics and proteomics data were helpful in making connections with lipid metabolism, which the authors confirmed by analyzing the lipids and metabolites of the mutant.

      Lastly, using three approaches, the authors confirm that VirR interacts with LCP proteins in Mtb via the LytR_C terminal domain.

      Altogether, the work is comprehensive, experiments are designed well, and conclusions were made based on the data generated after verification using multiple complementary approaches.

      Weaknesses:

      The major weakness is that the mechanism of VirR-mediated EV release remains enigmatic. Most of the findings are observational and only associate enhanced vesiculogenesis observed in the VirR mutant with cell wall permeability and PG metabolism. Authors suggest that EV release occurs during cell division when PG is most fragile. However, this has yet to be tested in the manuscript - the AFM of the VirR mutant, which produces thicker PG with more pore density, displays enhanced vesiculogenesis. No evidence was presented to show that the PG of the mutant is fragile, and there are differences in cell division to explain increased vesiculogenesis. These observations, counterintuitive to the authors' hypothesis, need detailed experimental verification.

      The transcriptomic data add little of substance. Transcriptomic data do not correlate with the proteomics data. It remains unclear how VirR deregulates transcription. TLCs of lipids are not quantitative. For example, the TLC image of PDIM is poor; quantitative estimation needs metabolic labeling of lipids with radioactive precursors. Further, change in PDIMs is likely to affect other lipids (SL-1, PAT/DAT) that share a common precursor (propionyl-CoA).

      The connection of cholesterol with cell wall permeability is tenuous. Cholesterol will serve as a carbon source and contribute to the biosynthesis of methyl-branched lipids such as PDIM, SL-1, and PAT/DAT. Carbon sources also affect other aspects of physiology (redox, respiration, ATP), which can directly affect permeability and import/export of drugs. Authors should investigate whether restoration of the normal level of permeability and EV release is not due to the maintenance of cell wall lipid balance upon cholesterol exposure of the VirR mutant.

      Finally, protein interaction data is based on experiments done once without statistical analysis. If the interaction between VirR and LCP proteins is expected on the mycobacterial membrane, how is a split-GFP system expressed in the cytoplasm physiologically relevant? No explanation was provided as to why VirR interacts with the truncated version of LCP proteins and not with the full-length proteins.

    4. Reviewer #2 (Public review):

      Summary:

      In this work, Vivian Salgueiro et al. have comprehensively investigated the role of VirR in the vesicle production process in Mtb using state-of-the-art omics, imaging, and several biochemical assays. From the present study, authors have drawn a positive correlation between cell membrane permeability and vesiculogenesis and implicated VirR in affecting membrane permeability, thereby impacting vesiculogenesis.

      Strengths:

      The authors have discovered a critical factor (i.e. membrane permeability) that affects vesicle production and release in Mycobacteria, which can broadly be applied to other bacteria and may be of significant interest to other scientists in the field. Through omics and multiple targeted assays such as targeted metabolomics, PG isolation, analysis of Diaminopimelic acid and glycosyl composition of the cell wall, and, importantly, molecular interactions with PG-AG ligating canonical LCP proteins, the authors have established that VirR is a central scaffold at the cell envelope remodelling process which is critical for MEV production.

      Weaknesses:

      Throughout the study, the authors have utilized a CRISPR knockout of VirR. VirR is a non-essential gene for the growth of Mtb; a null mutant of VirR would have been a better choice for the study.

      Comments on the revised version:

      Concerns flagged about using CRISPR-guide RNA-mediated knockdown of virR have yet to be addressed entirely. I understand that the authors could not obtain a knockout despite attempts and hence have used a guide RNA-mediated knockdown strategy. However, I wondered if the authors looked at the levels of the downstream genes in this knockdown.

      Authors have used the virRmut-Comp strain for some of the experiments. However, the materials and methods must describe how this strain was generated. Given that the mutant is a CRISPR-guide RNA-mediated knockdown, the CRISPR construct may have taken up the L5 locus. Did authors use an episomal construct for complementation? If so, what is the expression level of virR in the complementation construct? What are the expression levels of downstream genes in mutant and complementation strains? This is important because the transcriptome analysis was redone by considering the complementation strain. The complemented strain is written as virRmut-C or virRmut-Comp. This has to be consistent.

    1. eLife Assessment

      This fundamental work advances our understanding of the mechanisms underlying lactation-induced infertility. Compelling evidence supports the notion that prolactin inhibits kisspeptin activity and LH pulsatile release and that loss of this signal results in an early reestablishment of fertility during lactation. This work will be of interest to endocrinologists and reproductive biologists.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, Hackwell and colleagues performed technically impressive, long-term, GCaMP fiber photometry recordings from Kiss1 neurons in the arcuate nucleus of mice during multiple reproductive states. The data show an immediate suppression of activity of arc Kiss1 neuronal activity during pregnancy that is maintained during lactation. In the absence of any apparent change in suckling stimulus or milk production, mice lacking prolactin receptors in arcuate Kiss1 neurons regained Kiss1 episodic activity and estrous cyclicity faster than control mice, demonstrating that direct prolactin action on Kiss1 neurons is at least partially responsible for suppressing fertility in this species. The effect of loss of prolactin receptors from CamK2a expressing neurons was even greater, indicating either that prolactin sensitivity in Kiss1 neurons of the RP3V contributes to lactational infertility or that other prolactin-sensitive neurons are involved. These data demonstrate the important role of prolactin in suppressing Kiss1 neuron activity and thereby fertility during the lactational period in the mouse.

      Strengths:

      This is the first study to monitor activity of the GnRH pulse-generating system across different reproductive states in the same animal. Another strength in the study design is that it isolated the effects of prolactin by maintaining normal lactation and suckling (assessed indirectly using pup growth curves). The study also offers insight into the phenomenon of postpartum ovulation in mice. The results showed a brief reactivation of arcuate Kiss1 activity immediately prior to parturition, attributed to falling progesterone levels at the end of pregnancy. This hypothesis will be of interest to the field and is likely to inspire testing in future studies. With the exceptions mentioned below, the conclusions of the paper are well supported by the data and the aims of the study were achieved. This paper is likely to raise the standard for technical expectations in the field and spark new interest in the direct impact of prolactin on Kiss1 neurons during lactation in other species.

      Weaknesses:

      A weakness in the approach is the use of genetic models that do not offer complete deletion of the prolactin receptor from targeted neuronal populations. A substantial proportion of Kiss1 neurons in both models retains the receptor. As a result, it is not clear whether the partial maintenance of cyclicity during lactation in the genetic models is due to incomplete deletion or to the involvement of other factors. In addition, results showing no impact of progesterone on LH secretion during lactation are surprising, given the effectiveness of progesterone-containing birth control in lactating women. While the authors assert their findings may reflect an important role for prolactin in lactational infertility in other mammalian species, that remains to be seen. Hyperprolactinemia is known to suppress GnRH release, but its importance in the suppression of cyclicity during lactation is controversial. Indeed, in several species, the stimulus of suckling is considered to be the main driver of lactational fertility suppression. Data from rats show that exogenous prolactin was unable to suppress LH release in dams deprived of their pups shortly after birth; both suckling and prolactin were necessary to suppress a post-ovariectomy rise in LH levels. The duration of amenorrhea does not correlate with average prolactin levels in humans, and suckling but not prolactin was required to suppress the postpartum rise in LH in the rhesus monkey. The protocol of this or other studies might result in discordant results; alternatively, mice may be an outlier in their mechanism of cycle suppression.

      Comments on revised version:

      I remain enthusiastic about this article, which has been substantially improved in this revision. However, I didn't feel the authors responded to any of the points I raised previously in my public review (see Weaknesses), for example by adding to the manuscript's discussion section. These are the larger, conceptual issues that speak to the value of the paper in the context of the existing literature. The authors could also state they feel they have addressed the issues raised sufficiently in the text.

    3. Reviewer #2 (Public review):

      Summary:

      The overall goal of Hackwell et al. is to determine if the suppression of LH pulses during lactation is mediated by prolactin signaling at kisspeptin neurons. To address this, the authors used GCaMP fiber photometry and serial blood sampling to reveal that in vivo episodic arcuate kisspeptin neuron activity and LH pulses are suppressed throughout pregnancy and lactation. The authors further utilized knockout models to demonstrate that the loss of prolactin receptor signaling at kisspeptin cells prevents the suppression of kisspeptin cell activity and results in the early reestablishment of fertility during lactation. The work demonstrates exemplary design and technique, and the outcomes of these experiments are discussed with sophistication.

      Strengths:

      This manuscript demonstrates exceptional skill with powerful techniques and reveals a key role for arcuate kisspeptin neurons in maintaining lactation-induced infertility in mice. In a difficult feat, the authors used fiber photometry to map the activity of arcuate kisspeptin cells into lactation and weaning without disrupting parturition, lactation, or maternal behavior. The authors used a knockout approach to identify if the inhibition of fertility by prolactin is mediated via direct signaling at arcuate kisspeptin cells. Although the model does not perfectly eliminate prolactin receptor expression in all kisspeptin neurons, results from the achieved knockdown support the conclusion that prolactin signaling at kisspeptin neurons is required to maintain lactational infertility. The methods are advanced and appropriate for the aims, the study is rigorously conducted, and the conclusions are thoughtfully discussed.

      Comments on the latest version:

      All comments and suggestions have been addressed by the authors in this revision.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I recommend being explicit regarding how the animals were habituated to blood sampling.

      On lines 109-111 we have added a more detailed explanation of how mice were habituated to blood sampling. This includes details that mice were held and had their tails palpated for approximately 5 minutes per day.

      Were any mice excluded due to loss or movement of the implant over time? Any details to allow replication of long-term measurements like this should be included.

      No mice lost their cannulas during experimentation, and we have added a sentence to this effect on lines 303 to 304. We have also noted that there was a slight decrease in signal over the months of experimentation. A statement has also been added on line 318 clarifying that the two mice lost between the pregnancy and lactation stages of the experiment were euthanised due to dystocia.

      The text states that synchronized episodic activity reappeared as early as 3 days after birth, citing Figure 6c as evidence. There is no 6c. Figure 6b shows day 5 after birth.

      This has been corrected.

      The methods state mRNA levels had to be "above background" to be counted as colocalization. At how many fold/what percent above background was a cell considered positive for expression?

      Positive hybridisation was scored according to the manufacturer’s protocol and a statement to this effect has been added on line 144.

      Please ensure figure titles or the data graphs explicitly give the genotype of the mice in all figures (or state the mice are wildtype).

      Genotype has been added to figure titles where possible. Genotypes are always given in figure legends and tables and/or explicitly stated on the figure itself.

      Figure 4's title states events are "perfectly" correlated, which is a subjective term. I recommend saying "consistently" or "temporally" correlated, depending on your meaning.

      This has been amended to read “consistently correlated”.

      Reviewer #2 (Recommendations For The Authors):

      The comments below aim to clarify the paper's methodology and results but do not detract from my overall enthusiasm for this work.

      - Given past studies demonstrating prolactin action in the brain, particularly the MPOA/MPN, is essential for maternal behavior, can the authors please clarify why this behavior is retained in the CamK2a Prlr knockout mice? The authors mention that Prlr in the MPOA is only knocked down 50% compared to WT controls. Is this sufficient to retain maternal behavior?

      In our experience, 50% Prlr in the MPOA is sufficient to retain normal maternal behaviour in most animals, including the ones in this experiment (our original paper describing this showed relatively normal behaviour, for example, with vGAT- and vGlut-mediated knockouts, and even a double knockout; it was only when we achieved complete KO with an AAV-Cre that we saw failure of maternal behaviour: Brown et al, PNAS 3;114(40):10779-10784 2017). We have added a statement on lines 157-159 regarding this.  We have an additional paper in preparation specifically characterising the maternal behaviour and lactation outcomes in this line of mice, and we find most animals display normal maternal behaviour, with slightly impaired milk production in later lactation.

      - Supplementary Figure 1. Can the authors please clarify the criteria for a cell to be positive for prlr? The methods state that the signal must be "above background level." How was the background measurement obtained? In the negative control?

      As per above, scoring of positive hybridisation was done according to the manufacturer’s protocol and a statement to this effect has been added on line 144.

      - Lines 310-314: This sentence describes RNAscope analysis of prlr knockdown in kisspeptin cells and refers to Extended Figure 3 - but I believe this is in Supplementary Figure 1.

      This has been corrected.

      - Figure 3-4: When mice return to estrous cycles, the amplitude of episodic kisspeptin neuron activity is the same as 24 hours after weaning, which appears much lower than in virgin females. Does this reach significance? If so, do the authors know why kisspeptin activity is still suppressed, and can they comment on why this may not affect estrous cyclicity?

      This does not reach significance; see Supplementary Table 1 (4C) for statistics. Therefore, no further analysis was done. This question would need to be examined with a follow-up experiment. Given the 5 s on, 15 s off scheduled mode of recording used here, amplitude was not an especially accurate measure, and amplitude has been reported as relative within each mouse. There is also the additional issue of a gradual reduction in signal amplitude over time in these long-term experiments, although it is true that much larger signals were detected after ovariectomy at the end of the experiment.  At present, we have not tried to interpret whether the changes in amplitude are informative.

      - Fiber photometry studies: Please indicate whether a post-mortem examination of GCaMP transfection and fiber photometry placement was conducted, and what region of the ARC was imaged.

      Brains from these mice were collected; however, postmortem analysis of cannula placement or GCaMP6 transfection was not carried out in all mice. This was based on our experience with this method, in that the quality and characteristic pattern of activity seen, as well as corresponding LH secretion following an SE, was indicative of successful cannula placement and transfection.  Incorrectly placed cannulas failed to show SEs. A trial was done with 3 mice and cannula placement was found to be in the caudal ARC (cARC) with GFP (attached to GCaMP) restricted to the cARC. A statement has been added on lines 306-313 regarding this.

      - Were male mice removed before birth? Please add to the methods section if not included.

      Yes, male mice were removed after a sperm plug was seen and were never present at parturition. We have inserted additional details on line 95 to this effect.

      Reviewer #3 (Recommendations For The Authors):

      (1) Line 172: n=7-8 per group, yet in Supplementary Figure 2, n=6 per group.

      These refer to different groups of mice. N=7-8 refers to the group size of mice in Figure 2 that were given mifepristone or vehicle control. In contrast, the n number in Supplementary Figure 2 refers to the mice in the pilot study. The n number for the pilot study has been added on line 194.

      (2) Line 314: Extended = suppl; Figure 3 = 1.

      This has been corrected.

      (3) Line 451: Figure 6C, does not exist.

      This has been corrected.

      Line 590: Reference 23 could be replaced by Ordog T et al 1998 Am J Physiol 274,E665 because it is later and more relevant to the topic.

      This reference has been replaced with the suggested reference.

    1. eLife Assessment

      In this useful study, Wang and colleagues investigate the potential probiotic effects of Bacillus velezensis in a murine model. They provide convincing evidence that B. velezensis limits the growth of Salmonella typhimurium in lab culture and in mice, together with beneficial effects on the microbiota. The overall presentation of the manuscript has improved and the work will be of interest to infectious disease researchers.

    2. Reviewer #1 (Public review):

      Summary:

      Wang and colleagues presented an investigation of pig-origin bacteria Bacillus velezensis HBXN2020, for its released genome sequence, in vivo safety issue, probiotic effects in vitro, and protection against Salmonella infection in a murine model. Various techniques and assays are performed; the main results are all descriptive, without new insight advancing the field or a mechanistic understanding of the observed protection.

      Strengths:

      An extensive study on the probiotic properties of the Bacillus velezensis strain HBXN2020

      Weaknesses:

      The main results are descriptive without mechanistic insight. Additionally, most of the results and analysis parts are separated without a link or a story-telling way to deliver a concise message.

      Now the manuscript has made appropriate and considerable improvements.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues presented an investigation of the pig-origin bacterium Bacillus velezensis HBXN2020, covering its released genome sequence, in vivo safety, in vitro probiotic effects, and protection against Salmonella infection in a murine model. Various techniques and assays are performed; the main results are all descriptive, without new insight advancing the field or a mechanistic understanding of the observed protection.

      Thank you very much for reading and commenting on our manuscript.

      Strengths:

      An extensive study on the probiotic properties of the Bacillus velezensis strain HBXN2020

      Thank you very much for reading and commenting on our manuscript.

      Weaknesses:

      The main results are descriptive without mechanistic insight. Additionally, most of the results and analysis parts are separated without a link or a story-telling way to deliver a concise message.

      Thank you for your comments and suggestions on our manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. The manuscript results and analysis sections have been extensively revised. We appreciate your review and feedback.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Wang and colleagues study the potential probiotic effects of Bacillus velezensis. Bacillus species have potential benefit to serve as probiotics due to their ability to form endospores and synthesize secondary metabolites. B. velezensis has been shown to have probiotic effects in plants and animals but data for human use are scarce, particularly with respect to salmonella-induced colitis. In this work, the authors identify a strain of B. velezensis and test it for its ability to control colitis in mice.

      Thanks for the constructive comments and the positive reception of the manuscript.

      Key findings:

      (1) The authors sequence an isolate of B. velezensis, HBXN2020, and describe its genome (roughly 4 Mb, 46% GC content, etc.).

      Thanks for the constructive comments and the positive reception of the manuscript.

      (2) The authors next describe the growth of this strain in broth culture and survival under acid and temperature stress. The susceptibility of HBXN2020 was tested against various antibiotics and against various pathogenic bacteria. In the case of the latter, the authors set out to determine if HBXN2020 could directly inhibit the growth of pathogenic bacteria. Convincing data, indicating that this is indeed the case, are presented.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (3) To determine the safety profile of HBXN2020 (for possible use as a probiotic), the authors infected mice with the strain and monitored weight, together with cytokine profiles. Infected mice displayed no significant weight loss and expression of inflammatory cytokines remained unchanged. Blood cell profiles of infected mice were consistent with those of uninfected mice. No significant differences in tissues, including the colon, were observed.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (4) Next, the authors tested the ability of HBXN2020 to inhibit growth of Salmonella typhimurium (STm) and demonstrate that HBXN2020 inhibits STm in a dose-dependent manner. Following this, the authors infect mice with STm to induce colitis and measure the ability of HBXN2020 to control colitis. The first outcome measure was a reduction in STm in faeces. Consistent with this, HBXN2020 reduced STm loads in the ileum, cecum, and colon. Colon length was also affected by HBXN2020 treatment. In addition, treatment with HBXN2020 reduced the appearance of colon pathological features associated with colitis, together with a reduction in inflammatory cytokines.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (5) After noting the beneficial (and anti-inflammatory) effects of HBXN2020, the authors set out to investigate effects on microbiota during treatment. Using a variety of algorithms, the authors demonstrate that upon HBXN2020 treatment, microbiota composition is restored to levels akin to that seen in healthy mice.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (6) Finally, the authors assessed the effect of using HBXN2020 as prophylactic treatment for colitis by first treating mice with the spores and then infecting with STm. Their data indicate that treatment with HBXN2020 reduced colitis. A similar beneficial impact was seen with the gut microbiota.

      Thanks for the constructive comments and the positive reception of the manuscript.

      Strengths:

      (1) Good use of in vitro and animal models to demonstrate a beneficial probiotic effect.

      Thank you very much for reading and commenting on our manuscript.

      (2) Most observations are supported using multiple approaches.

      Thanks for the comments and the positive reception of the manuscript.

      (3) Mouse experiments are very convincing.

      Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      (1) Whilst a beneficial effect is observed, there is no investigation of the mechanism that underpins this.

      Thank you for pointing this out. We apologize for any inconvenience caused by the lack of mechanistic research in the manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. Thank you for your suggestions, and we hope our response has addressed your concerns.

      (2) Mouse experiments would have benefited from the use of standard anti-inflammatory therapies to control colitis. That way the authors could compare their approach of using bacillus spores with the current gold standard for treatment.

      We greatly appreciate your valuable comments, which improve the quality and depth of the manuscript. Based on your suggestion, we have supplemented this in the revised manuscript. We appreciate your review and feedback, and have marked the updated content in the revised manuscript.

      The updated content is presented in lines 198-378 of the Results section of the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. investigates the effects of B. velezensis HBXN2020 in alleviating S. Typhimurium-induced mouse colitis. The results showed that B. velezensis HBXN2020 could alleviate bacterial colitis by enhancing intestinal homeostasis (decreasing harmful bacteria and enhancing the abundance of Lactobacillus and Akkermansia) and gut barrier integrity and reducing inflammation.

      Thanks for the comments and the positive reception of the manuscript.

      Strengths:

      B. velezensis HBXN2020 is a novel species of Bacillus that can produce a great variety of secondary metabolites and exhibit high antibacterial activity against several pathogens. B. velezensis HBXN2020 is able to form endospores and has strong anti-stress capabilities. B. velezensis HBXN2020 has a synergistic effect with other beneficial microorganisms, which can improve intestinal homeostasis.

      Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      There are few studies on the clinical application of Bacillus velezensis. Thus, more studies are still needed to explore the effectiveness of Bacillus velezensis before clinical application.

      Thanks for your suggestion. This study serves as an exploratory investigation before the application of Bacillus velezensis. The main purpose of this study is to explore the potential of Bacillus velezensis in application. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Most of my previous comments are well addressed; here are a few examples. In my last comment, I requested a colitis mouse model, which would closely resemble the diarrheal disease caused by Salmonella in mammals. The available statement is not convincing; please check https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2225501/, https://pubs.rsc.org/en/content/articlelanding/2020/fo/d0fo01017k and replace "colitis" with a normal infection model. The current statement is incorrect.

      Thank you for your valuable suggestion. The comments improve the quality of the manuscript. We have corrected this in the revised manuscript as suggested. We have marked the updated content in the revised manuscript.

      The updated content is presented in lines 2, 29, 38, 46, 48, 199, 204, 246, 248, 282, 307, 310, 316, 431, 433, 464, 466, 473, 494, 497, 499, 504, 513, 518, 525, 706, 710 and 735 of the revised manuscript.

      Certain parts remain overstated; in my view, the language and logical flow should be addressed thoroughly.

      Here are suggestions to improve the logical flow of the manuscript.

      (1) Probiotic sampling and isolation

      (2) in vitro assessment

      (3) genomic sequencing and in silico safety assessment (Crit Rev Food Sci Nutr. 2023;63(32):11244-11262), which should be included as an appropriate reference.

      (4) in vivo assay for safety evaluation, but not biosafety (it has a different meaning!!)

      (5) infection model and protection assay.

      We greatly appreciate your valuable comments, which improve the quality and depth of the manuscript. Following your suggestion, we have done our best to correct those problems in the revised manuscript. We would like to express our apologies once again and hope that the revised manuscript meets your expectations. We have marked the updated content in the revised manuscript.

      Also, please pay attention to the logical link or transition sentences between each part to connect the dots in each part.

      We greatly appreciate your valuable comments, which improve the quality of the manuscript. Following your suggestion, we have corrected this in the revised manuscript. We have marked the updated content in the revised manuscript.

      Finally, there are also many typos and errors; please correct them throughout the text. For example, Line 521, "Stain", and more...

      Thanks for pointing this out. Based on your suggestion, we have corrected these in the revised manuscript. We have marked the updated content in the revised manuscript.

      The updated content is presented in lines 753, 1055 and 1087 of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The revised manuscript by Wang and colleagues attempts to address concerns raised during the first round of review.

      All minor comments have been addressed and in general, the major concerns have been partially addressed in the revised manuscript.

      The outstanding concerns relate to the mechanistic basis of the observations. The authors made no attempt to address this in a meaningful manner. Secondly, the issue of comparing the responses to what would be standard therapy (such as anti-inflammatories) was also handled in a somewhat dismissive manner, referring to other ongoing/future work. The clinical utility of the findings is hard to ascertain if there is no comparison to the current gold standard therapeutic approach.

      I have no further suggestions for the authors, save for those previously made.

      Thank you for pointing this out. We apologize for any inconvenience caused by the lack of mechanistic research in the manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. Thank you for your suggestions, and we hope our response has addressed your concerns.

      Secondly, regarding the comparative trial of oral bacillus spore treatment with the current gold standard for treatment, we have supplemented this in the revised manuscript. We appreciate your review and feedback, and have marked the updated content in the revised manuscript.

      The updated content is presented in lines 198-378 of the Results section of the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      This is a revision, they have addressed all my concerns, and now it is acceptable.

      Thank you very much for your comments and recognition of the manuscript.

    1. eLife Assessment

      This study reports the fundamental discovery of a novel structure in the developing gut that acts as a midline barrier between left and right asymmetries. Some of the evidence supporting the dynamics, composition, and function of this novel basement membrane in the chick is solid, some is even convincing, but investigation of its origin and impact on asymmetric organogenesis remains challenging and is not yet conclusive. This careful work is of broad relevance to patterning mechanisms, the importance of the extracellular matrix, and laterality disorders.

    2. Joint Public Review:

      When the left-right asymmetry of an animal body is established, a barrier that prevents the mixing of signals or cells across the midline is essential. Such a midline barrier, which prevents the spread of asymmetric Nodal signaling during early left-right patterning, has been identified. However, midline barriers during later asymmetric organogenesis have remained largely unknown, except in the brain. In this study, the authors discovered an unexpected structure in the midline of the developing midgut in the chick. Using immunofluorescence, they convincingly show the chemical composition of this midline structure as a double basement membrane and its transient existence during the left-right patterning of the dorsal mesentery, which the authors previously showed to be essential for forming the gut loop and guiding local vasculogenesis. Labelling experiments demonstrate a physical and chemical barrier function against cell mixing and signal diffusion in the dorsal mesentery. Cell labelling and graft experiments rule out a cellular composition of the midline from dorsal mesenchyme or endoderm origin and rule out an inducing role by the notochord. Based on the laminin expression pattern and Ntn4 resistance, the authors propose a model whereby the midline basement membrane is progressively deposited by the descending endoderm. Observations of a transient midline basement membrane in the veiled chameleon suggest a conserved mechanism in birds and reptiles.

      Laterality defects encompass severe malformations of visceral organs, with a heterogeneous spectrum that remains poorly understood owing to a lack of knowledge of the different players of left-right asymmetry. This fundamental work significantly advances our understanding of left-right asymmetric organogenesis by identifying an organ-specific and stage-specific midline barrier. The complexities of basement membrane assembly, maintenance and function are of importance in several other contexts, for example in the kidney and brain. Thus, this original work is of broad interest.

      Overall, reviewers refer to a strong and elegant paper discovering a novel midline structure, combining classic but challenging techniques and well-thought-out tools to show the dynamics and the chemical and physical properties of the midline. Reviewers also indicate that further work will be necessary to conclude on the origin and impact of the midline for asymmetric organogenesis. They acknowledge that this is currently technically challenging and that the authors have made several attempts to answer these questions by different means. The article includes an interesting discussion about these points and the mechanism of midline breakdown.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      Left-right asymmetry in the developing embryo is important for establishing correct lateralisation of the internal organs, including the gut. It has been shown previously that the dorsal mesentery (DM), which supports looping of the endodermal gut tube during development, is asymmetric with sharp delineation of left and right domains prior to gut looping. The authors set out to investigate the nature of the midline barrier that separates the left and right sides of the DM. They identify a transient basement membrane-like structure which is organised into two layers between the notochord and descending endoderm. In the time window when this basement membrane structure exists, there is no diffusion or cell mixing between the left and right sides of the DM, but once this structure starts breaking down, mixing and diffusion occur. This suggests it acts as a barrier, both physical and chemical, between left and right at the onset of gut lateralisation.

      Strengths:

      The authors identify a new midline structure that likely acts as a barrier to facilitate left and right separation during early organogenesis. This is an interesting addition to the field of laterality, with relevance to laterality-related disorders including heterotaxia, and may represent a gut-specific mechanism for establishing and maintaining early left-right asymmetry. The structure of this midline barrier appears to be an atypical basement membrane, comprising two adjacent basement membranes. The complexities of basement membrane assembly, maintenance, and function are of importance in almost all organismal contexts. Double basement membranes have been previously reported (for example in the kidney glomeruli as the authors note), and increasing evidence suggests that atypical basement membrane organisation or consideration is likely to be more prevalent than previously appreciated. Thus this work is both novel and broadly interesting.

      The data presented are well executed, using a variety of well-established methods. The characterisation of the midline barrier at the stages examined is extensive, and the data around the correlation between the presence of the midline barrier and molecular diffusion or cell mixing across the midline are convincing.

      Weaknesses:

      The study is rather descriptive, and the authors' hypotheses around the origins of the midline barrier are speculative and not experimentally demonstrated. While several potential origins of the midline are excluded raising interesting questions about the timing and cell-type-specific origin of the midline basement membrane, these remain unanswered which limits the scope of the paper.

      We extend our appreciation to Reviewer #1 for their thoughtful and comprehensive evaluation of our work, recognizing the considerable time and effort they dedicated to our work. We agree that functional data would significantly strengthen our understanding of the midline barrier and its exact role during LR asymmetric gut development. However, we would like to note that repeated and diligent attempts to perturb this barrier were made using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation) but we observed no significant effect or stable disruption of the midline. We acknowledge and accept this limitation and hope that our discovery will invite future investigations and perturbation of this novel midline structure.

      For example, it is unclear whether the two basement membranes originally appear to be part of a single circular/spherical structure (which looks possible from the images) that simply becomes elongated, or whether it is indeed initially two separate basement membranes that extend.

      We favor the hypothesis that the elongation of the preexisting small circular structure to an extended double membrane of relatively increased length would be unlikely without continued contribution of new basement membrane components. However, our attempts to label and trace the basement membrane of the endoderm using tagged laminins (LAMB1-GFP, LAMB1-His, and LAMC1-His), and more recently tagged nidogen constructs (NID1-GFP and NID1-mNG) have met with export issues (despite extensive collaboration with experts, Drs. Dave Sherwood and Peter Yurchenco). As such, it remains difficult to differentiate between the two possibilities suggested. We also believe this is an important question and will continue to investigate methods to trace it.

      There is a substantial gap between the BMs at earlier stages before the endoderm has descended - is this a lumen, or is it filled with interstitial matrix?

      Our preliminary studies indicate that the gap enclosed by the basement membranes in the early midline structure does have extracellular matrix present, such as fibrillin-2 (see Author response image 1). Also, the electron microscopy shown in Fig. 2 C’’ supports that the space between the notochord and endoderm has fibrillar matrix.

      Author response image 1.

      The authors show where this basement membrane does not originate from, but only speculate on its origin. Part of this reasoning is due to the lack of Lama1-expressing cells either in the early midline barrier before it extends, or in the DM cells adjacent to it. However, the Laminin observed in the midline could be comprised of a different alpha subtype for example, that wasn't assessed (it has been suggested that the Laminin antibody used in this study is not specific to the alpha-1 subunit, see e.g. Lunde et al, Brain Struct Funct, 2015).

      We appreciate this comment and have tried other laminin RNA probes that showed a similar lack of midline expression (Lama1, Lama3, Lama5). Importantly, the laminin alpha 1 subunit is a component of the laminin 111 heterotrimer, which along with laminin 511 is the first laminin to be expressed and assembled in embryonic basement membranes, as reviewed in Yurchenco 2011. Laminin 111 is particularly associated with embryonic development while laminins 511/521 become the most widespread in the adult (reviewed in Aumailley 2013). It is likely that the midline contains laminin 111 based on our antibody staining and the accepted importance and prevalence of laminin 111 in embryonic development. However, it is indeed worth noting that most laminin heterotrimers contain beta 1, gamma 1, or both subunits, and owing to this immunological relatedness, laminin antibody cross-reactivity is well known (Aumailley 2013). As such, while laminin 511 remains a possibility as a component of the midline BM, our Lama5 in situs have shown no differential expression at the midline of the dorsal mesentery (see Author response image 2), and we are therefore confident that our finding of no local laminin transcription is accurate. Additionally, we note that the study referenced by the Reviewer observed cross-reactivity between the alpha 1 and alpha 2 subunits. Laminins 211/221 are unlikely candidates based on the embryonic context, and because they are primarily associated with muscle basement membranes (Aumailley 2013). In further support, we recently conducted a preliminary transcriptional profile analysis of midline cells isolated through laser capture microdissection (LCM), which revealed no differential expression of any laminin subunit at the midline. Please note that these data will be included as part of a follow-up story and fall beyond the scope of our initial characterization.

      Author response image 2.

      Similarly, the authors show that the midline barrier breaks down, and speculate that this is due to the activity of e.g. matrix metalloproteinases, but don't assess MMP expression in that region.

      This is an important point, as the breakdown of the midline is unusually rapid. Our RNA in situ hybridizations for MMP2 at HH21 and for ADAMTS1 (and TS9) at HH19-21 indicate no differential activity at the midline (see Author response images 3 and 4). Our future focus will be on identifying a potential protease that exhibits differential activity at the midline of the DM.

      Author response image 3.

      Author response image 4.

      The authors suggest the (plausible) hypothesis that the descent of the endoderm pulls or stretches the midline barrier out from its position adjacent to the notochord. This is an interesting possibility, but there is no experimental evidence to directly support this. Similarly, while the data supporting the barrier function of this midline is good, there is no analysis of the impact of midline/basement membrane disruption demonstrating that it is required for asymmetric gut morphogenesis. A more functional approach to investigating the origins and role of this novel midline barrier would strengthen the study.

      Yes, we fully agree that incorporating functional data would immensely advance our understanding of the midline barrier and its crucial role in left-right gut asymmetry. However, our numerous efforts to perturb this barrier have encountered technical obstacles. For instance, while perturbing the left and right compartments of the DM is a routine and well-established procedure in our laboratory, accessing the midline directly through similar approaches has been far more challenging. We have made several attempts to address this hurdle using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation). Despite employing diverse approaches, we have yet to achieve effective and interpretable perturbation of this resilient structure. We acknowledge this limitation and remain committed to developing methods to disrupt the midline in our current investigations. We again thank Reviewer #1 for the detailed feedback on our manuscript, guidance, and the time taken to provide these comments.

      Recommendations For The Authors:

      Using Laminin subunit-specific antibodies, or exploring the mRNA expression of more laminin subunits may support the argument that the midline does not derive from the notochord, endoderm, or DM.

      As mentioned above, RNA in situ hybridization for candidate genes and a preliminary RNA-seq analysis of cells isolated from the dorsal mesentery midline revealed no differential expression of any laminin subunits.

      Similarly, expression analysis of Laminin-degrading MMPs, and/or application of an MMP inhibitor and assessment of midline integrity could strengthen the authors' hypothesis that the BM is actively and specifically broken down.

      Our RNA in situ hybridizations for MMP2 at HH21 and for ADAMTS1 at HH19-21 show no differential expression pattern at the midline of the DM (see Author response image 3). We have not included these data in the revision, but future work on this topic will aim at identifying a protease that is differentially active at the midline of the DM.

      Functionally testing the role of barrier formation in regulating left-right asymmetry or the role of endoderm descent in elongating the midline barrier would be beneficial. Regarding the former, the authors show that Netrin4 overexpression is insufficient to disrupt the midline, but perhaps overexpression of e.g. MMP9 prior to descent of the endoderm would facilitate early degradation of the midline, and the impact of this on gut rotation could be assessed.

      Unfortunately, MMP9 electroporation has produced little appreciable effect. We acknowledge that the lack of direct evidence for the midline’s role in regulating left-right asymmetry is a shortcoming, but current work on this subject aims to define the midline’s function in LR asymmetric morphogenesis.

      Reviewer #2:

      When the left-right asymmetry of an animal body is established, the barrier that prevents the mixing of signals or cells across the midline is essential. The midline barrier that prevents the mixing of asymmetric signals during the patterning step has been identified. However, a midline barrier that separates both sides during asymmetric organogenesis is unknown. In this study, the authors discovered the cellular structure that seems to correspond to the midline in the developing midgut. This midline structure is transient, present at the stage when the barrier would be required, and composed of Laminin-positive membrane. Stage-dependent diffusion of dextran across the midline (Figure 6) coincides with the presence or absence of the structure (Figures 2, 3). These lines of indirect evidence suggest that this structure most likely functions as the midline barrier in the developing gut.

      We extend our gratitude to Reviewer #2 for their thoughtful assessment of our research and for taking the time to provide these constructive comments. We are excited to report that we have now included additional new data on midline diffusion using BODIPY and a quantification method to further support our findings on the midline's barrier function. While our data on dextran and now BODIPY both indirectly suggest barrier function, we aspire to perturb the midline directly to assess its role in the dorsal mesentery more conclusively. However, our numerous efforts to perturb this barrier have encountered technical obstacles. For instance, while perturbing the left and right compartments of the DM is a routine and well-established procedure in our laboratory, accessing the midline directly through similar approaches has been far more challenging. We have made several attempts to address this hurdle using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation). Despite employing diverse approaches, we have yet to achieve effective and interpretable perturbation of this resilient structure. Moving forward, our focus is on identifying an effective means of perturbation that can offer direct evidence of barrier function.

      Recommendations For The Authors:

      (1) It would be much nicer if the requirement of this structure for asymmetric morphogenesis was directly tested. However, experimental manipulations such as ectopic expression of Netrin4 or transplantation of the notochord were not able to influence the formation of this structure (these results, however, suggested the mechanism of the midline formation in the gut dorsal mesentery). Therefore, it seems not feasible to directly test the function of the structure, and this should be the next issue.

      We fully agree that the midline will need to be perturbed to fully elucidate its role in asymmetric gut morphogenesis. As noted, multiple attempts were ineffective at perturbing this structure. Extensive current work on this topic is dedicated to finding an effective perturbation method.

      (2) Whereas Laminin protein was present in the double basement membrane at the midline, Laminin mRNA was not expressed in the corresponding region (Fig. 4A-C). It is necessary to discuss (with experimental evidence if available) the origin of Laminin protein.

      As we have noted, the source of laminin and basement membrane components for the midline remains unclear. The absence of local transcription and the insufficiency of the notochord to produce a midline indicate that the endoderm is a likely source of laminin, as we have proposed in our zippering endoderm model. We note that Fig. 4A-C indicates that laminin is in fact actively transcribed in the endoderm. Currently, attempts to trace the endodermal basement membrane using tagged laminins (LAMB1-GFP, LAMB1-His, and LAMC1-His), and more recently tagged nidogen constructs (NID1-GFP and NID1-mNG), have met with export issues (despite extensive collaboration with experts, Drs. Dave Sherwood and Peter Yurchenco). Confirmation of our proposed endodermal origin model is a goal of our ongoing work.

      (3) Figure 4 (cell polarity from GM130 staining): addition of representative GM130 staining images for each Rose graph (Figure 4E) would help. They can be shown in Supplementary Figures. Also, a graph for the right coelomic epithelium in Fig. 4E would be informative.

      We have added the requested GM130 images in our Supplemental Figures (please refer to Fig. S4ABB’) and modified the main Fig. 4E to include a rose graph for the polarity of the right coelomic epithelium.

      (4) Histological image of HH19 DM shown in Fig. 2J looks somehow different from that shown in Fig. 3F. Does Fig. 2J represent a slightly earlier stage than Fig. 3F?

      Figure 2J and Figure 3F depict a similar stage, although the slight variation in the length of the dorsal mesentery is attributed to the pseudo time phenomenon illustrated in Figure 3J-J’’’. This implies that the sections in Figure 2J and Figure 3F might originate from slightly different positions along the anteroposterior axis. Nonetheless, these distinctions are minimal, and based on the dorsal mesentery's length in Figure 2J, the midline is likely extremely robust regardless of this minor pseudo time difference.

      Reviewer #3:

      Summary:

      The authors report the presence of a previously unidentified atypical double basement membrane (BM) at the midline of the dorsal mesentery (DM) during the establishment of left-right (LR) asymmetry. The authors suggest that this BM functions as a physical barrier between the left and the right sides of the DM preventing cell mixing and ligand diffusion, thereby establishing LR asymmetry.

      Strengths:

      The observation of the various components in the BM at the DM midline is clear and convincing. The pieces of evidence ruling out the roles of DM and the notochord in the origin of this BM are also convincing. The representation of the figures and the writing is clear.

      Weaknesses:

      The paper's main and most important weakness is that it lacks direct evidence for the midline BM's barrier and DM LR asymmetry functions.

      We thank Reviewer #3 for their thoughtful and comprehensive evaluation of our work, recognizing the considerable time and effort they dedicated to assessing our study. We fully agree that incorporating functional data would immensely advance our understanding of the midline barrier and its crucial role in left-right gut asymmetry. However, several distinct attempts at perturbing this barrier have encountered technical obstacles. While our laboratory routinely perturbs the left and right compartments of the DM via DNA electroporation and other techniques, directly perturbing the midline using these methods is far more challenging. We have made diligent attempts to address this using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation). However, we have not yet been able to identify a means of producing consistent and interpretable perturbation of the midline. We acknowledge this limitation and remain committed to developing methods to disrupt the midline in our current investigations.

      Recommendations For The Authors:

      Major:

      (1) We suggest the authors test their hypotheses i.e., physical barrier and proper LR asymmetry establishment by the midline BM, by disrupting it using techniques such as physical ablation, over-expression of MMPs, or treatment with commercially available enzymes that digest the BM.

      As above, efforts involving physical ablation and MMP overexpression have not yielded significant effects on the midline thus far. Moving forward, investigating the midline's role in asymmetric morphogenesis will necessitate finding a method to perturb it effectively. In pursuit of progress on this critical question, we recently conducted laser capture microdissection (LCM) and RNA-sequencing of the midline to unravel the mechanisms underlying its formation and potential disruption. This work shows promise but it is still in its early stages; validating it will require significant time and effort, and it falls outside the scope of the current manuscript.

      (2) Lefty1's role in the midline BM was ruled out by correlating lack of expression of the gene at the midline during HH19 when BM proteins expression was observed. Lefty1 may still indirectly or directly trigger the expression of these BM proteins at earlier stages. The only way to test this is by inhibiting lefty1 expression and examining the effect on BM protein localization.

      We have added a section to discuss the potential of Lefty1 inhibition as a future direction. However, similar to perturbing global Nodal expression, interpreting the results of Lefty1 inhibition could be challenging. This is because it may not specifically target the midline but could affect vertebrate laterality as a whole. Despite this complexity, we acknowledge the value of such an experiment and consider it worth pursuing in the future.

      (3) Using a small dextran-based assay, the authors conclude that diffusible ligands such as cxcl12 and bmp4 do not diffuse across the midline (Figure 6). However, dextran injection in this system seems to label the cells, not the extracellular space. The authors measure diffusion, or the lack thereof, by counting the proportion of dextran-labeled cells rather than dextran intensity itself. Therefore, this result shows a lack of cell mixing across the midline (already shown in Figure 2) rather than a lack of diffusion.

      We should emphasize that the dextran-injected embryos shown in Fig. 6 D-F were isolated two hours post-injection, a timeframe insufficient for cell migration to occur across the DM (Mahadevan et al., 2014). We also collected additional post-midline stage embryos ten minutes after dextran injections - too short a timeframe for significant cellular migration (Mahadevan et al., 2014). Importantly, the fluorescent signal in those embryos was comparable to that observed in the embryos in Fig. 6. Thus, we believe the movement of fluorescent signal across the DM when the barrier starts to fragment (HH20-HH23) is unlikely to represent cell migration. More than a decade of DNA electroporation experiments of the left vs. right DM by our laboratory and others have never indicated substantial cell migration across the midline (Davis et al., 2008; Kurpios et al., 2008; Welsh et al., 2013; Mahadevan et al., 2014; Arraf et al. 2016; Sivakumar et al., 2018; Arraf et al. 2020; and Sanketi et al., 2022). This is also shown in our current GFP/RFP double electroporation data in Fig. 2 G-H, and DiI/DiO labeling data in Fig. 2 E-G. Collectively, our experiments suggest that the dextran signal we observed at HH20 and HH23 is likely not driven by cell mixing.

      To further strengthen this argument, we now have additional new data on midline diffusion, obtained with a BODIPY diffusion assay and quantification method, to support our findings on the midline's function against diffusion (please refer to New Fig. 6H-M). Briefly, we utilized a BODIPY-tagged version of AMD3100 (Poty et al., 2015) delivered via soaked resin beads surgically inserted into the left coelomic cavity (precursor to the DM). The ratio of average AMD3100-BODIPY intensity in the right DM versus the left DM was below 0.5 when the midline is intact (HH19), indicating little diffusion across the DM (Fig. 6J). At HH21, when no midline remains, this ratio rises significantly to near one, indicating that diffusion of the drug is not impeded when the midline basement membrane structure is absent. Collectively, these data suggest that the basement membrane structure at the midline forms a transient functional barrier against diffusion.
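
      The right-to-left intensity ratio described above can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function name `right_to_left_ratio` and the pixel values are hypothetical and chosen only to show the arithmetic, assuming the ratio is the mean fluorescence of a right-DM region divided by the mean fluorescence of a left-DM region.

```python
from statistics import mean


def right_to_left_ratio(left_pixels, right_pixels):
    """Return mean right-side intensity divided by mean left-side intensity.

    Both arguments are sequences of pixel intensities sampled from regions
    of interest in the left and right DM (hypothetical inputs).
    """
    left_mean = mean(left_pixels)
    right_mean = mean(right_pixels)
    if left_mean <= 0:
        raise ValueError("left-side mean intensity must be positive")
    return right_mean / left_mean


# Made-up illustrative values, not measurements from the study.
left_roi = [100, 120, 110, 90]

# Signal largely retained on the left: ratio well below 0.5.
intact = right_to_left_ratio(left_roi, [30, 40, 35, 45])

# Signal roughly equal on both sides: ratio near one.
absent = right_to_left_ratio(left_roi, [95, 105, 110, 100])

print(f"intact-midline example: {intact:.2f}")
print(f"absent-midline example: {absent:.2f}")
```

      Under this assumption, a ratio near zero means the signal stays on the side where the bead was placed, and a ratio near one means it is distributed evenly across both sides.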

      (4) Moreover, in a previous study (Mahadevan et al., Dev Cell., 2014), cxcl12 and bmp4 expression was observed on both the left and right side before gut closure (HH17, when midline BM is observed). Then their expression patterns were restricted on the left or right side of DM at around HH19-20 (when midline BM is dissociated). The authors must explain how the midline BM can act as a barrier against diffusible signals at HH-17 to 19, where diffusible signals (cxcl12 and bmp4) were localized on both sides.

      We appreciate the Reviewer's invitation to clarify this crucial point. Early in dorsal mesentery (DM) formation, genes like Cxcl12 (Mahadevan et al., Dev Cell 2014) and Bmp4 (Sanketi et al., Science 2021) exhibit symmetry before Pitx2 expression initiates on the left (around ~HH18, Sanketi et al., 2021). Pitx2 then inhibits BMP4 (transcription) and maintains Cxcl12 (mRNA) expression on the left side. The loss of Cxcl12 mRNA on the right is due to the extracellular matrix (ECM), particularly hyaluronan (Sivakumar et al., Dev Cell 2018). Our hypothesis is that during these critical stages of initial DM asymmetry establishment, the midline serves as a physical barrier against protein diffusion to protect this asymmetry during a critical period of symmetry breaking. Although some genes, such as Pitx2 and Cxcl12 continue to display asymmetric transcription after midline dissolution (Cxcl12 becomes very dynamic later on – see Mahadevan), it's crucial to note that the midline's primary role is preventing protein diffusion across it, akin to an insurance policy. Thus, the absence of the midline barrier at HH21 does not result in the loss of asymmetric mRNA expression. We think its primary function is to block diffusible factors from crossing the midline at a critical period of symmetry breaking. We acknowledge that confirming this hypothesis will necessitate experimental disruption of the midline and observing the consequent effects on asymmetry in the DM. This remains central to our ongoing research on this subject.

      (5) On page 11, lines 15-17, the authors mention that "We know that experimentally mixing left and right signals is detrimental to gut tilting and vascular patterning-for example, ectopic expression of pro-angiogenic Cxcl12 on the right-side results in an aberrant vessel forming on the right (Mahadevan et al., Dev Cell., 2014)". In this previous report from the authors' laboratory, the authors suggested that ectopic expression of cxcl12 on the right side induced aberrant formation of the vessel on the right side, which was formed from stage HH17, and the authors also suggested that the vessel originated from left-sided endothelial cells. If the midline BM acts as a barrier against the diffusible signal, how can the left-sided endothelial cells contribute to vessel formation at HH17 (before midline BM dissociation)?

      To address this point, we suggest directing the Reviewer to previously published supplemental movies of time-lapse imaging, which clearly illustrate the migration path of endothelial cells from left to right DM (Mahadevan et al., Dev Cell 2014). While the Reviewer correctly notes that ectopic induction of Cxcl12 on the right induces left-to-right migration, it's crucial to highlight that these cells never cross the midline. Instead, they migrate immediately adjacent to the tip of the endoderm (please also refer to published Movies S2 and S3). We observe this migration pattern even in wild-type scenarios during the loss of the endogenous right-sided endothelial cords, where some endothelial cells from the right begin slipping over to the left around HH19-20 (over the endoderm), as the midline is beginning to fragment, but never traverse the midline. We attribute this migration pattern to a dorsal-to-ventral gradient of left-sided Cxcl12 expression, as disrupting this pattern perturbs the migration trajectory (Mahadevan).

      (6) It is unclear how continuous the midline BM is along the anterior-posterior axis across the relevant stages. Relatedly, it is unclear how LR segregated the cells are along the anterior-posterior axis across the relevant stages.

      We refer the reviewer to Fig. 3J-K, in which the linear elongation of the midline basement membrane structure is shown and measured at HH19 in three embryos from the posterior of the embryo to the anterior point at which the midline is fragmented and ceases to be continuous. Similarly, Fig. S2 shows the same phenomenon in serial sections along the length of the anterior-posterior (AP) axis at HH17, also showing the continuity of the midline. All our past work at all observed sections of the AP axis has shown that cells do not move across the midline, as indicated by electroporation of DNA encoding fluorescent reporters (Davis et al. 2008, Kurpios et al. 2008, Welsh et al. 2013, Mahadevan et al. 2014, Sivakumar et al. 2018, Sanketi et al. 2022), and this is shown again in Fig. 2 E-H. As noted previously, very few endothelial cells cross the midline at a point just above the endoderm (image above) when the right endothelial cord remodels (Mahadevan et al. 2014), but this phenomenon is limited to endothelial cells, and cells of the left and right DM are fully segregated as previously established.

      Minor comments:

      (1) The authors found that left and right-side cells were not mixed with each other even after the dissociation of the DM midline at HH21 (Fig2 H). And the authors also previously mentioned that N-cadherin contributes to cell sorting for left-right DM segregation (Kurpios et al., Proc Natl Acad Sci USA., 2008). It could be a part of the discussion about the difference in tissue segregation systems before or after the dissociation of DM midline.

      We appreciate this thoughtful suggestion. N-cadherin mediated cell sorting is key to the LR asymmetry of the DM and gut tilting, and we believe it underlies the observed lack of cell mixing from left and right DM compartments after the midline fragments. We have added a brief section to the discussion concerning the asymmetries in N-cadherin expression that develop after the midline fragments.

      (2) Please add the time point on the images (Fig3 C, D, Fig 6A and B)

      We have updated these figures to provide the requested stage information.

      (3) The authors suggested that the endoderm might be responsible for making the DM BM midline because the endoderm links to DM midlines and have the same resistance to NTN4. The authors mentioned that the midline and endoderm might have basement membranes of the same "flavor." However, perlecan expression was strongly expressed in the midline BM compared with the endodermal BM. It could be a part of the discussion about the difference in the properties of the BM between the endoderm and DM midline.

      Perlecan does indeed localize strongly to the endoderm as well as the midline. The HH18 image included in prior Fig. S3 B’, B’’ appears to show atypically low antibody staining in the endoderm for all membrane components. Perlecan is an important component for general basement membrane assembly, and the bulk of our HH18 and HH19 images indicate strong staining for perlecan in both midline and endoderm. Perlecan staining at the very earliest stages of midline formation also indicates perlecan in the endoderm, supporting the endoderm as a potential source for the midline basement membrane. We have updated Fig. S3 to include these images in our revision.

      (4) The authors investigated whether the midline BM originates from the notochord or endoderm, but did not examine a role for endothelial cells and pericytes surrounding the dorsal aorta (DA). In Fig S1, Fig S2, and FigS3, the authors showed that DA is very close to the DM midline basement membrane, so it is worth checking their roles.

      We fully agree that the dorsal aorta and the endothelial cords that originate from the dorsal aorta may interact with the midline in important ways. However, accessing the dorsal aorta for electroporation or other perturbation is extremely difficult. Additionally, the basement membrane of vascular endothelial cells has a distinct composition from a non-vascular basement membrane. Vascular endothelial cells produce only alpha 4 and alpha 5 laminin subunits but contain no alpha 1 subunit in any known species (reviewed in DiRusso et al., 2017). Thus, endothelial cell-derived basement membranes would not contain the alpha 1 laminin subunit that we used in our studies as a robust marker of the midline basement membrane. Additionally, no fibronectin is found in the midline basement membrane, while it is enriched in the dorsal aorta (see Supplemental Figure 3CC’C’’). We will briefly note that our preliminary data in quail tissue indicates that QH1+ cord cells (i.e. endothelial cells) sometimes exhibit striking contact with the midline along the dorso-ventral length of the DM, suggesting not an origin but an important interaction.

      Reviewer #4 (Recommendations For The Authors):

      Major comments:

      (1) The descending endoderm zippering model for the formation of the midline lacks evidence.

      We have attempted to address this issue by introducing several tagged laminin constructs (LAMB1-GFP, LAMB1-His, LAMC1-His), and more recently tagged nidogen plasmids (NID1-GFP and NID1-mNG) to the endoderm via DNA electroporation to try to label the source of the basement membrane. Production of the tagged components occurred but no export was observed in any case (despite extensive collaboration with experts in this area, Drs. Dave Sherwood and Peter Yurchenco). This experiment was further complicated by the necessary large size of these constructs at 10-11kb due to the size of laminin subunit genes, resulting in low electroporation efficiency. We also believe this is an important question and are continuing to investigate methods to trace it.

      The midline may be Ntn4 resistant until it is injected in the source cells.

      Ntn4 has been shown to disrupt both assembling and existing basement membranes (Reuten et al. 2016). Thus, we feel that the midline and endodermal basement membranes’ resistance to degradation is not determined by stage of assembly or location of secretion.

      Have you considered an alternative origin from the bilateral dorsal aorta or the paraxial mesoderm, which would explain the double layer as a meeting of two lateral tissues? The left and right paraxial mesoderm seem to abut in Fig. S1B-C and S2E, and is laminin-positive in Fig 4A'. What are the cells present at the midline (Fig.4D-E)? Are they negative for the coelomic tracing, paraxial or aortic markers?

      We fully agree that alternate origins of the midline basement membrane cannot be ruled out from our existing data. We have considered the dorsal aorta and even the endothelial cords that originate from the dorsal aorta. However, accessing the dorsal aorta for electroporation or other perturbation is extremely difficult. Importantly, the basement membrane of vascular endothelial cells has a distinct composition from a non-vascular basement membrane. Vascular endothelial cells produce only alpha 4 and alpha 5 laminin subunits but contain no alpha 1 subunit in any known species (reviewed in Hallmann et al. 2005). Thus, endothelial cell-derived basement membranes would not contain the alpha 1 laminin subunit that we used in our studies as a robust marker of the midline basement membrane. Note in Fig. 3 E-H that our laminin alpha 1 antibody staining does not label the aortae. Additionally, no fibronectin is found in the midline basement membrane, while it is enriched in the dorsal aorta (see Supplemental Figure 3CC’C’’). We will briefly note that our preliminary data in quail tissue indicates that QH1+ cord cells (i.e. endothelial cells) sometimes exhibit striking contact with the midline along the dorso-ventral length of the DM, suggesting not an origin but an important interaction. Moreover, at the earliest stages of midline basement membrane emergence, the dorsal aortae are distant from the nascent basement membrane, as are the somites, which have not yet undergone any epithelial to mesenchymal transition. Fig. S2G provides an example of an extremely early midline basement membrane without dorsal aorta or somite contact. S2G is from a fairly posterior section of the embryo; it is thus less developed in pseudo-time and gives a window on midline formation in very early embryos.

      (2) The importance of the midline is inferred from previously published data and stage correlations but will require more direct evidence. Can the midline be manipulated with Hh signaling or MMPs?

      We agree that direct evidence in the form of midline perturbation will be critically required. As previously noted, our numerous efforts to perturb this barrier have encountered technical obstacles. For instance, while perturbing the left and right compartments of the DM is a routine and well-established procedure in our laboratory, accessing the midline directly through similar approaches has been far more challenging. We have made several attempts to address this hurdle using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation). Despite employing diverse approaches, we have yet to achieve effective and interpretable perturbation of this resilient structure. Targeting Hh signaling between the endoderm and notochord is a good idea and we will continue these efforts. Thanks very much.

      Minor comments:

      - Please add the species in the title.

      We have altered the title as follows: “An atypical basement membrane forms a midline barrier during left-right asymmetric gut development in the chicken embryo.”

      - The number of observations in Fig2, Fig3A-B, 4A-C, G-H, S1, S3 is lacking.

      We have added the requested n numbers of biological replicates to the legends of the specified figures.

      - Please annotate Fig 3J to show what is measured in K.

      We have modified Fig. 3J to include a dashed bar indicating the length measurements in Fig. 3K.

      - Please provide illustrations of Fig 4E.

      We have added a representative image of GM130 staining to the supplement.

      - If laminin gamma is the target of Ntn4, its staining would help interpret the results of Ntn4 manipulation. Is laminin gamma present in different proportions in the different types of basement membranes, underlying variations in sensitivity?

      Laminin is exported as a heterotrimer consisting of an alpha, beta, and gamma subunit. Laminin gamma is therefore present in equal proportions to other laminins in all basement membranes with a laminin network. Several gamma isoforms do exist, but only laminin gamma 1 will bind to laminin alpha 1, which we use throughout this paper to mark the midline as well as nearby basement membranes that are sensitive to Ntn4 disruption. Thus, gamma laminin proportions or isoforms are unlikely to underlie the resistance of the midline and endodermal basement membranes to Ntn4 (reviewed in Yurchenco 2011).

      - Please comment: what is the red outline abutting the electroporated DM on the left of Fig5B?

      The noted structure is the basement membrane of the nephric duct – we added this information to Fig. 5B image and legend.

      - The stage in Fig 6A-B is lacking.

      We have added the requested stage information to Fig. 6.

      - Please comment on whether there is or is not some cell mixing Fig 2H, at HH21 after the midline disappearance. Is it consistent with Fig. 6E-F which labels cells?

      More than a decade of DNA electroporation experiments of the left vs. right DM by our laboratory and others have never indicated dorsal mesentery cell migration across the midline (Davis et al., 2008; Kurpios et al., 2008; Welsh et al., 2013; Mahadevan et al., 2014; Arraf et al. 2016; Sivakumar et al., 2018; Arraf et al. 2020; and Sanketi et al., 2022). This is also shown in our current GFP/RFP double electroporation data in Fig. 2 G-H, and DiI/DiO labeling data in Fig. 2 E-G. Cell mixing does not occur even after midline disappearance, most likely due to asymmetric N-cadherin expression on the left side of the DM (Kurpios et al., 2008). The sparse, green-labeled cells observed on the right side in Fig. 2H are likely a result of DNA electroporation - the accuracy of this process relies on the precise injection of the left (or right) coelomic cavity (precursor to the gut mesenchyme including the DM) and subsequent correct placement of the platinum electrodes.

      Based on these data, we strongly feel that cellular migration is not responsible for the pattern of dextran observed in Fig. 6E-F, especially in light of the N-cadherin mediated segregation of left and right. We will also note that there is no significant difference between dextran diffusion at HH19 and HH20, only a trend towards significance. Additionally, we would like to note that the dextran-injected embryos were isolated two hours post-injection, which we do not believe is sufficient time for any cell migration to occur across the DM. We also collected additional post-midline stage embryos ten minutes after dextran injections (data not shown), too short a timeframe for significant cellular migration, and the fluorescent signal in those embryos was comparable to that represented in the embryos in Fig. 6. Thus, we believe the movement of fluorescent signal across the DM observed when the barrier starts to fragment at HH20 and HH23 is unlikely to represent movement of cells.

      To further strengthen this argument, we now have additional new data on midline diffusion using BODIPY and a quantification method to support our findings on the midline's function against diffusion (please refer to New Fig. 6H-M). Briefly, we utilized a BODIPY-tagged version of AMD3100 (Poty et al., 2015) delivered via soaked resin beads surgically inserted into the left coelomic cavity (precursor to the DM). The ratio of average AMD3100-BODIPY intensity in the right DM versus the left DM was below 0.5 when the midline is intact (HH19), indicating little diffusion across the DM (Fig. 6J). At HH21, when no midline remains, this ratio rises significantly to near one, indicating that diffusion of the drug is not impeded when the midline basement membrane structure is absent. Collectively, these data suggest that the basement membrane structure at the midline forms a transient functional barrier against diffusion.

      - 'independent of Lefty1': rephrase or show the midline phenotype after lefty1 inactivation.

      We agree with this comment and have rephrased this section to indicate the midline is present “at a stage when Lefty1 is no longer expressed at the midline.”

      We again would like to extend our sincere gratitude to our reviewers and the editors at eLife for their dedicated time and thorough evaluation of our paper. Their meticulous attention to detail and valuable insights have strengthened our data and provided further support for our findings.

    1. eLife Assessment

      This potentially useful study introduces an orthogonal approach for detecting RNA modification, without chemical modification of RNA, which often results in RNA degradation and therefore loss of RNA molecules. While the authors have improved the work compared to a previous version, uncertainty regarding false positive and false negative rates leaves the evidence for the broad applicability of the method incomplete. If properly validated, the approach might be of particular interest for sites where modifications are rare.

    2. Reviewer #2 (Public review):

      The fledgling field of epitranscriptomics has encountered various technical roadblocks with implications as to the validity of early epitranscriptomics mapping data. As a prime example, the low specificity of (supposedly) modification-specific antibodies for the enrichment of modified RNAs, has been ignored for quite some time and is only now recognized for its dismal reproducibility (between different labs), which necessitates the development of alternative methods for modification detection.

      Furthermore, early attempts to map individual epitranscriptomes using sequencing-based techniques are largely characterized by the deliberate avoidance of orthogonal approaches aimed at confirming the existence of RNA modifications that have been originally identified.

      Improved methodology, the inclusion of various controls, and better mapping algorithms as well as the application of robust statistics for the identification of false-positive RNA modification calls have allowed revisiting original (seminal) publications whose early mapping data allowed making hyperbolic claims about the number, localization and importance of RNA modifications, especially in mRNA. Besides the existence of m6A in mRNA, the detectable incidence of RNA modifications in mRNAs has drastically dropped.

      As for m5C, the subject of the manuscript submitted by Zhou et al., its identification in mRNA goes back to Squires et al., 2012 reporting on >10.000 sites in mRNA of a human cancer cell line, followed by intermittent findings reporting on pretty much every number between 0 to > 100.000 m5C sites in different human cell-derived mRNA transcriptomes. The reason for such discrepancy is most likely of a technical nature. Importantly, all studies reporting on actual transcript numbers that were m5C-modified relied on RNA bisulfite sequencing, an NGS-based method, that can discriminate between methylated and non-methylated Cs after chemical deamination of C but not m5C. RNA bisulfite sequencing has a notoriously high background due to deamination artifacts, which occur largely due to incomplete denaturation of double-stranded regions (denaturing-resistant) of RNA molecules. Furthermore, m5C sites in mRNAs have now been mapped to regions that have not only sequence identity but also structural features of tRNAs. Various studies revealed that the highly conserved m5C RNA methyltransferases NSUN2 and NSUN6 do not only accept tRNAs but also other RNAs (including mRNAs) as methylation substrates, which in combination account for most of the RNA bisulfite-mapped m5C sites in human mRNA transcriptomes. Is m5C in mRNA only a result of the Star activity of tRNA or rRNA modification enzymes, or is their low stoichiometry biologically relevant?

      In light of the shortcomings of existing tools to robustly determine m5C in transcriptomes, other methods, like DRAM-seq, allowing to map m5C independently of ex situ RNA treatment with chemicals, are needed to arrive at a more solid "ground state", from which it will be possible to state and test various hypotheses as to the biological function of m5C, especially in lowly abundant RNAs such as mRNA.

      Importantly, the identification of >10.000 m5C-containing sites through DRAM-Seq increases the number of potential m5C marks in human cancer cells from a couple of 100 (after rigorous post-hoc analysis of RNA bisulfite sequencing data) by orders of magnitude. This raises the question of whether the application of these editing tools results in editing artefacts that overstate the number of actual m5C sites in the human cancer transcriptome.

      Remaining comments after resubmission:

      (1) The use of two m5C reader proteins is likely a reason for the high number of edits introduced by the DRAM-Seq method. Both ALYREF and YBX1 are ubiquitous proteins with multiple roles in RNA metabolism including splicing and mRNA export. It is reasonable to assume that both ALYREF and YBX1 bind to many mRNAs that do not contain m5C.

      To substantiate the authors' claim that ALYREF or YBX1 binds m5C-modified RNAs to an extent that would allow distinguishing its binding to non-modified RNAs from binding to m5C-modified RNAs, it would be recommendable to provide data on the affinity of these, supposedly proven, m5C readers to non-modified versus m5C-modified RNAs. To do so, this reviewer suggests performing experiments as described in Slama et al., 2020 (doi: 10.1016/j.ymeth.2018.10.020). Mind you that using dot blots like in so many published studies to show modification-specific antibody or protein binding, is insufficient as an argument because no antibody, nor protein encounters nanograms to micrograms of a specific RNA identity in a cell. This issue remains a major caveat in all studies using so-called RNA modification reader proteins as bait for detecting RNA modifications in epitranscriptomics research and becomes a pertinent problem, if used as a platform for base-editing similar to the work presented in this manuscript.

      (2) Using sodium arsenite treatment of cells as a means to change the m5C status of transcripts through the downregulation of the two major m5C writer proteins NSUN2 and NSUN6 is problematic and the conclusions from these experiments are not warranted. Sodium arsenite is a chemical that poisons every protein containing thiol groups. Not only do NSUN proteins contain cysteines but also the base editor fusion proteins. Arsenite will inactivate these proteins, hence the editing frequency will drop, as observed in the experiments shown in Figure 5, which the authors explain with fewer m5C sites to be detected by the fusion proteins.

      (3) The authors should move high-confidence editing site data contained in Supplementary Tables 2 and 3 into one of the main Figures to substantiate what is discussed in Figure 4A. However, the data needs to be visualized in another way than an Excel format. Furthermore, Supplementary Table 2 does not contain a description of the columns, while Supplementary Table 3 contains a single row with letters and numbers.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In this manuscript, Zhou et al describe a deaminase and reader protein-assisted RNA m5C sequencing method. The general strategy is similar to DART-seq for m6A sequencing, but the difference is that in DART-seq, m6A sites are always followed by C which can be deaminated by fused APOBEC1 to provide a high resolution of m6A sites, while in the case of m5C, no such obvious conserved motifs for m5C sites exist, therefore, the detection resolution is much lower. In addition, the authors used two known m5C binding proteins ALYREF and YBX1 to guide the fused deaminases, but it is not clear whether these two binding proteins can bind most m5C sites and compete with other m5C binding proteins.

      Thank you for your kind suggestion. RNA affinity chromatography and mass spectrometry analyses using biotin-labelled oligonucleotides with or without m5C were performed in previous reports (doi: 10.1038/cr.2017.55 and doi: 10.1038/s41556-019-0361-y), and the results showed that ALYREF and YBX1 bound more strongly to m5C-modified oligonucleotides. Moreover, these two m5C-binding proteins are also responsible for mRNA m5C binding, so we chose to use their ability to bind targeted m5C to construct a DRAM detection system in anticipation of transcriptome-wide m5C detection. We hope to propose a suitable detection strategy for RNA m5C, and there will certainly be room for optimization of the DRAM system in the future with more in-depth studies of m5C-binding proteins. We have discussed the above issue in lines 75-82 and 315-318.

      It is well known that two highly modified m5C sites exist in 28S RNA and many m5C sites exist in tRNA, the authors should validate their methods first by detecting these known m5C sites and evaluate the possible false positives in rRNA and tRNA.

      Thank you for your kind suggestion. We attempted PCR amplification of sequences flanking m5C sites 3782 and 4447 on 28S rRNA, as well as multiple m5C sites on tRNA, including m5C48 and m5C49 on tRNAVal, m5C48 and m5C49 on tRNAAsp, and m5C48 on tRNALys.

      However, Sanger sequencing revealed no valid mutations, as shown in Figure S3. We believe this outcome indicates that the DRAM system is better suited to transcriptome-wide m5C detection in mRNAs. This is supported by current reports that ALYREF and YBX1 are the m5C-binding proteins responsible for mRNAs (doi: 10.1038/cr.2017.55 and doi: 10.1038/s41556-019-0361-y). The above results and descriptions were added to lines 136-143.

      In mRNA, it is not clear what is the overlap between the technical replicates. In Figures 4A and 4C, they detected more than 10K m5C sites, and most of them did not overlap with sites uncovered by other methods. These numbers are much larger than expected and possibly most of them are false positives.

      Thank you for your kind suggestion. We observed significant overlap between the technical repeats by comparing the data across biological repeats, as shown in Figure S4C and described in lines 174-175. We considered a region to carry an m5C modification only when editing events were detected in at least two biological replicates, ensuring a high-stringency screening process (see the revised Methods, lines 448-455, and Figure 3F). With more in-depth research into m5C readers, we aim to achieve more accurate detection in the future.
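
      The replicate criterion described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `high_confidence_regions`, the region labels, and the representation of each biological replicate as a collection of region identifiers are all hypothetical.

```python
from collections import Counter


def high_confidence_regions(replicates, min_replicates=2):
    """Return regions with editing events in at least `min_replicates` replicates.

    `replicates` is an iterable of collections of region identifiers
    (hypothetical labels such as "chr1:100-140"), one collection per
    biological replicate.
    """
    counts = Counter()
    for regions in replicates:
        # Use a set so a region is counted at most once per replicate.
        counts.update(set(regions))
    return {region for region, n in counts.items() if n >= min_replicates}
```

      Raising `min_replicates` makes the filter stricter, at the cost of discarding regions that are edited in only some replicates.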

      Besides, it is not clear what is the detection sensitivity and accuracy since the method is neither single base resolution nor quantitative.

      Thank you for your suggestion. As shown in Figure 3G, we found that the editing window of the DRAM system exhibited enrichment of approximately 20 bp upstream and downstream of the m5C site. Previous reports identified Type I m5C sites, which tend to have a downstream "NGGG" motif, and Type II m5C sites, which often contain a downstream "UCCA" motif. However, these m5C motifs do not fully characterize all m5C sites, and their presence downstream of an m5C site is not guaranteed (doi: 10.1038/s41594-019-0218-x). This limitation complicates single-base resolution analysis by the DRAM system. Nevertheless, we believe that with further exploration of m5C sequence features, precise single-base resolution detection can be achieved in the future. This point is also discussed in lines 314-322.

      Regarding the quantitative level of the assay, we conducted additional experiments by progressively reducing the expression levels of the fusion proteins. Sanger sequencing revealed that the editing efficiency of A-to-G and C-to-U within the m5C region significantly decreased as fusion protein expression diminished (Figure S9). These findings suggest that the DRAM system's transfection efficiency is concentration-dependent and that the ratio of editing efficiency to transfection efficiency could aid in the quantitative analysis of m5C using the DRAM system. The relative results were supplemented in Figure S9 and discussed in lines 263-271.

      There are no experiments to show that the detected m5C sites are responsive to the writer proteins such as NSUN2 and NSUN6, and the determination of the motifs of these writer proteins.

      Thank you for your kind suggestion. We have performed a motif enrichment analysis based on the sequences spanning 10 nt upstream and downstream of DRAM-editing sites. The results of this analysis were added to Figure S4D and lines 168-171. Unfortunately, we did not identify any clear sequence preferences for the m5C sites catalyzed by the methyltransferases NSUN2 and NSUN6, which have previously been associated with “G”-rich sequences and the “CUCCA” motif. This limitation is mainly due to the DRAM detection system’s inability to achieve single-base resolution for m5C detection, which is also explained in the above response.

      Reviewer #2:

      (1) The use of two m5C reader proteins is likely a reason for the high number of edits introduced by the DRAM-Seq method. Both ALYREF and YBX1 are ubiquitous proteins with multiple roles in RNA metabolism including splicing and mRNA export. It is reasonable to assume that both ALYREF and YBX1 bind to many mRNAs that do not contain m5C.

      To substantiate the authors' claim that ALYREF or YBX1 binds m5C-modified RNAs to an extent that would allow distinguishing its binding to non-modified RNAs from binding to m5C-modified RNAs, it would be recommended to provide data on the affinity of these, supposedly proven, m5C readers to non-modified versus m5C-modified RNAs. To do so, this reviewer suggests performing experiments as described in Slama et al., 2020 (doi: 10.1016/j.ymeth.2018.10.020). However, using dot blots, as in so many published studies, to show modification-specific antibody or protein binding is insufficient as an argument because no antibody or protein encounters nanograms to micrograms of a specific RNA identity in a cell. This issue remains a major caveat in all studies using so-called RNA modification reader proteins as bait for detecting RNA modifications in epitranscriptomics research. It becomes a pertinent problem if used as a platform for base editing similar to the work presented in this manuscript.

      We thank the reviewer for the valuable suggestion. Previous studies have shown that while ALYREF and YBX1 can bind mRNAs without the m5C modification, their binding affinity for m5C-modified oligonucleotides is significantly higher than for unmethylated controls. This has been demonstrated through experiments such as in vitro tractography, electrophoretic mobility shift assay (EMSA) (doi: 10.1038/cr.2017.55), and UHPLC-MRM-MS/MS. Additionally, isothermal titration calorimetry measurements and PAR-CLIP experiments have shown that mutations in the key amino acids responsible for m5C binding in ALYREF and YBX1 result in a significant reduction in their ability to bind m5C (doi: 10.1038/s41556-019-0361-y).

      Although Me-RIP analysis was unsuccessful in our laboratory, likely due to the poor specificity of the m5C antibody, we performed RNA pulldown experiments instead. These experiments verified that the ability of DRAMmut-expressing proteins to bind RNA with m5C modification was virtually absent compared to DRAM-expressing proteins, while their ability to bind non-modified RNA was not significantly affected. The RNA pulldown results were added to Figures S1E and S1F and lines 110-111. Therefore, we believe that by including the DRAMmut group, our DRAM system can effectively exclude the false-positive mutations caused by nonspecific binding of DRAM’s reader protein to non-m5C-modified mRNAs.

      (2) Since the authors use a system that results in transient overexpression of base editor fusion proteins, they might introduce advantageous binding of these proteins to RNAs. It is unclear which promoter is driving construct expression, but it stands to reason that part of the data is based on artifacts caused by overexpression. Could the authors attempt testing whether manipulating expression levels of these fusion proteins results in different editing levels at the same RNA substrate?

      Thank you for pointing this out. To investigate how different expression levels of these proteins influence A-to-G and C-to-U editing within the same m5C region, we conducted a gradient transfection using plasmid concentrations of 1500 ng, 750 ng and 300 ng. This approach allowed us to progressively reduce the expression levels of the fusion proteins. Sanger sequencing revealed that the editing efficiency of A-to-G and C-to-U within the m5C region significantly decreased as fusion protein expression diminished. These findings suggest that the transfection efficiency of the DRAM system is concentration-dependent and that the ratio of editing efficiency to transfection efficiency may assist in the quantitative analysis of m5C using the DRAM system. The relative results and hypotheses were added and discussed in Figure S9 and lines 263-271 of the revised manuscript.

      (3) Using sodium arsenite treatment of cells as a means to change the m5C status of transcripts through the downregulation of the two major m5C writer proteins NSUN2 and NSUN6 is problematic and the conclusions from these experiments are not warranted. Sodium arsenite is a chemical that poisons every protein containing thiol groups. Not only do NSUN proteins contain cysteines but also the base editor fusion proteins. Arsenite will inactivate these proteins, hence the editing frequency will drop, as observed in the experiments shown in Figure 5, which the authors explain with fewer m5C sites to be detected by the fusion proteins.

      Thank you for pointing this out. We used bisulfite sequencing PCR to determine that the m5C levels in RPSA and AP5Z1 were significantly reduced after sodium arsenite treatment. This was followed by a significant decrease in editing frequency detected by the DRAM system in sodium arsenite-treated samples compared to untreated samples. This reduction aligns with the decreased editing efficiency observed in methyltransferase-deficient cells (as shown in Figures 2G and 2H), which initially convinced us that these results reflected the DRAM system's ability to monitor dynamic changes in m5C levels.

      However, as the reviewer pointed out, sodium arsenite treatment could potentially inactivate the fusion proteins, leading to the observed reduction in editing efficiency. This possibility has not been conclusively ruled out in our current experiments. Optimizing this validation may require the future development of more specific m5C inhibitors. In light of this, we have revised our previous results and conclusions in lines 235-244, and discussed these points in lines 308-315.

      (4) The authors should move high-confidence editing site data contained in Supplementary Tables 2 and 3 into one of the main Figures to substantiate what is discussed in Figure 4A. However, the data needs to be visualized in another way than an Excel format. Furthermore, Supplementary Table 2 does not contain a description of the columns, while Supplementary Table 3 contains a single row with letters and numbers.

      Thank you for your kind suggestion. We have visualized the data from Supplementary Tables 2 and 3 in Figure 3F, presenting it as a screening flowchart for high-confidence editing sites. In Supplementary Table 3, we have displayed only the DRAM-mutated genes, which is why it contains a single row with letters and numbers. As requested, we have included descriptions of each column and reorganized Supplementary Tables 2 and 3 accordingly.

      (5) The authors state that "plotting the distribution of DRAM-seq editing sites in mRNA segments (5'UTR, CDS, and 3'UTR) highlighted a significant enrichment near the initiation codon (Figure 3F).", which is not true when this reviewer looks at Figure 3F.

      Thank you for your kind suggestion. We have replaced the expression "near the initiation codon" with "in the CDS" in lines 192-193.

      (6) The authors state that "In contrast, cells expressing the deaminase exhibited a distinct distribution pattern of editing sites, characterized by a prevalence throughout the 5'UTR.", which is not true when this reviewer looks at Figure 3F.

      Thank you for your kind suggestion. This distribution was actually characterized by a prevalence throughout the "3'UTR", but not "5'UTR". We have also made the necessary changes in lines 193-195.

      (7) The authors claim in the final conclusion: "In summary, we developed a novel deaminase and reader protein assisted RNA m5C methylation approach...", which is not what the method entails. The authors deaminate As or Us close to 5mC sites based on the binding of a deaminase-containing protein.

      Thank you for your kind suggestion, and we have made the necessary changes in lines 331-334.

      (8) The authors claim that "The data supporting the findings of this study are available within the article and its Supplementary Information." However, no single accession number for the deposited sequencing data can be found in the text or the supplementary data. Without the primary data, none of the claims can be verified.

      Thank you for pointing this out. The sequencing data from this study have already been deposited in the GEO database (GEO accession number: GSE254194, GEO token: ororioukbdqtpcn), and we will ensure they are made publicly available in a timely manner.

      (a) To underscore point (1), a recent publication (https://doi.org/10.1038/s41419-023-05661-y) reported: "To further identify the potential mRNAs regulated by ALYREF, we performed RNA-seq analysis in control or ALYREF knockdown T24 cells. After knockdown of ALYREF, 143 mRNAs differentially expressed, including 94 downregulated mRNAs (NC reads >100, |Fold change | >1.5, P-value <0.05). Functional enrichment analysis using Kyoto Encyclopedia of Genes and Genomes (KEGG) indicated that regulated mRNAs by ALYREF are chiefly enriched in canonical cancer-related pathways (Fig. S4A), including TGF-β signaling, MAPK signaling, and NF-κB signaling, strongly supporting the oncogenic function of ALYREF in tumor progression. Among these 94 downregulated genes, 11 mRNA showed a significant reduction in m5C methylation after NUSN2 silencing in T24 cells, combined with previously transcriptome-wide RNA-BisSeq data of T24 cells [21] (Fig. 4A)."

      These results mean that 94 mRNAs are regulated by ALYREF in bladder cancer-derived cells. Of those, very few (11) mRNA identities respond to NSUN2-dependent RNA methylation mediated by ALYREF binding. The question then arises: is that number sufficient to claim that ALYREF is an m5C-binding protein?

      And if so, how does the identification of 10,000+ edits by DRAM-Seq compare with the 94 mRNAs that are regulated by ALYREF? Were these 94 mRNAs identified by DRAM-Seq?

      Thank you for your kind suggestion. Previous reports, including Yang et al. (doi: 10.1038/cr.2017.55) and the literature you refer to, have detailed the close relationship between ALYREF and m5C modification. The ALY/REF export factor (ALYREF) was identified as the first nuclear m5C reader, and many mRNAs were shown to be regulated by it; it is therefore considered to be an m5C-binding protein.

      As required, by comparing the DRAM-edited mRNAs with the reported 94 mRNAs, we found that only 55.32% of the 94 mRNAs regulated by ALYREF could be detected by the DRAM system. This indicates that the DRAM system specifically targets certain mRNAs, as illustrated in Figure S4E. The relevant results were described and discussed in lines 175-179.

      (b) Line 123:

      "The deep sequencing results showed that the deamination rates of RPSA and SZRD1 were 75.5% and 27.25%, respectively. (Fig. 2A, B)."

      The Figure shows exactly the opposite of bisulfite-mediated deamination. These are the cytosines that were not deaminated by the chemical treatment and therefore can be sequenced as cytosines and not thymidines. Hence, the term deamination rate is wrong.

      Thank you for your kind suggestion. We have made the necessary change in lines 129-130 to change the deamination rates to m⁵C fraction.

      (c) Line 157:

      "DRAM-seq analysis further confirmed that DRAM was detected in an m5C-dependent manner, with minimal mutations in AP5Z1 and RPSA mRNAs in methyltransferase knockout cells compared to wild-type cells (Fig. 3C, D)."

      There is no indication of what the authors mean by minimal mutation in these Figures. The term "minimal mutation" should be reconsidered as well.

      Thank you for your kind suggestion. We intended to express that "Mutations in AP5Z1 and RPSA mRNA are reduced in methyltransferase-deficient cells." There was an issue with the initial formulation, and we have made the necessary changes in lines 165-167.

      (d) Line 167:

      "To further delineate the characteristics of the DRAM-seq data, we compared the distribution of DRAM-seq editing sites within the gene structure, specifically examining their occurrences in the 5'untranslated region (5'UTR), 3' untranslated region (3'UTR), CDS and ncRNA."

      Which part of a coding RNA is meant by "ncRNA"?

      Thank you for pointing this out. This was actually the Intergenic or Intron region, but not ncRNA. We have also corrected this labelling in Figure 3G and lines 186-189 of the revised manuscript.

      (e) Line 189:

      "Subsequently, we assessed the capacity of DRAM-seq to detect m5C on a transcriptome-wide scale, comparing its performance to BS-seq that have been previously reported with great authority."

      The term "great authority" is not a scientific term. Please, remove adulation to senior authors.

      Thank you for your kind suggestion. We removed this unsuitable expression and made the necessary changes in lines 207-208.

      (f) Line 233:

      "Several experiments have highlighted the requirement of 100-500 ng of RNA for m5C-RIP-seq, while BS-seq necessitates an even more demanding 500-750 μg of RNA21,25,61."

      This reviewer doubts that RNA bisulfite sequencing required half to one mg of RNA input. Please, check these references.

      Thank you for your kind suggestion. According to the references, we corrected μg to ng and made the necessary changes in lines 251-252.

      (g) Line 247:

      "Several experiments have highlighted the requirement of 100-500 ng of RNA for m5C-RIP-seq, while BS-seq necessitates an even more demanding 500-750 μg of RNA21,25,61."

      This reviewer doubts that RNA bisulfite sequencing requires half to one mg of RNA input. Please, check these references.

      Thank you for your kind suggestion. According to the references, we corrected μg to ng and made the necessary changes in lines 251-252.

      (h) Line 292:

      "Since m5C lacks a fixed motif, DRAM has an apparent limitation in achieving single-base resolution for detecting m5C."

      m5C deposition by NSUN2 and NSUN6 occurs in particular motifs that were coined Type I and II motifs. Hence, this statement is not correct.

      Thank you for your kind suggestion. Previous reports identified Type I m5C sites, which tend to have a downstream "NGGG" motif, and Type II m5C sites, which often contain a downstream "UCCA" motif. However, these m5C motifs do not fully characterize all m5C sites, and their presence downstream of an m5C site is not guaranteed (doi: 10.1038/s41594-019-0218-x). Therefore, we have corrected the expression “fixed motif” to “fixed base composition for characterizing all m5C modification sites” in line 317.

      (i) Line 390:

      "1 μl of total cellular RNA was used for sequencing library gene..."

      1 uL does not allow us to deduce which RNA mass was used for cDNA synthesis.

      Thank you for your kind suggestion. According to our cDNA synthesis protocol, we corrected “1μl” to “1μg” in lines 422-423.

      (j) Line 405:

      "...was assessed on the Agilent 5400 system (Agilent, USA) and quantified by QPCR (1.5 nM)"

      What does the 1.5 nM refer to in this sentence?

      Thank you for your kind suggestion. Here, "1.5nM" means that the concentration of the constructed library should be no less than 1.5nM. We have also revised this expression in the methods in lines 436-438.

    1. eLife Assessment

      This interesting study focuses on a previously reported positive correlation between translational efficiency and protein noise. This is unexpected as typically noise is inversely related to expression and increasing translation efficiency would increase the protein expression and thus be expected to reduce noise in gene expression. Using mathematical modeling and analysis of experimental data the authors argue that this phenomenon arises due to ribosomal demand. However, the work appears incomplete, with the reviewers having raised questions regarding the validity of the assumptions used in the mathematical model as well as the clarity of the presentation.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use analysis of existing data, mathematical modelling, and new experiments, to explore the relationship between protein expression noise, translation efficiency, and transcriptional bursting.

      Strengths:

      The analysis of the old data and the new data presented is interesting and mostly convincing.

      Weaknesses:

      (1) My main concern is the analysis presented in Figure 4. This is the core of mechanistic analysis that suggests ribosomal demand can explain the observed phenomenon. I am both confused by the assumptions used here and the details of the mathematical modelling used in this section. Firstly, the authors' assumption that the fluctuations of a single gene mRNA levels will significantly affect ribosome demand is puzzling. On average the total level of mRNA across all genes would stay very constant and therefore there are no big fluctuations in the ribosome demand due to the burstiness of transcription of individual genes. Secondly, the analysis uses 19 mathematical functions that are in Table S1, but there are not really enough details for me to understand how this is used, are these included in a TASEP simulation? In what way are mRNA-prev and mRNA-curr used? What is the mechanistic meaning of different terms and exponents? As the authors use this analysis to argue ribosomal demand is at play, I would like this section to be very much clarified.

      (2) Overall, the paper is very long and, as there are analytical expressions for protein noise (e.g. see Paulsson Nature 2004), some of these results do not need to rely on Gillespie simulations. Protein CV (noise) can be written as three terms representing the protein noise contribution, the mRNA expression contribution, and the bursty transcription contribution. For example, the results in panel 1 are fully consistent with the parameter regime in which protein noise is negligible compared to transcriptional noise.
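
      For orientation, the three-term decomposition mentioned in this comment is often written schematically as below. This is a sketch under stated assumptions rather than the exact expression from the cited paper: the symbols (mean protein number ⟨p⟩, mean mRNA number ⟨m⟩, mRNA and protein lifetimes τ_m and τ_p) are introduced here for illustration, and the bursty-transcription term is deliberately left symbolic because its form depends on the promoter-switching model.

```latex
% Squared coefficient of variation of protein number, split into three contributions.
\[
\eta_p^{2} \;=\; \frac{\sigma_p^{2}}{\langle p \rangle^{2}}
\;\approx\;
\underbrace{\frac{1}{\langle p \rangle}}_{\text{protein birth--death}}
\;+\;
\underbrace{\frac{1}{\langle m \rangle}\,\frac{\tau_m}{\tau_m + \tau_p}}_{\text{mRNA fluctuations}}
\;+\;
\underbrace{\eta_{\text{burst}}^{2}}_{\text{bursty transcription}}
\]
```

      In the regime the comment refers to, the first term is small compared with the other two, so protein noise is set mainly by mRNA-level and promoter-level fluctuations.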

    3. Reviewer #2 (Public review):

      This work by Pal et al. studied the relationship between protein expression noise and translational efficiency. They proposed a model based on ribosome demand to explain the positive correlation between them, which is new as far as I am aware. Nevertheless, I found the evidence for the main idea, that ribosome demand generates this correlation, to be weak. Below are my major and minor comments.

      Major comments:

      (1) Besides a hypothetical numerical model, I did not find any direct experimental evidence supporting the ribosome demand model. Therefore, I think the main conclusions of this work are a bit overstated.

      (2) I found that the enhancement of protein noise due to high translational efficiency is quite mild, as shown in Figure 6A-B, which makes the biological significance of this effect unclear.

      (3) The captions for most of the figures are short and do not provide much explanation, making the figures difficult to read.

      (4) It would be helpful if the authors could define the meanings of noise (e.g., coefficient of variation?) and translational efficiency in the very beginning to avoid any confusion. It is also unclear to me whether the noise from the experimental data is defined according to protein numbers or concentrations, which is presumably important since budding yeasts are growing cells.

      (5) The conclusions from Figures 1D and 1E are not new. For example, the constant protein noise as a function of mean protein expression is a known result of the two-state model of gene expression, e.g., see Equation (4) in Paulsson, Physics of Life Reviews 2005.

      (6) In Figure 4C-D, it is unclear to me how the authors changed the mean protein expression if the translation initiation rate is a function of variation in mRNA number and other random variables.

      (7) If I understand correctly, the authors somehow changed the translation initiation rate to change the mean protein expression in Figures 4C-D. However, the authors changed the protein sequences in the experimental data of Figure 6. I am not sure if the comparison between simulations and experimental data is appropriate.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use analysis of existing data, mathematical modelling, and new experiments, to explore the relationship between protein expression noise, translation efficiency, and transcriptional bursting.

      Strengths:

      The analysis of the old data and the new data presented is interesting and mostly convincing.

      Thank you for the constructive suggestions and comments. We address the individual comments below.

      Weaknesses:

      (1) My main concern is the analysis presented in Figure 4. This is the core of mechanistic analysis that suggests ribosomal demand can explain the observed phenomenon. I am both confused by the assumptions used here and the details of the mathematical modelling used in this section. Firstly, the authors' assumption that the fluctuations of a single gene mRNA levels will significantly affect ribosome demand is puzzling. On average the total level of mRNA across all genes would stay very constant and therefore there are no big fluctuations in the ribosome demand due to the burstiness of transcription of individual genes. Secondly, the analysis uses 19 mathematical functions that are in Table S1, but there are not really enough details for me to understand how this is used, are these included in a TASEP simulation? In what way are mRNA-prev and mRNA-curr used? What is the mechanistic meaning of different terms and exponents? As the authors use this analysis to argue ribosomal demand is at play, I would like this section to be very much clarified.

      Thank you for raising two important points. Regarding the first point, we agree that the overall ribosome demand in a cell will remain more or less the same even with fluctuations in mRNA levels of a few genes. However, what we refer to in the manuscript is the demand for ribosomes for translating mRNA molecules of a single gene. This demand will vary with the changes in the number of the mRNA molecules of that gene. When the mRNA copy number of the gene is low, the number of ribosomes required for translation is low. At a subsequent timepoint when the mRNA number of the same gene goes up rapidly due to transcriptional bursting, the number of ribosomes required would also increase rapidly. The process of allocation of ribosomes for translation of these mRNA molecules will vary between cells, and this process can lead to increased expression variation of that gene among cells.

      Regarding the second point, each of the 19 mathematical functions was individually tested in the TASEP model and stochastic simulation. The parameters ‘mRNA-curr’ and ‘mRNA-prev’ are the mRNA copy numbers at the current time point and the previous time point in the stochastic simulation, respectively. These numbers were calculated from the rate of production of mRNA, which is influenced by the burst frequency and the burst size, as well as the rate of mRNA removal. We will expand this section with explanations of all parameters and terms in the revised manuscript.
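
      To make the roles of ‘mRNA-prev’ and ‘mRNA-curr’ concrete, the following is a minimal sketch of how such pairs could be obtained from a stochastic simulation of bursty transcription. It is an illustration under stated assumptions, not the authors' implementation: the function names, the geometric burst-size distribution, and the fixed sampling interval `dt` are hypothetical, and none of the 19 functions from Table S1 is reproduced here.

```python
import random


def simulate_bursty_mrna(burst_freq, burst_size, decay_rate, t_end, dt, seed=0):
    """Gillespie-style simulation of bursty mRNA production for one gene.

    Returns the mRNA copy number sampled every `dt` time units on [0, t_end).
    """
    rng = random.Random(seed)
    t, m, k = 0.0, 0, 0
    samples = []
    while t < t_end:
        total_rate = burst_freq + decay_rate * m
        wait = rng.expovariate(total_rate)
        # The copy number stays at m until the next event: record grid points up to it.
        while k * dt <= t + wait and k * dt < t_end:
            samples.append(m)
            k += 1
        t += wait
        if rng.random() * total_rate < burst_freq:
            # Burst: geometric number of transcripts (1, 2, ...) with mean burst_size.
            n = 1
            while rng.random() > 1.0 / burst_size:
                n += 1
            m += n
        else:
            m -= 1  # removal of a single mRNA molecule
    return samples


def prev_curr_pairs(samples):
    """Pair each sampled copy number with the one at the previous time point."""
    return list(zip(samples[:-1], samples[1:]))
```

      Each `(mrna_prev, mrna_curr)` pair could then be passed to whichever function of the change in mRNA number is being tested as a modifier of the translation initiation rate.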

      (2) Overall, the paper is very long and, as there are analytical expressions for protein noise (e.g. see Paulsson Nature 2004), some of these results do not need to rely on Gillespie simulations. Protein CV (noise) can be written as three terms representing the protein noise contribution, the mRNA expression contribution, and the bursty transcription contribution. For example, the results in panel 1 are fully consistent with the parameter regime in which protein noise is negligible compared to transcriptional noise.

      Thank you for referring to the paper on analytical expressions for protein noise. We introduced translational bursting and ribosome demand in our model, and these are linked to stochastic fluctuations in mRNA and ribosome numbers. In addition, our model couples transcriptional bursting with translational bursting and ribosome demand. Since these processes are all stochastic in nature, we felt that the stochastic simulation would better capture the fluctuations in mRNA and protein expression levels originating from these processes. For consistency, we used stochastic simulations throughout, even when the coupling between transcription and translation was not considered.

      Reviewer #2 (Public review):

      This work by Pal et al. studied the relationship between protein expression noise and translational efficiency. They proposed a model based on ribosome demand to explain the positive correlation between them, which is new as far as I am aware. Nevertheless, I found the evidence for the main idea, that ribosome demand generates this correlation, to be weak. Below are my major and minor comments.

      Thank you for your helpful suggestions and comments. We note that the direct experimental support required for the ribosome demand model would need experimental setups that are beyond the currently available methodologies. We address the individual comments below.

      Major comments:

      (1) Besides a hypothetical numerical model, I did not find any direct experimental evidence supporting the ribosome demand model. Therefore, I think the main conclusions of this work are a bit overstated.

      Direct experimental evidence of the hypothesis would require generation of ribosome occupancy maps of mRNA molecules at the level of single cells and at time intervals that closely match the burst frequency of the genes. This is beyond the currently available methodologies. However, there is other evidence that supports our model. For example, earlier work in cell-free systems has shown that constraining cellular resources required for transcription or translation can increase expression heterogeneity (Caveney et al., 2017). In addition, genome-wide analysis of expression noise in yeast also revealed that the association between protein noise and translational efficiency was highest in the group of genes with the most bursty transcription (Supplementary fig. S20).

      (2) I found that the enhancement of protein noise due to high translational efficiency is quite mild, as shown in Figure 6A-B, which makes the biological significance of this effect unclear.

      Although we agree with the reviewer’s comment that the effect of translational efficiency on protein noise may not be as substantial as the effect of transcriptional bursting, it has been observed in studies across bacteria, yeast and Arabidopsis (Ozbudak et al., 2002; Blake et al., 2003; Wu et al., 2022). In addition, the relationship between translational efficiency and protein noise is in contrast with the inverse relationship observed between mean expression and noise (Newman et al., 2006; Silander et al., 2012). We also note that the goal of the manuscript was not to evaluate the strength of the association, but to understand the basis of the influence of translational efficiency on protein noise.

      (3) The captions for most of the figures are short and do not provide much explanation, making the figures difficult to read.

      We will revise the figure captions to include more details as per the reviewer’s suggestion.

      (4) It would be helpful if the authors could define the meanings of noise (e.g., coefficient of variation?) and translational efficiency in the very beginning to avoid any confusion. It is also unclear to me whether the noise from the experimental data is defined according to protein numbers or concentrations, which is presumably important since budding yeasts are growing cells.

      For all published datasets where we had measurements from a large number of genes/promoters, we used the measures of adjusted noise (for mRNA noise) and Distance-to-median (DM, for protein noise). For experiments that we performed on a limited number of promoters, we used the measure of coefficient of variation (CV) to quantify noise, as calculation of adjusted noise or DM was not possible. Translational efficiency refers to translation rate which is determined by both the translation initiation rate and the translation elongation rate. The noise at the protein level was quantified from the signal intensity of GFP tagged proteins, which was proportional to protein numbers without considering cell volume. For quantification of noise at the mRNA level, single-cell RNA-seq data was used, which provided mRNA numbers in individual cells.
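
      The two kinds of noise measure named in this response can be illustrated with a minimal sketch. It assumes single-cell measurements are available as plain lists of numbers; the function names and the `window` parameter are hypothetical, and the rank-window median is a simplified stand-in for the published DM procedure, not the authors' implementation.

```python
from statistics import mean, median, pstdev


def coefficient_of_variation(values):
    """CV = population standard deviation / mean of single-cell values."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / mu


def distance_to_median(means, cvs, window=3):
    """Simplified DM-style measure (assumption, not the published method).

    For each gene, subtract the median CV of the `window` genes closest
    in mean-expression rank from that gene's own CV.
    """
    if len(means) != len(cvs):
        raise ValueError("means and cvs must have the same length")
    order = sorted(range(len(means)), key=lambda i: means[i])
    half = window // 2
    dm = [0.0] * len(cvs)
    for rank, idx in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        local_median = median([cvs[j] for j in order[lo:hi]])
        dm[idx] = cvs[idx] - local_median
    return dm
```

      A positive value from `distance_to_median` marks a gene that is noisier than genes of similar mean expression, which is why such a measure needs many genes and cannot be computed for a handful of promoters.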

      (5) The conclusions from Figures 1D and 1E are not new. For example, the constant protein noise as a function of mean protein expression is a known result of the two-state model of gene expression, e.g., see Equation (4) in Paulsson, Physics of Life Reviews 2005.

      Yes, they are not new, but we included these results for setting the baseline for comparison with simulation results that appear in the later part of the manuscript where we included translational bursting and ribosome demand in our models.

      (6) In Figure 4C-D, it is unclear to me how the authors changed the mean protein expression if the translation initiation rate is a function of variation in mRNA number and other random variables.

      The translation initiation rate varied from a baseline initiation rate depending on the mRNA numbers and other variables. We changed the baseline initiation rate to alter the mean protein expression levels. We will elaborate this section in the revised manuscript.

      (7) If I understand correctly, the authors somehow changed the translation initiation rate to change the mean protein expression in Figures 4C-D. However, the authors changed the protein sequences in the experimental data of Figure 6. I am not sure if the comparison between simulations and experimental data is appropriate.

      It is an important observation. Even though we changed the translation initiation rate to change the mean expression (Fig. 4C-D), we noted in the description of the model (Fig. 3D) that the changes in the translation initiation rate were also linked with changes in the translation elongation rate. The translation initiation rate can only increase if the ribosomes already bound to the mRNA traverse quicker through the mRNA. This means that an increase in the translation initiation rate will occur only if the translation elongation rate is also increased, which will lead to lower traversal time of the ribosomes through the mRNA (Fig. 3D). Similarly, an increase in the translation elongation rate will allow more ribosomes to initiate translation. Thus, the parameters translation initiation rate and translation elongation rate are interconnected. This has also been observed in an experimental study by Barrington et al. (2023). Having said that, however, the models can also be expressed in terms of the translation elongation rate, instead of the translation initiation rate, and this modification will not change the results of the simulations due to the interconnectedness of the initiation rate and the elongation rate.

      References

      C. L. Barrington, G. Galindo, A. L. Koch, E. R. Horton, E. J. Morrison, S. Tisa, T. J. Stasevich, O. S. Rissland. Synonymous codon usage regulates translation initiation. Cell Rep. 42, 113413 (2023).

      W. J. Blake, M. Kaern, C. R. Cantor, J. J. Collins, Noise in eukaryotic gene expression. Nature 422, 633-637 (2003).

      P. M. Caveney, S. E. Norred, C. W. Chin, J. B. Boreyko, B. S. Razooky, S. T. Retterer, C. P. Collier, M. L. Simpson, Resource Sharing Controls Gene Expression Bursting. ACS Synth Biol. 6, 334-343 (2017).

      J. R. Newman, S. Ghaemmaghami, J. Ihmels, D. K. Breslow, M. Noble, J. L. DeRisi, J. S. Weissman, Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441, 840-846 (2006).

      E. M. Ozbudak, M. Thattai, I. Kurtser, A. D. Grossman, A. van Oudenaarden, Regulation of noise in the expression of a single gene. Nat Genet. 31, 69-73 (2002).

      O. K. Silander, N. Nikolic, A. Zaslaver, A. Bren, I. Kikoin, U. Alon, M. Ackermann, A genome-wide analysis of promoter-mediated phenotypic noise in Escherichia coli. PLoS Genet. 8, e1002443 (2012).

      H. W. Wu, E. Fajiculay, J. F. Wu, C. S. Yan, C. P. Hsu, S. H. Wu, Noise reduction by upstream open reading frames. Nat Plants. 8, 474-480 (2022).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review)

      Summary:

      Numerous mechanistic and structural studies have reported the cooperative role of Oct4 and Sox2 in the establishment of pluripotency during reprogramming. Due to the difficulty of sample collection and of RNA-seq with low cell numbers, the precise mechanisms remain unclear in early embryos. This manuscript reports the role of OCT4 and SOX2 in early mouse embryos using knockout models with low-input ATAC-seq and RNA-seq. Compared to the control, chromatin accessibility and the transcriptome were affected when Oct4 and Sox2 were deleted in the early ICM. Specifically, decreased ATAC-seq peaks showed enrichment of TF motifs such as OCT, SOX, and OCT-SOX, indicating their importance during early development. Moreover, through in-depth analysis of the ATAC-seq and RNA-seq data, the authors found that Oct4 and Sox2 target enhancers to activate their downstream genes. In addition, they uncovered the role of OS during development from the morula to the ICM, providing the scientific community with a more comprehensive understanding.

      Strengths:

      On the whole, the manuscript is innovative, and the conclusions of this paper are mostly well supported by data; however, there are some issues that need to be addressed.

      Weaknesses:

      Major Points:

      (1) In Figure 1, a more detailed description of the knockout strategy should be provided. The knockout strategy in Fig. 1 is somewhat obscure; for example, how is OCT4 inactivated in Oct4mKO2 heterozygotes? As shown in Figure 1, the exon of OCT4 is not deleted, and its promoter is not destroyed. Therefore, how is OCT4 inactivated to form heterozygotes?

      Thank you for your kind suggestions. We will add a detailed description of the knockout strategy in the legends for Figure 1A and 1B, as shown below:

      Figure 1A. Schemes of mKO2-labeled Oct4 KO (Oct4mKO2) and Oct4 flox alleles. In the Oct4mKO2 allele, a PGK-pac∆tk-P2A-mKO2-pA cassette was inserted 3.6 kb upstream of the Oct4 transcription start site (TSS) and a promoter-less FRT-SA-IRES-hph-P2A-Venus-pA cassette was inserted into Oct4 intron 1. The inclusion of a stop codon followed by three sets of polyadenylation signal sequences (pA) after the Venus cassette ensures both transcriptional and translational termination, effectively blocking the expression of Oct4 exons 2–5.

      Figure 1B. Schemes of EGFP-labeled Sox2 KO (Sox2EGFP) and Sox2 flox alleles. In the Sox2EGFP allele, the 5’ untranslated region (UTR), coding sequence and a portion of the 3’ UTR of Sox2 were deleted and replaced with a PGK-EGFP-pA cassette. Notably, 1,023 bp of the Sox2 3’ UTR remains intact.

      (2) Is ZP3-Cre expressed in the zygotes? Is there any residual protein?

      Thank you for the question. While we have not directly tested for ZP3-Cre expression in zygotes, the published transcriptome and proteomics data show that ZP3 is present at both the transcriptional and protein levels in wild-type zygotes (Deng et al., Science, 2014; Gao et al., Cell Reports, 2017). This suggests that ZP3-Cre could potentially be expressed in zygotes as well.

      (3) What motifs are enriched in the rising ATAC-seq peaks after knocking out of OCT4 and SOX2?

      Thank you for the question. The enriched motifs in the rising ATAC-seq peaks in Oct4 KO and Sox2 KO ICMs are the GATA, TEAD, EOMES and KLF motifs, as shown in Figure 4A and Figure supplement 7.

      (4) The ordinate of Fig4c is lost.

      Thank you for the question. The y-axis is average normalized signals (reads per million-normalized pileup signals). We will add it in the revised version.

      (5) Signals of H3K4me1, H3K27ac, and so on are usually used to define enhancers, and the loci of enhancers vary greatly in different cells. In the manuscript, the authors defined ATAC-seq peaks far from the TSS as enhancers. The definition in this manuscript is not strictly an enhancer.

      Thank you for this insightful comment. We will search for and analyze published omics data on H3K4me1 and H3K27ac in early embryos or mouse embryonic stem cells to conduct this analysis.

      (6) If Oct4 and Sox2 truly activate Sap30 and Uhrf1, what effect does interfering with both genes have on gene expression and chromatin accessibility?

      Thank you for the interesting question. Unfortunately, we have not conducted this specific experiment, so we do not have direct results. However, Sap30 is a key component of the mSin3A corepressor complex, while Uhrf1 regulates the establishment and maintenance of DNA methylation. Both proteins are known to function as repressors. Therefore, we hypothesize that interfering with these two genes could alleviate repression of some genes, such as trophectoderm markers, similar to what we have observed in Oct4 KO and Sox2 KO ICMs.

      Reviewer #2 (Public review):

      In this manuscript, Hou et al. investigate the interplay between OCT4 and SOX2 in driving the pluripotent state during early embryonic lineage development. Using knockout (KO) embryos, the authors specifically analyze the transcriptome and chromatin state within the ICM-to-EPI developmental trajectory. They emphasize the critical role of OCT4 and the supportive function of SOX2, along with other factors, in promoting embryonic fate. Although the paper presents high-quality data, several key claims are not well-supported, and direct evidence is generally lacking.

      Major Points:

      (1) Although the authors claim that both maternal KO and maternal KO/zygotic hetero KO mice develop normally, the molecular changes in these groups appear overestimated. A wildtype control is recommended for a more robust comparison.

      Thank you for your valuable feedback. However, I’m unclear on what is meant by “the molecular changes in these groups appear overestimated.” Could the reviewer kindly provide more details or clarify which specific aspects of the molecular changes they are referring to? This would help us better address the concern.

      (2) The authors assert that OCT4 and SOX2 activate the pluripotent network via the OCT-SOX enhancer. However, the definition of this enhancer is based solely on proximity to TSSs, which is a rough approximation. Canonical enhancers are typically located in intronic and intergenic regions and marked by H3K4me1 or H3K27ac. Re-analyzing enhancer regions with these standards could be beneficial. Additionally, the definitions of "close to" or "near" in lines 183-184 are unclear and not defined in the legends or methods.

      Thank you for this insightful comment. We will search for and analyze published omics data on H3K4me1 and H3K27ac in early embryos or mouse embryonic stem cells to address the concern of “enhancer”.

      The definition of "close to" or "near" in lines 183-184 is in the legend of Figure 2E and methods. In the GSEA analysis, Ensembl protein-coding genes with TSSs located within 10 kb of ATAC-seq peak centers were included.
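
      The proximity rule stated here (TSS within 10 kb of an ATAC-seq peak center) can be sketched as follows. This is an illustration only: the function name, the tuple layouts, and the gene labels in the usage example are hypothetical, and filtering to Ensembl protein-coding genes is assumed to have been done upstream.

```python
from bisect import bisect_left


def genes_near_peaks(peaks, tss_list, max_dist=10_000):
    """Return genes whose TSS lies within `max_dist` bp of any peak center.

    peaks:    iterable of (chrom, start, end)
    tss_list: iterable of (gene, chrom, tss_position)
    """
    # Collect sorted peak centers per chromosome for binary search.
    centers = {}
    for chrom, start, end in peaks:
        centers.setdefault(chrom, []).append((start + end) // 2)
    for chrom_centers in centers.values():
        chrom_centers.sort()

    hits = set()
    for gene, chrom, pos in tss_list:
        chrom_centers = centers.get(chrom, [])
        # First center at or beyond the left edge of the window.
        i = bisect_left(chrom_centers, pos - max_dist)
        if i < len(chrom_centers) and chrom_centers[i] <= pos + max_dist:
            hits.add(gene)
    return hits


if __name__ == "__main__":
    example_peaks = [("chr1", 1000, 2000), ("chr2", 50000, 51000)]
    example_tss = [("geneA", "chr1", 11000), ("geneB", "chr1", 12000)]
    print(genes_near_peaks(example_peaks, example_tss))
```

      Using the peak center rather than the peak edges keeps the rule independent of peak width, which matches the wording of the definition above.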

      (3) There is no evidence that the decreased peaks/enhancers could be the direct targets of Oct4 and Sox2 throughout this manuscript. Figures 2 and 4 show only minimal peak annotations related to OCT and SOX motifs, and there is a lack of chromatin IP data. Therefore, claims about direct targets are not substantiated and should be appropriately revised.

      Thank you for the comment. In Figure Supplement 3C, we analyzed published Sox2 CUT&RUN data from E4.5 ICMs (Li et al., Science, 2023), which demonstrates that the reduced ATAC-seq peaks in our Sox2 KO ICMs are enriched with Sox2 CUT&RUN signals. This data suggests that decreased peaks/enhancers could be the direct targets of Sox2. Unfortunately, we did not find similar published data for Oct4 in embryos.

      (4) Lines 143-146 lack direct data to support the claim. Actually, the main difference between clusters 1, 11 and clusters 3, 8, 14 is whether the peak contains the OCT-SOX motif. However, the reviewer cannot get any information about peaks activated by OCT4 rather than SOX2 in clusters 1, 11.

      Thank you for the comment. As the reviewer pointed out, we agree that clusters 3, 8 and 14 are more enriched with OCT-SOX motifs than clusters 1 and 11. However, this is consistent with our observation that the accessibility of peaks in clusters 1 and 11 mainly relies on Oct4, while the accessibility of clusters 3, 8 and 14 relies on both Oct4 and Sox2. The word “activate” is probably not accurate. We will revise the text as follows:

      “Notably, compared to the peaks dependent on Oct4 but not Sox2 (Figure 2B, clusters 1 and 11), those reliant on both Oct4 and Sox2 show greater enrichment of the OCT-SOX motif (Figure 2B, clusters 3, 8 and 14). The former group tended to be already open in the morula, while the latter group became open in the ICM.”

      Minor Points:

      (1) Lines 153-159: The figure panel does not show obvious enrichment of SOX2 signals or significant differences in H3K27ac signals across clusters, thus not supporting the claim.

      Thank you for the comments.

      Lines 153-159 reference two datasets: Figure supplement 3C and 3D.

      In Figure supplement 3C, the average plots above the heatmaps show that the decreased ATAC-seq peaks exhibited higher enrichment with Sox2 CUT&RUN signals compared to the increased or unchanged peaks.

      Regarding Figure supplement 3D, we agree that the H3K27ac signal is only slightly more enriched on the decreased peaks than the unchanged peaks. However, it's important to note that only the top 57,512 strongest of the 142,096 unchanged peaks were included in the analysis. We excluded the weaker unchanged peaks because they are less informative, but if included, they could reduce the average H3K27ac signal for the unchanged peaks.

      (2) Lines 189-190: The term "identify" is overstated for the integrative analysis of RNA-seq and ATAC-seq, which typically helps infer TF targets rather than definitively identifying them.

      Thank you for the suggestion. We will replace “identify” with “infer”. The revised version is as below:

      “In addition, integration of the ATAC-seq and RNA-seq data allowed us to infer previously unknown targets of Oct4 and Sox2, such as Sap30 and Uhrf1, which are essential for somatic cell reprogramming and embryonic development.”

      (3) The Discussion is lengthy and should be condensed.

      Thank you for the suggestion. We will shorten it.

    2. eLife Assessment

      This work presents a valuable finding on how the interplay between transcription factors SOX2 and OCT4 establishes the pluripotency network in early mouse embryos. Despite the high quality of the data, the evidence supporting the claims of the authors is currently incomplete and would benefit from more omics analysis such as H3K4me1 and H3K27ac CUT&Tag. The work will be of interest to biologists working on embryonic development.

    3. Reviewer #1 (Public review):

      Summary:

      Numerous mechanistic and structural studies have reported the cooperative role of Oct4 and Sox2 in the establishment of pluripotency during reprogramming. Due to the difficulty of sample collection and of RNA-seq with low cell numbers, the precise mechanisms remain unclear in early embryos. This manuscript reports the role of OCT4 and SOX2 in early mouse embryos using knockout models with low-input ATAC-seq and RNA-seq. Compared to the control, chromatin accessibility and the transcriptome were affected when Oct4 and Sox2 were deleted in the early ICM. Specifically, decreased ATAC-seq peaks showed enrichment of TF motifs such as OCT, SOX, and OCT-SOX, indicating their importance during early development. Moreover, through in-depth analysis of the ATAC-seq and RNA-seq data, the authors found that Oct4 and Sox2 target enhancers to activate their downstream genes. In addition, they uncovered the role of OS during development from the morula to the ICM, providing the scientific community with a more comprehensive understanding.

      Strengths:

      On the whole, the manuscript is innovative, and the conclusions of this paper are mostly well supported by data; however, there are some issues that need to be addressed.

      Weaknesses:

      Major Points:

      (1) In Figure 1, a more detailed description of the knockout strategy should be provided. The knockout strategy in Fig. 1 is somewhat obscure; for example, how is OCT4 inactivated in Oct4mKO2 heterozygotes? As shown in Figure 1, the exon of OCT4 is not deleted, and its promoter is not destroyed. Therefore, how is OCT4 inactivated to form heterozygotes?

      (2) Is ZP3-Cre expressed in the zygotes? Is there any residual protein?

      (3) What motifs are enriched in the rising ATAC-seq peaks after knocking out of OCT4 and SOX2?

      (4) The ordinate of Fig4c is lost.

      (5) Signals of H3K4me1, H3K27ac, and so on are usually used to define enhancers, and the loci of enhancers vary greatly in different cells. In the manuscript, the authors defined ATAC-seq peaks far from the TSS as enhancers. The definition in this manuscript is not strictly an enhancer.

      (6) If Oct4 and Sox2 truly activate Sap30 and Uhrf1, what effect does interfering with both genes have on gene expression and chromatin accessibility?

    4. Reviewer #2 (Public review):

      In this manuscript, Hou et al. investigate the interplay between OCT4 and SOX2 in driving the pluripotent state during early embryonic lineage development. Using knockout (KO) embryos, the authors specifically analyze the transcriptome and chromatin state within the ICM-to-EPI developmental trajectory. They emphasize the critical role of OCT4 and the supportive function of SOX2, along with other factors, in promoting embryonic fate. Although the paper presents high-quality data, several key claims are not well-supported, and direct evidence is generally lacking.

      Major Points:

      (1) Although the authors claim that both maternal KO and maternal KO/zygotic hetero KO mice develop normally, the molecular changes in these groups appear overestimated. A wildtype control is recommended for a more robust comparison.

      (2) The authors assert that OCT4 and SOX2 activate the pluripotent network via the OCT-SOX enhancer. However, the definition of this enhancer is based solely on proximity to TSSs, which is a rough approximation. Canonical enhancers are typically located in intronic and intergenic regions and marked by H3K4me1 or H3K27ac. Re-analyzing enhancer regions with these standards could be beneficial. Additionally, the definitions of "close to" or "near" in lines 183-184 are unclear and not defined in the legends or methods.

      (3) There is no evidence that the decreased peaks/enhancers could be the direct targets of Oct4 and Sox2 throughout this manuscript. Figures 2 and 4 show only minimal peak annotations related to OCT and SOX motifs, and there is a lack of chromatin IP data. Therefore, claims about direct targets are not substantiated and should be appropriately revised.

      (4) Lines 143-146 lack direct data to support the claim. Actually, the main difference between clusters 1, 11 and clusters 3, 8, 14 is whether the peak contains the OCT-SOX motif. However, the reviewer cannot get any information about peaks activated by OCT4 rather than SOX2 in clusters 1, 11.

      Minor Points:

      (1) Lines 153-159: The figure panel does not show obvious enrichment of SOX2 signals or significant differences in H3K27ac signals across clusters, thus not supporting the claim.

      (2) Lines 189-190: The term "identify" is overstated for the integrative analysis of RNA-seq and ATAC-seq, which typically helps infer TF targets rather than definitively identifying them.

      (3) The Discussion is lengthy and should be condensed.

    1. eLife Assessment

      This valuable study provides a detailed picture of the synapse distributions for a set of visual projection neurons and their downstream partners, in combination with multi-compartmental modelling fitted to electrophysiological data. The model reveals interesting consequences of synapse topography for neuronal computation. The analysis, however, seems incomplete as the authors only analyze passive models of these spiking neurons, and do not attempt to connect their analysis to the bigger picture at the behavioral level.

    2. Reviewer #1 (Public review):

      Summary:

      This study makes use of the EM reconstruction of the fly brain to investigate the morphology and topography of the synapses between retinotopic, loom-sensitive visual projection neurons (VPNs) and downstream descending neurons (DNs). The authors analyzed the distribution of synapses on the dendritic trees of DNs and performed multi-compartmental modelling to study the implications of the synaptic arrangements for neuronal integration of input signals.

      Until recently, it has been unclear how spatial information is passed from retinotopic loom-sensitive neurons to descending neurons because the axons of the VPNs terminate in small optic glomeruli with no apparent topographic organization. It has recently been shown that synaptic weight gradients of VPNs connecting to DNs are the main mechanisms that allow for directed behavioral output (Dombrovski et al.). This study now goes one step further to determine if precise synapse location on the dendritic tree contributes further to the information processing. The study suggests that (1) none of the VPNs investigated show a retinotopic organization of synapses on DN dendrites. (2) Synapses of single VPNs are locally clustered. (3) Initial EPSPs at the synaptic location have, as expected, varying amplitudes but the amplitudes are passively normalized and only cover a small range when measured at the SIZ. (4) A near random distribution of synapses allows for linear integration of synaptic inputs when only a few VPNs are activated.

      Strengths:

      This study provides a detailed picture of the synapse distribution for a set of VPN and DN pairs, in combination with multi-compartmental modelling fitted to electrophysiological data. The data and methods are clear. The findings are overall interesting. The computational pipeline, which should ideally be made publicly available, will allow the community to make similar analyses on different neuronal classes, which will facilitate the detection of more general mechanisms of dendritic computation.

      Weaknesses:

      - In my opinion, we need more detail on the electrophysiological data and the fitting of the multi-compartmental model, which is the foundation of large parts of the study.

      - The study shows that the synapses of an individual VPN are locally clustered and suggests this as evidence for clustering of synapses of similar tuning (as has been shown previously in other systems). I am not fully convinced by the arguments here, since synapses of a single neuron are by necessity not randomly distributed in space.

      - As written, it was in parts unclear to me what the main hypotheses and conclusions were - e.g., how would a retinotopic distribution of synapses on dendritic trees contribute to information processing? Are the model predictions in line with the presumed behavioural role of these neurons?

    3. Reviewer #2 (Public review):

      Summary:

      This article investigates the distribution of synapses on the dendritic arbors of descending neurons in the looming circuit of the fly visual system. The authors use publicly available EM reconstruction data of the adult fly brain to identify the positions of synapses from several types of visual projection neuron (VPN) to descending neuron (DN) connections. VPN dendrites are retinotopically organized, and axons from different VPN populations innervate distinct optic glomeruli. Yet the authors did not find any retinotopic organization of the synapses in the VPN-DN pairs they analyzed. They then constructed passive electrical models of the DNs with their structures extracted from the EM reconstructions. They focused on two specific DNs and parameterized their models by conducting whole-cell recordings within a voltage range below spiking threshold. Simulation of these passive models showed that irrespective of the location of a synapse, EPSPs became very similar at the spike initiation zone. This is consistent with the idea of synaptic democracy, where EPSPs at far away synapses have higher amplitude compared to those nearer to the spike initiation zone so that they all attenuate to similar amplitudes by the time they reach it. The authors found that activating synapses from individual VPNs has the same effect as activating a random set of synapses. They conclude that despite some clustering of VPN synapses at small scale, they are distributed randomly over the dendritic arbor of DNs so that their EPSP amplitude encodes the number of activated synapses, avoiding sublinearity from shunting effects.
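
      The sublinearity from shunting mentioned at the end of this summary can be illustrated with a minimal single-compartment sketch. It is not the authors' multi-compartmental model: all synapses are assumed to sit on one isopotential compartment (the worst case for shunting), and every parameter value and name below is an illustrative assumption.

```python
def steady_state_epsp(n_syn, g_syn=0.1, g_leak=10.0, e_syn=0.0, e_rest=-60.0):
    """Steady-state depolarization (mV) for n co-located conductance synapses.

    Conductances are in nS and potentials in mV (illustrative values).
    dV = n*g_syn * (E_syn - E_rest) / (g_leak + n*g_syn)
    """
    g_total = n_syn * g_syn
    return g_total * (e_syn - e_rest) / (g_leak + g_total)


def linear_prediction(n_syn, **kwargs):
    """What the response would be if single-synapse EPSPs simply summed."""
    return n_syn * steady_state_epsp(1, **kwargs)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, steady_state_epsp(n), linear_prediction(n))
```

      As more synapses open on the same compartment, the added conductance lowers the input resistance and the response falls increasingly below the linear sum; spreading synapses over electrically separate branches reduces this interaction.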

      Strengths:

      - Experimental confirmation of the location of the spike initiation zone in the DN arbors is interesting and may provide better understanding of signal processing in these neurons.

      - Passive parameters obtained through electrophysiological recordings are useful.

      - These morphologically detailed single neuron models, if made available publicly, will be beneficial for building more complete models to understand the fly visual circuit.

      - The authors have complemented the work of Dombrovski et al by analyzing the distribution of synapses in more detail from EM data for a different set of neurons.

      Weaknesses:

      DNs are upstream of motoneurons, and one would expect, as demonstrated by Dombrovski et al, that specific DNs being activated by input from specific regions of the visual field will activate motoneurons so that the fly moves away from a looming object.

      The current work analyzed the synapse distribution on two DNs that do not seem to have such a role, and emphasizes the lack of retinotopy. However, it is not clear why one would expect retinotopy in synapse location on the dendritic arbor. The comparison with mammalian visual circuits is not appropriate because those layers are extracting more and more complex visual features, whereas Drosophila DNs are supposed to drive motoneurons to generate suitable escape behavior.

      - The authors do not suggest the functional roles of these DNs in controlling the movement of the fly. They argue that the synapse distribution and the passive electrotonic structure of these neurons are optimized to make the composite EPSP encode the number of activated synapses, but do not explain why this is important.

      - Although DNs are spiking neurons, the authors limit their work to the subthreshold passive domain. If the EPSP at the spike initiation zone crosses spiking threshold, will encoding the number of synapses in EPSP amplitude still matter? Will it matter either if the composite EPSP remains subthreshold?

      - The temporal aspect of the input has been ignored by the authors in their simulations. First, it is not clear that all the synapses from a single VPN should get activated together. One would expect a spike in a VPN to arrive at different synapses with different time delays depending on their electrotonic distance from the spike initiation zone and the signal propagation speed in the neurites.

      A looming stimulus should be expanding with time, but from the description of the simulations it does not seem that the authors have tried to incorporate this aspect in their design of the synaptic activation.

      - The suggestion in the abstract that linear encoding of synapse number is the default strategy, which is then tuned by active properties and plasticity, seems strange. Developmentally, active properties do not get inserted into passive neurons.

      - Much of the analysis (Figures 4, 5, 12) shows relationships with physical distance along the dendrite. In studying passive neurons, it is more informative to use electrotonic distance, which provides better insight.
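
      The electrotonic distance referred to in the last point can be computed from morphology and passive parameters, as in the following minimal sketch. It uses the standard cable length constant λ = sqrt((d/4)·(R_m/R_a)); the function names, the segment representation, and the parameter values in the example are assumptions for illustration, not values from the manuscript.

```python
import math


def length_constant_um(diam_um, rm_ohm_cm2, ra_ohm_cm):
    """Length constant (µm) of a uniform passive cylinder."""
    d_cm = diam_um * 1e-4  # µm -> cm
    lam_cm = math.sqrt((d_cm / 4.0) * (rm_ohm_cm2 / ra_ohm_cm))
    return lam_cm * 1e4  # cm -> µm


def electrotonic_distance(segments, rm_ohm_cm2, ra_ohm_cm):
    """Sum of length/λ over (length_um, diam_um) segments along a path."""
    return sum(
        length / length_constant_um(diam, rm_ohm_cm2, ra_ohm_cm)
        for length, diam in segments
    )


if __name__ == "__main__":
    # Hypothetical path: a thin distal branch followed by a thicker neurite.
    path = [(100.0, 0.25), (250.0, 1.0)]
    L = electrotonic_distance(path, rm_ohm_cm2=10_000, ra_ohm_cm=100)
    # exp(-L) is the steady-state attenuation in an infinite uniform cable,
    # used here only as a rough guide for a branched arbor.
    print(L, math.exp(-L))
```

      Because λ scales with the square root of diameter, two synapses at the same physical distance from the spike initiation zone can sit at quite different electrotonic distances if one path runs through thinner branches.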