  1. Sep 2024
    1. Reviewer #1 (Public review):

      Summary:

      Deletion of the hrp2 and hrp3 loci in P. falciparum poses an immediate public health threat. This manuscript provides a more complete understanding of the dynamic nature with which these deletions are generated. By delving into the likely mechanisms behind their generation, the authors also provide interesting insight into general Plasmodium biology that can inform our broader understanding of the parasite's genomic evolution.

      Strengths:

      The sub-telomeric regions of P. falciparum (where hrp2 and hrp3 are located) are notoriously difficult to study with short-read sequence data. The authors take an appropriate, targeted approach toward studying the loci of interest, which includes read-depth analysis and local haplotype reconstruction. They additionally use both long-read and short-read data to validate their major findings. There is an extensive set of supplementary plots, which helps clarify several aspects of the data.
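      The read-depth idea mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline: the window depths, the 0.1 threshold, and the function names are all hypothetical, and a real analysis would also need to handle mappability and paralogous regions. The sketch scales per-window coverage at a locus by the sample's genome-wide median depth and flags windows that fall below the threshold.

```python
from statistics import median


def normalized_depth(window_depths, genome_median):
    """Scale raw per-window read depth by the sample's genome-wide median depth."""
    if genome_median <= 0:
        raise ValueError("genome_median must be positive")
    return [depth / genome_median for depth in window_depths]


def call_deleted_windows(window_depths, genome_median, threshold=0.1):
    """Return indices of windows whose normalized depth is below `threshold`.

    The default threshold is an illustrative assumption, not a published value.
    """
    norm = normalized_depth(window_depths, genome_median)
    return [i for i, value in enumerate(norm) if value < threshold]


# Hypothetical depths (illustrative only): seven windows across a
# subtelomeric locus, and seven windows sampled genome-wide.
locus_depths = [52, 48, 50, 3, 0, 1, 0]
genome_wide = [50, 47, 55, 49, 51, 53, 46]

deleted = call_deleted_windows(locus_depths, median(genome_wide))
print(deleted)
```

      Normalizing by the sample's own median keeps the call independent of overall sequencing depth, which matters when public datasets with very different coverage are pooled.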

      Weaknesses:

      The revised version of this manuscript has helpfully expanded the details regarding methodology; however, publication of the tool PathWeaver (which is used for local haplotype reconstruction) remains in preparation.

    2. Reviewer #2 (Public review):

      This work investigates the mechanisms, patterns and geographical distribution of pfhrp2 and pfhrp3 deletions in Plasmodium falciparum. Rapid diagnostic tests (RDTs) detect P. falciparum histidine-rich protein 2 (PfHRP2) and its paralog PfHRP3 located in subtelomeric regions. However, laboratory and field isolates with deletions of pfhrp2 and pfhrp3 that can escape diagnosis by RDTs are spreading in some regions of Africa. The authors find that pfhrp2 deletions are less common and likely occur through chromosomal breakage with subsequent telomeric healing. Pfhrp3 deletions are more common and show three distinct patterns: loss of chromosome 13 from pfhrp3 to the telomere with evidence of telomere healing at the breakpoint (Asia; Pattern 13-); duplication of a chromosome 5 segment containing pfhrp1 on chromosome 13 through non-allelic homologous recombination (NAHR) (Asia; Pattern 13-5++); and the most common pattern, duplication of a chromosome 11 segment on chromosome 13 through NAHR (Americas/Africa; Pattern 13-11++). The loss of these genes impacts the sensitivity of RDTs, and knowing these patterns and their geographic distribution makes it possible to make better decisions for malaria control.

      Comments on latest version:

      The authors answered all my questions.

    3. Reviewer #3 (Public review):

      The study provides a detailed analysis of the chromosomal rearrangements related to deletions of the histidine-rich protein 2 (pfhrp2) and pfhrp3 genes in P. falciparum, which have clinical significance because malaria rapid diagnostic tests detect these parasite proteins. A large number of publicly available short-read whole-genome sequences of the parasite were analyzed, and data on coverage and on discordant mapping allowed the identification of deletions, duplications and chromosomal rearrangements related to pfhrp3 deletions. Long-read sequences supported the presence of a normal chromosome 11 and a hybrid 13-11 chromosome lacking pfhrp3 in some of the pfhrp3-deleted parasites. The findings support that these translocations have repeatedly occurred in natural populations. The authors discuss the implications of these findings and whether they support or contradict previous hypotheses on the emergence of these deletions and the possible selective pressures involved.

      The genomic regions where these genes are located are challenging to study because they are highly repetitive and paralogous; the use of long-read sequencing made it possible to span the duplicated regions, supporting the identification of the hybrid 13-11 chromosome.

      All publicly available whole-genome sequences of the malaria parasite from around the world were analysed which allowed an overview of the worldwide variability, even though this analysis is biased by the availability of sequences, as the authors recognize.

      Despite the reduced sample size, the detailed analysis of haplotypes and the identification of breakpoint locations support a single origin event for the 13-5++ parasites.

      The analysis of haplotype variation across the duplicated chromosome-11 segment identified breakpoints at varied locations that support multiple translocation events in natural populations. The authors suggest these translocations may occur at high frequency during meiosis in natural populations but are strongly selected against in most circumstances, which remains to be tested.

      In this new version, the authors have addressed the points raised previously and adequately discuss the limitations of the study.

    1. eLife assessment

      This important work by Malita et al. describes a mechanism by which an intestinal infection causes an increase in daytime sleep through signaling from the gut to the blood-brain barrier. Their findings suggest that cytokines upd3 and upd2 produced by the intestine following infection act on the glia of the blood-brain barrier to regulate sleep by modulating Allatostatin A signaling. The evidence supporting the claims of the authors is solid. Further verification of certain critical tools, and addressing a few discrepancies from data previously published, would improve this work.

    2. Joint Public Review

      Summary:

      The authors sought to elucidate the mechanism by which infections increase sleep in Drosophila. Their work is important because it further supports the idea that the blood-brain barrier is involved in brain-body communication, and because it advances the field of sleep research. Using knock-down and knock-out of cytokines and cytokine receptors specifically in the endocrine cells of the gut (cytokines) as well as in the glia forming the blood-brain barrier (BBB) (cytokine receptors), the authors show that the cytokines upd2 and upd3, secreted by entero-endocrine cells in response to infections, increase sleep through the Dome receptor in the BBB. They also show that gut-derived Allatostatin (Alst) A promotes wakefulness by inhibiting Alst A signaling that is mediated by Alst receptors expressed in BBB glia. Their results suggest there may be additional mechanisms that promote elevated sleep during gut inflammation.

      The authors suggest that upd3 is more critical than upd2, which is not sufficiently addressed or explained. In addition, the study uses the gut's response to reactive oxygen molecules as a proxy for infection, which is not sufficiently justified. Finally, further verification of some fundamental tools used in this paper would solidify these findings and make them more convincing.

      Strengths:

      (1) The work addresses an important topic and proposes an intriguing mechanism that involves several interconnected tissues. The authors place their research in the appropriate context and reference related work, such as literature about sickness-induced sleep, ROS, the effect of nutritional deprivation on sleep, sleep deprivation and sleep rebound, upregulated receptor expression as a compensatory mechanism in response to low levels of a ligand, and information about Alst A.

      (2) The work is, in general, supported by well-performed experiments that use a variety of different tools, including multiple RNAi lines, CRISPR, and mutants, to dissect both signal-sending and receiving sides of the signaling pathway.

      (3) The authors provide compelling evidence that shows that endocrine cells from the gut are the source of the upd cytokines that increase daytime sleep, that the glial cells of the BBB are the targets of these upds, and that upd action causes the downregulation of Alst receptors in the BBB via the Jak/Stat pathways.

      Weaknesses:

      (1) There is a limited characterization of cell types in the midgut which are classically associated with upd cytokine production.

      (2) Some of the main tools used in this manuscript to manipulate the gut without influencing the brain (e.g., Voilà and Voilà + R57C10-GAL80) have not been directly shown to leave gene expression in the brain unaffected. This is critical for a manuscript delving into inter-organ communication, as even limited expression in the brain may lead to wrong conclusions.

      (3) The model of gut inflammation used by the authors is based on the increase in reactive oxygen species (ROS) obtained by feeding flies food containing 1% H2O2. The two papers the authors cite for this model (refs. 26 and 27) support it only weakly: the paper by Jiang et al. (ref. 26) shows that infection by Pseudomonas entomophila induces the cytokine responses upd2 and upd3, which are also induced by the Jnk pathway. In addition, no mention of ROS could be found in Buchon et al. (ref. 27); this is a review that refers to results showing that ROS are produced by the NADPH oxidase DUOX as part of the immune response to pathogens in the gut. Thus, there is no strong support for the use of this model.

      (4) Likewise, there is no support for the use of ROS in the food instead of a direct infection by pathogenic bacteria. Furthermore, it is known that ROS damages the gut epithelium, which in turn induces the expression of the cytokines studied. Thus, the effects observed may not reflect the response to infection. In addition, Majcin Dorcikova et al. (2023; Circadian clock disruption promotes the degeneration of dopaminergic neurons in male Drosophila. Nat Commun. 14(1):5908. doi: 10.1038/s41467-023-41540-y) report that feeding adult flies H2O2 results in neurodegeneration if associated with circadian clock defects. Thus, it would be important to discuss or present controls showing that H2O2 feeding does not cause neuronal damage.

      (5) The novelty of the work is difficult to evaluate because of the numerous publications on sleep in Drosophila. Thus, it would be very helpful to read from the authors how this work is different and novel from other closely related works such as: Li et al. (2023) Gut AstA mediates sleep deprivation-induced energy wasting in Drosophila. Cell Discov. 23;9(1):49. doi: 10.1038/s41421-023-00541-3.

    1. eLife assessment

      This study presents a valuable dataset regarding chromatin remodeling by the BAF complex in the context of meiotic sex chromosome inactivation. Solid data generally support the conclusions, although the partial deletion of the BAF complex in the germline could be considered limiting. This work will be of interest to researchers working on chromatin and reproductive biology.

    2. Reviewer #3 (Public review):

      In this manuscript, Magnuson and colleagues investigate the meiotic functions of ARID1A, a putative DNA binding subunit of the SWI/SNF chromatin remodeler BAF. The authors develop a germ cell-specific conditional knockout (cKO) mouse model using Stra8-cre and observe that ARID1A-deficient cells fail to progress beyond pachytene, although due to inefficiency of the Stra8-cre system the mice retain ARID1A-expressing cells that yield sperm and allow fertility. Because ARID1A was found to accumulate at the XY body late in Prophase I, the authors suspected a potential role in meiotic silencing and by RNAseq observe significant misexpression of sex-linked genes that typically are silenced at pachytene. They go on to show that ARID1A is required for exclusion of RNA PolII from the sex body and for limiting promoter accessibility at sex-linked genes, consistent with a meiotic sex chromosome inactivation (MSCI) defect in cKO mice. The authors proceed to investigate the impacts of ARID1A on H3.3 deposition genome-wide. H3.3 is known to be regulated by ARID1A and is linked to silencing, and here the authors find that upon loss of ARID1A, overall H3.3 enrichment at the sex body as measured by IF failed to occur, but H3.3 was enriched specifically at transcriptional start sites of sex-linked genes that are normally regulated by ARID1A. The results suggest that ARID1A normally prevents H3.3 accumulation at target promoters on sex chromosomes and, based on additional data, restricts H3.3 to intergenic sites. Finally, the authors present data implicating ARID1A and H3.3 occupancy in DSB repair, finding that ARID1A cKO leads to a reduction in focus formation by DMC1, a key repair protein. Overall the paper provides new insights into the process of MSCI from the perspective of chromatin composition and structure and raises interesting new questions about the interplay between chromatin structure, meiotic silencing and DNA repair.

      In general the data are convincing. The conditional KO mouse model has some inherent limitations due to incomplete recombination and the existence of 'escaper' cells that express ARID1A and progress through meiosis normally. This reviewer feels that the authors have addressed this point thoroughly and have demonstrated clear and specific phenotypes using the best available animal model. The data demonstrate that the mutant cells fail to progress past pachytene, although it is unclear whether this specifically reflects pachytene arrest, as accumulation in other stages of Prophase is also suggested by the data in Table 1.

      The revised manuscript more appropriately describes the relationship between ARID1A and DNA damage response (DDR) signaling. The authors don't see defects in a few DDR markers in ARID1A cKO cells (including a low-resolution assessment of ATR), suggesting that ARID1A may not be required for meiotic DDR signaling. However, as previously noted, the data do not rule out the possibility that ARID1A is downstream of DDR signaling, and the authors note the possibility of a role for DDR signaling upstream of ARID1A.

      A final comment relates to the impacts of ARID1A loss on DMC1 focus formation and the interesting observation of reduced sex chromosome association by DMC1. The authors additionally assess the related recombinase RAD51 and suggest that it is unaffected by ARID1A loss. However, only a single image of RAD51 staining in the cKO is provided (Fig. S11) and there are no associated quantitative data provided. The data are suggestive and conclusions about the impacts of ARID1A loss on RAD51 must be considered as preliminary until more rigorously assessed.

      Comments on latest version:

      The authors have effectively addressed the minor issues raised in the most recent round of non-public reviews. This reviewer has no additional recommendations.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      I understand that the only spermatids observed in cKO testes are coming from cells that escaped the Cre system. However, I do think that the authors could provide sperm counts data also showing decreased sperm counts in the mutant, to make their claim stronger. This is a very common fertility assessment.

      All round spermatids isolated from Arid1acKO testes appeared only to express the normal transcript associated with the floxed allele (Fig. S4A).

      [New Data - Lines 154-159] Our evaluation of the first round of spermatid development based on DNA content (1C, 2C, and 4C), revealed a significantly reduced abundance of round spermatids (1C) in mutant testes compared to wild-type testes. This finding, obtained through flow cytometry, supports the observed meiotic block at the pachytene stage (new Fig. S5A-B).

      Reviewer 3:

      Lines 154-5: Currently read 'inefficient Stra8-cre inefficiency'. Should read 'inefficient Stra8-cre activity.' I see that this was noted in the first round of review but the original wording has persisted.

      The nucleolin antibody used should be listed in Supplementary table 3.

      'inefficient Stra8-cre inefficiency' now reads 'inefficient Stra8-Cre activity' [Line 158]

      Nucleolin antibody is now listed in Supplementary Table 3

    1. eLife assessment

      This useful study describes expression profiling by scRNA-seq of thousands of cells of recombinant yeast genotypes from a system that models natural genetic variation. The rigorous new method presented here shows promise for improving the efficiency of genotype-to-phenotype mapping in yeast, providing convincing evidence for its efficacy. This revised manuscript focuses on overcoming technical challenges with this approach and identifies several new biological insights that build upon the field of genotype-to-phenotype mapping, a central question of interest to geneticists and evolutionary biologists.

    2. Reviewer #1 (Public review):

      In the revision of their paper, N'Guessan et al. have improved the report of their study of expression QTL (eQTL) mapping in yeast using single cells. The authors make use of advances in single-cell RNAseq (scRNAseq) in yeast to increase the efficiency with which this type of analysis can be undertaken. Building on prior research led by the senior author that entailed genotyping and fitness profiling of almost 100,000 cells derived from a cross between two yeast strains (BY and RM), they performed scRNAseq on a subset of ~5% (n = 4,489) individual cells. To address the sparsity of genotype data in the expression profiling, they used a Hidden Markov Model (HMM) to infer genotypes and then identify the most likely known lineage genotype from the original dataset. To address the relationship between variance in fitness and gene expression, the authors partition the variance to investigate the sources of variation. They then perform eQTL mapping and study the relationship between eQTL and fitness QTL identified in the earlier study.
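      The review does not describe the HMM itself, so the following is only a generic sketch of how an HMM can fill in sparse parental calls along a chromosome, not the authors' implementation. It is a two-state Viterbi decoder in which the hidden state is the parent of origin (BY or RM) at each marker. The switch probability, the error rate, the example calls, and the function name are all hypothetical choices made for illustration.

```python
import math

STATES = ("BY", "RM")


def viterbi_genotypes(obs, switch_prob=0.01, error_rate=0.05):
    """Return the most likely parental state at each marker.

    `obs` holds one entry per marker: "BY", "RM", or None when no read
    covered the marker. Both probabilities are illustrative assumptions.
    """
    def log_emit(state, call):
        if call is None:  # an uncovered marker carries no information
            return 0.0
        return math.log(1 - error_rate if call == state else error_rate)

    def log_trans(prev, cur):
        return math.log(1 - switch_prob if prev == cur else switch_prob)

    # Initialise with equal prior probability for the two parents.
    score = {s: math.log(0.5) + log_emit(s, obs[0]) for s in STATES}
    back = []
    for call in obs[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_trans(p, cur))
            pointers[cur] = best_prev
            new_score[cur] = (score[best_prev] + log_trans(best_prev, cur)
                              + log_emit(cur, call))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical sparse calls along one chromosome (None = marker not covered).
sparse_calls = ["BY", None, None, "BY", "RM", None, "RM", "RM"]
inferred = viterbi_genotypes(sparse_calls)
print(inferred)
```

      Uncovered markers inherit the state of their neighbours because a parental switch is penalised, which is what lets a decoder of this kind turn sparse single-cell reads into a complete genotype that can then be matched against known lineages.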

      This paper seeks to address the question of how quantitative trait variation and expression variation are related. scRNAseq represents an appealing approach to eQTL mapping as it is possible to simultaneously genotype individual cells and measure expression in the same cell. As eQTL mapping requires large sample sizes to identify statistical relationships, the use of scRNAseq is likely to dramatically increase the statistical power of such studies. However, there are several technical challenges associated with scRNAseq and the authors' study is focused on addressing those challenges. Most of the points raised by my review of the initial version have been addressed. However, one point remains and one additional point should be considered.

      (1) Given that the authors overcame many technical and analytical challenges in the course of this research, the study would be greatly strengthened through analysis of at least one, and ideally several, more conditions which would expand the conclusions that could be drawn from the study and demonstrate the power of using scRNAseq to efficiently quantify expression in different environments.

      (2) In this version the authors have introduced the use of data imputation using a published algorithm, DISCERN. This has greatly increased the variation explained by their model as presented in figure 3. However, it is possible that the explained variance is now an overestimation as a result of using the imputed expression data. I think that it would be appropriate to present figure 3 using the sparse data presented in the initial version of the paper and the newly presented imputed data so that the reader can draw their own conclusions about the interpretation.

    3. Reviewer #2 (Public review):

      The authors now say the main take-home for their work is (1) they have established methods for linkage mapping with scRNA-seq and that these (2) "can help gain insights about the genotype-phenotype map at a broader scale." My opinion in this revision is much the same as it was in the first round: I agree that they have met the first goal, and the second theme has been so well explored by other literature that I'm not convinced the authors' results meet the bar for novelty and impact. To my mind, success for this manuscript would be to support the claim that the scRNA-seq approach helps "reveal hidden components of the yeast genotype-to-phenotype map." I'm not sure the authors have achieved this. I agree that the new Figure 3 is a nice addition: a result that apparently hasn't been reported elsewhere (30% of growth trait variation can't be explained by expression). The caveats are that this is a negative result that needs to be interpreted with caution, and that it would be useful for the authors to clarify whether the ability to do this calculation is a product of the scRNA-seq method per se or whether they could have used any bulk eQTL study for it. Besides this, I regret to say that I still find that the results in the revision recapitulate what the bulk eQTL literature has already found, especially for the authors' focal yeast cross: heritability, expression hotspots, the role of cis and trans-acting variation, etc.

      Likewise, when in the first round of review I recommended that the authors repeat their analyses on previous bulk RNA-seq data from Albert et al., my point was to lead the authors to a means to provide rigorous, compelling justification for the scRNA-seq approach. The response to reviewers and the text (starting on line 413) says the comparison in its current form doesn't serve this purpose because Albert et al. studied fewer segregants. Wouldn't down-sampling the current data set allow a fair comparison? Again, to my mind what the current manuscript needs is concrete evidence that the scRNA-seq method per se affords truly better insights relative to what has come before.
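      The down-sampling suggested above can be sketched as follows. This is an assumption-laden illustration rather than a prescribed analysis: the target size of 1,000 is a hypothetical stand-in for the number of segregants in the bulk study, the cell IDs are placeholders, and the function name is invented here. The sketch draws several reproducible random subsets of the 4,489 cells so that the mapping could be rerun at a matched sample size.

```python
import random


def downsample_ids(cell_ids, target_n, seed):
    """Draw `target_n` cell IDs without replacement, reproducibly for a seed."""
    if target_n > len(cell_ids):
        raise ValueError("target_n exceeds the number of available cells")
    rng = random.Random(seed)
    return sorted(rng.sample(cell_ids, target_n))


cell_ids = list(range(4489))  # placeholder IDs for the 4,489 profiled cells
target_n = 1000               # hypothetical size of the bulk segregant panel

# Several replicates, so that conclusions do not hinge on one random draw.
replicates = [downsample_ids(cell_ids, target_n, seed) for seed in range(5)]
print([len(r) for r in replicates])
```

      Repeating the draw with different seeds gives a spread of results at the matched sample size, which separates the effect of cell number from the effect of the method itself.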

      I also recommend that the authors take care to improve the main text for readability and professionalism. It would benefit from further structural revision throughout (especially in the figure captions) to allow high-impact conclusions to be highlighted and low-impact material to be eliminated. Figure 4 and the results text sections from line 319 onward could be edited for concision or perhaps moved to supplementary if they obscure the authors' case for the scRNA-seq approach. The text could also benefit from copy editing (e.g. three clauses starting with "while" in the paragraph starting on line 456; "od ratio" on line 415). I appreciate the authors' work on the discussion, including posing big picture questions for the field (lines 426-429), but I don't see how they have anything to do with the current scRNA-seq method.

    1. eLife assessment:

      This theoretical study makes a useful contribution to our understanding of a subtype of type 2 diabetes – ketosis-prone diabetes mellitus (KPD) – with a potential impact on our broader understanding of diabetes and glucose regulation. The article presents an ordinary differential equation-based model for KPD that incorporates a number of distinct timescales – fast, slow, as well as intermediate, incorporating a key hypothesis of reversible beta cell deactivation. The presented evidence is solid and shows that observed clinical disease trajectories may be explained by a simple mathematical model in a particular parameter regime.

    2. Reviewer #1 (Public review):

      The goal of this work is to understand the clinical observation of a subgroup of diabetics who experience extremely high blood glucose levels after a period of high carbohydrate intake. These symptoms are similar to the onset of Type 1 diabetes but, crucially, have been observed to be fully reversible in some cases.

      The authors interpret these observations by analyzing a simple yet insightful mathematical model in which β-cells temporarily stop producing insulin when exposed to high levels of glucose. For a specific model realization of such dynamics (and for specific parameter values) they show that such dynamics lead to two distinct stable states. One is the relatively normal/healthy state in which β-cells respond appropriately to glucose by releasing insulin. In contrast, when enough β-cells "refuse" to produce insulin in a high-glucose environment, there is not enough insulin to reduce glucose levels, and the high-glucose state remains locked in because the high-glucose levels keep β-cells in their inactive state. The presented mathematical analysis shows that in their model the high-glucose state can be entered through an episode of high glucose levels and that subsequently the low-glucose state can be re-entered through prolonged insulin intake.
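      The bistability described above can be reproduced in a deliberately simplified toy model. The sketch below is not the authors' system of equations: it tracks only the active fraction of β-cells, treats glucose as a quasi-steady-state function of that fraction, and uses a Hill-type deactivation rate. Every parameter value, the functional forms, and the function names are assumptions chosen so that two stable states are visible.

```python
def glucose(active, intake, insulin_ext=0.0, k=5.0):
    """Quasi-steady-state glucose, lowered by active beta cells and external insulin."""
    return intake / (1.0 + k * active + insulin_ext)


def deactivation_rate(g, d_max=1.0, half=12.0, hill=6):
    """Hill-type rate at which beta cells switch off as glucose rises (assumed form)."""
    return d_max * g**hill / (half**hill + g**hill)


def simulate(active, intake, insulin_ext, duration, dt=0.05, r=0.1):
    """Euler-integrate d(active)/dt = r*(1 - active) - deactivation(G)*active."""
    for _ in range(int(duration / dt)):
        g = glucose(active, intake, insulin_ext)
        active += dt * (r * (1.0 - active) - deactivation_rate(g) * active)
    return active


# All intakes, doses, and durations below are hypothetical, in arbitrary units.
a = simulate(0.95, intake=30.0, insulin_ext=0.0, duration=200)  # baseline
healthy = a
a = simulate(a, intake=60.0, insulin_ext=0.0, duration=50)      # high-carbohydrate episode
a = simulate(a, intake=30.0, insulin_ext=0.0, duration=200)     # intake back to baseline
locked_in = a
a = simulate(a, intake=30.0, insulin_ext=5.0, duration=100)     # prolonged insulin treatment
a = simulate(a, intake=30.0, insulin_ext=0.0, duration=200)     # treatment withdrawn
remission = a

print(round(healthy, 2), round(locked_in, 2), round(remission, 2))
```

      With these assumed parameters the active fraction stays high at baseline, collapses during the high-intake episode and remains low after intake returns to normal, then recovers and stays high once temporary external insulin has lowered glucose, which is the qualitative sequence the review describes.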

      The strength of this work is twofold. First, the intellectual sharpness of translating clinical observations of ketosis-prone type 2 diabetes (KPD) into the need for β-cell responses on intermediate timescales. Second, the analysis of a specific model clearly establishes that the clinical observations can be reproduced with a model in which β-cell dynamics reversibly enter a non-insulin-producing state in a glucose-dependent fashion.

      The likely impact of this work is a shift in attention in the field from a focus on the short and long-term dynamics in glucose regulation and diabetes progression to the intermediate timescales of β-cell dynamics. I expect this to lead to much interest in probing the assumptions behind the model to establish what exactly the process is by which patients enter a 'KPD state'. Furthermore, I expect this work to trigger much research on how KPD relates to "regular" type 2 diabetes and to lead to experimental efforts to find/characterize previously overlooked β-cell phenotypes.

      In summary, the authors claim that observed clinical dynamics and possible remission of KPD can be explained through introducing a temporarily inactive β-cell state into a "standard model" of diabetes. The evidence for this claim comes from analyzing a mathematical model and is clearly presented. Importantly, the authors point out that this does not mean their model is correct. Other hypotheses are that:

      - Instead of switching to an inactive state, individual β-cells could adjust how they respond to high glucose levels. If this response function changes reversibly on intermediate timescales the clinical observations could be explained without a reversible inactive state.

      - Kidney function is indirectly impaired through chronic high glucose levels. The apparent rapid glucose increase might then not highlight a new type of β-cell phenotype but would reflect rapid changes in kidney function.

      - In principle, the remission could be due to a direct response of β-cells to insulin and not mediated through the lowering of glucose levels.

      Crucially, the hypothesized reversibly inactive state of β-cells remains to be directly observed. One of the key contributions of this theoretical work is directing experimental focus towards looking for reversible β-cell phenotypes.

    3. Reviewer #2 (Public review):

      In this manuscript, Ridout et al. present an intriguing extension of beta cell mass-focused models for diabetes. Their model incorporates reversible glucose-dependent inactivation of beta cell mass, which can trigger sudden-onset hyperglycemia due to bistability in beta cell mass dynamics. Notably, this hyperglycemia can be reversed with insulin treatment. The model is simple, elegant, and thought-provoking.

      Concerning the grounding in experimental phenomenology, it would be beneficial to identify specific experiments to strengthen the model. In particular, what evidence supports reversible beta cell inactivation? This could potentially be tested in mice, for instance, by using an inducible beta cell reporter, treating the animals with high glucose levels, and then measuring the phenotype of the marked cells. Such experiments, if they exist, would make the motivation for the model more compelling. For quantitative experiments, the authors should be more specific about the features of beta cell dysfunction in KPD. Does the dysfunction manifest in fasting glucose, glycemic responses, or both? Is there a "pre-KPD" condition? What is known about the disease's timescale?

      The authors should also consider whether their model could apply to other conditions besides KPD. For example, the phenomenology seems similar to the "honeymoon" phase of T1D. Making a strong case for the model in this scenario would be fascinating.

    4. Author response:

      Response to Public Comment of Reviewer 1: We thank the Reviewer for the positive assessment of the manuscript. We also are grateful to the Reviewer for pointing out that providing alternatives to our model is a strength, and not a weakness, potentially stimulating future experiments that could falsify our model.  

      Response to Public Comment of Reviewer 2: We thank the Reviewer for the positive assessment of the manuscript. 

      In our manuscript, we already provide some references to evidence supporting reversible β-cell inactivation in a high-glucose environment. In the revision, we will expand this discussion, emphasize it, and add additional references that we have discovered recently. 

      In the revision, we will additionally expand our discussion of what is and is not known about the features of β-cell dysfunction in KPD, the relevant timescales, and so on. We will expand on how little is known about the possible pre-KPD state: individuals with KPD usually show up in a hospital with a new onset of diabetes, and often have had little access to medical care prior to this presentation. Thus, prior medical records are often unavailable. We hope this theoretical work will help justify appropriate future studies of the clinical history of KPD patients. 

      In the revision of the manuscript, we plan to briefly discuss how our model might, indeed, account for the honeymoon phase of type 1 diabetes, as well as for some phenomenology of gestational diabetes, and progression of type 2 diabetes in youth. In other words, the model developed for explaining KPD is potentially much broader, explaining many other phenomena. However, we prefer to leave the detailed modeling of these conditions, and comparisons to alternate hypotheses of their pathogenesis, to a future publication.

    1. eLife assessment

      This valuable study advances our understanding of the histopathological features of type 1 diabetes, in particular in regard to the composition and spatial organization of pancreas infiltrating immune cells. The evidence supporting the conclusions is convincingly grounded in an application of both state-of-the-art high-dimensional in situ immunostaining technology as well as a tailored image analysis strategy. The work will be of broad interest to type 1 diabetes researchers as it contributes to a better understanding of the disease's etiopathology.

    2. Reviewer #1 (Public review):

      Summary:

      Barlow and coauthors utilized the high-parameter imaging platform of CODEX to characterize the cellular composition of immune cells in situ from tissues obtained from organ donors with type 1 diabetes, subjects presenting with autoantibodies who are at elevated risk, or non-diabetic organ donor controls. The panels used in this important study were based on prior publications using this technology, as well as a priori and domain-specific knowledge of the field by the investigators. Thus, there was some bias in the markers selected for analysis. The authors acknowledge that these types of experiments may be complemented in future by unbiased tissue analysis platforms that can conduct a more comprehensive analysis of pathological signatures, employing emerging technologies for both high-parameter protein imaging and spatial transcriptomics.

      Strengths:

      In terms of major findings, the authors provide important confirmatory observations regarding a number of autoimmune-associated signatures reported previously. The high parameter staining now increases the resolution for linking these features with specific cellular subsets using machine learning algorithms. These signatures include a robust signature indicative of IFN-driven responses that would be expected to induce a cytotoxic T-cell-mediated immune response within the pancreas. Notable findings include the upregulation of indolamine 2,3-dioxygenase-1 in the islet microvasculature. Furthermore, the authors provide key insights as to the cell:cell interactions within organ donors, again supporting a previously reported interaction between presumably autoreactive T and B cells.

      Weaknesses:

      These studies also highlight a number of molecular pathways that will require additional validation studies to more completely understand whether they are potentially causal for pathology or, rather, epiphenomena associated with increased innate inflammation within the pancreas of T1D subjects. Given the limitations noted above, the study does present a rich and integrated dataset for analysis of enriched immune markers that can be segmented and annotated within distinct cellular networks. This enabled the authors to analyze distinct cellular subsets and phenotypes in situ, including within islets that exhibit peri-islet infiltration and/or intra-islet insulitis.

      Despite the many technical challenges and unique organ donor cohort utilized, the data are still limited in terms of subject numbers - a challenge in a disease characterized by extensive heterogeneity in terms of age of onset and clinical and histopathological presentation. Therefore, these studies cannot adequately account for all of the potential covariates that may drive variability and alterations in the histopathologies observed (such as age of onset, background genetics, and organ donor conditions). In this study, the manuscript and figures could be improved in terms of clarifying how variable the observed signatures were across each individual donor, with the clear notion that non-diabetic donors will present with some similar challenges and variability.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to characterize the cellular phenotype and spatial relationship of cell types infiltrating the islets of Langerhans in human T1D using CODEX, a multiplexed examination of cellular markers.

      Strengths:

      Major strengths of this study are the use of pancreas tissue from well-characterized tissue donors, and the use of CODEX, a state-of-the-art detection technique for extensive phenotypic and spatial characterization of cell types and cellular interactions. The authors have achieved their aims with the identification of the heterogeneity of the CD8+ T cell populations in insulitis, the identification of a vasculature phenotype and other markers that may mark insulitis-prone islets, and the characterization of tertiary lymphoid structures in the acinar tissue of the pancreas. These findings are very likely to have a positive impact on our understanding (conceptual advance) of the cellular factors involved in T1D pathogenesis, which the field requires to make progress in therapeutics.

      Weaknesses:

      A major limitation of the study is the cohort size, which the authors directly state. However, this study provides avenues of inquiry for researchers to gain further understanding of the pathological process in human T1D.

    4. Reviewer #3 (Public review):

      Summary:

      The authors applied an innovative approach (CO-Detection by indEXing - CODEX) together with sophisticated computational analyses to image pancreas tissues from rare organ donors with type 1 diabetes. They aimed to assess key features of inflammation in both islet and extra-islet tissue areas; they reported that the extra-islet space of lobules with extensive islet infiltration differs from the extra-islet space of less infiltrated areas within the same tissue section. The study also identifies four sub-states of inflamed islets characterized by the activation profiles of CD8+ T cells enriched in islets relative to the surrounding tissue. Lymphoid structures are identified in the pancreas tissue away from islets, and these were enriched in CD45RA+ T cells - a population also enriched in one of the inflamed islet sub-states. Together, these data help define the coordination between islets and the extra-islet pancreas in the pathogenesis of human T1D.

      Strengths:

      The analysis of tissue from well-characterized organ donors, provided by the Network for Pancreatic Organ Donors with Diabetes (nPOD), adds strength to the validity of the findings.

      By using their innovative imaging/computation approaches, key known features of islet autoimmunity were confirmed, providing validation of the methodology.

      The detection of IDO+ vasculature in inflamed islets - but not in normal islets or islets that have lost insulin expression - links this expression to islet inflammation and is a novel observation. IDO expression in the vasculature may be induced by inflammation and may be lost as disease progresses, and it may provide a potential therapeutic avenue.

      The high-dimensional spatial phenotyping of CD8+ T cells in T1D islets confirmed that most T cells were antigen-experienced. Some additional subsets were noted: a small population of T cells expressing CD45RA and CD69, possibly naive or TEMRA cells, and cells expressing Lag-3, Granzyme-B, and ICOS.

      While much attention has been devoted to the study of the insulitis lesion in T1D, our current knowledge is quite limited; the description of four sub-clusters characterized by the activation profile of the islet-infiltrating CD8+ T cells is novel. Their presence in all T1D donors indicates that the disease process is asynchronous and is not at the same stage across all islets. Although this concept is not novel, this appears to be the most advanced characterization of insulitis stages.

      When examining both the exocrine and islet areas together, which is rarely done, the authors report that pancreatic lobules affected by insulitis are characterized by distinct tissue markers. Their data support the concept that disease progression may require crosstalk between cells in the islet and extra-islet compartments. Lobules enriched in β-cell-depleted islets were also enriched in nerves, vasculature, and Granzyme-B+/CD3- cells, which may be natural killer cells.

      Lastly, the authors report that immature tertiary lymphoid structures (TLS) exist both near and away from islets, where CD45RA+ CD8+ T cells aggregate, and also observe an inflamed islet sub-cluster characterized by an abundance of CD45RA+/CD8+ T cells. These TLS may represent a point of entry for T cells, and this study further supports their role in islet autoimmunity.

      Weaknesses:

      As the authors themselves acknowledge, the major limitation is that the number of donors examined is limited, as those satisfying study criteria are rare. Thus, it is not possible to examine disease heterogeneity and the impact of age at diagnosis. Of 8 T1D donors examined, 4 would be considered newly diagnosed (less than 3 months from onset) and 4 had longer disease durations (2, 2, 5, and 6 years). It was unclear if disease duration impacted the results in this small cohort. In the introduction, the authors discuss that most of the pancreata from nPOD donors with T1D lack insulitis. This is correct, yet it is a function of time from diagnosis. Donors with shorter duration will be more likely to have insulitis. A related point is that the proportion of islets with insulitis is low even near diagnosis. Finally, only one donor was examined who, while not diagnosed with T1D, was likely in the preclinical disease stage and had autoantibodies and insulitis. This is a critically important disease stage where the methodology developed by the investigators could be applied in future efforts.

      While this was not the focus of this investigation, it appears that the approach was very much immune-focused and there could be value in examining islet cells in greater depth using the methodology the authors developed.

      Additional comments:

      Overall, the authors were able to study pancreas tissues from T1D donors and perform sophisticated imaging and computational analysis that reproduce and, importantly, extend our understanding of inflammation in T1D. Despite the limitations associated with the small sample size, the results appear robust, and the claims well-supported.

      The study expands the conceptual framework of inflammation and islet autoimmunity, especially by the definition of different clusters (stages) of insulitis and by the characterization of immune cells in and outside the islets.

    5. Author response:

      We’d like to thank the reviewers for their fair, thoughtful, and critical review of our manuscript.

      We acknowledge that the small number of specimens limits the impact of our findings. While we are unable to expand the study, we are optimistic that more cases with insulitis will be made available for research and spatial technologies will become more cost-effective over time. We hope that the design and analyses in our study are useful to future efforts and that our findings can be validated and revised.

      We intend to revise the manuscript to address all other points raised by reviewers. These include a) adding HLA genotype information for each patient, b) analyzing how key immune signatures relate to the clinical variables, diabetes duration and age of onset, and c) measuring the relationship between IDO+ islets and HLA-ABC expression. We will also revise the text and figures for clarity in specific places and discuss important considerations including stem cell memory T cells and the potential impact of prolonged stays in the ICU.

    1. eLife assessment

      In the revised version of this important study, the authors present a convincing pipeline for insect genome regulatory annotation across 33 insect genomes spanning 5 orders. Despite technical limitations in the field owing to the lack of comprehensive knowledge of enhancer content in any system, the authors employ several independent downstream analyses to support the validity of their enhancer predictions for a subset of these genomes. Taken together, the revised results suggest that this prediction pipeline may have uses in identifying functional enhancers across large phylogenetic distances. Reviewers note caveats that an experimental validation is not yet available in the field to validate a large class of newly identified enhancers across such evolutionary distances, and other pipelines might be of use to compare. This work will be of interest to the computational genomics, evolutionary biology, and gene regulation fields.

    2. Reviewer #1 (Public review):

      Summary:

      The authors provide a genome annotation resource for 33 insects using a motif-blind prediction method for tissue-specific cis-regulatory modules. This is a welcome addition that may facilitate further research in new laboratory systems, and the approach seems to be relatively accurate, although it should be combined with other sources of evidence to be practical.

      Strengths:

      The paper clearly presents the resource, including the testing of candidate enhancers identified from various insects in Drosophila. This cross-species analysis, and the inherent suggestion that training datasets generated in flies can predict cis-regulatory activity in distant insects, is interesting. While I cannot be sure this approach will prevail in the future, for example against approaches that leverage the prediction of TF binding motifs, the SCRMshaw tool is certainly useful and worthy of consideration for the large community of genome scientists working on insects.

      Weaknesses from the previous version were appropriately corrected in this revision, as the authors improved data availability including with genome annotation resources.

    3. Reviewer #3 (Public review):

      Summary:

      In this ambitious paper, the authors develop an unparalleled community resource of insect genome regulatory annotations spanning five insect orders. They employ their previously-developed SCRMshaw method for computational cross-species enhancer prediction, drawing on available training datasets of validated enhancer sequence and expression from Drosophila melanogaster, which had been previously shown to perform well across select holometabolous insects (representing 160-345MY divergence). In this work they expand regulatory sequence annotation to 33 insect genomes spanning Holometabola and Hemiptera, which is even more distantly related to the fly model. They perform multiple downstream analyses of sets of predicted enhancers to assess the true-positive rate of predictions; the independent comparisons of real predictions with simulated predictions and with chromatin accessibility data, as well as the functional validation through reporter gene analysis strengthen their conclusions that their annotation pipeline achieves a high true-positive rate and can be used across long divergence times to computationally annotate regulatory genome regions, an ability that has been largely inaccessible for non-model insects and now is possible across the many newly-sequenced insect scaffold-level genomes.

      Strengths:

      This work fills a large gap in current methods and resources for predicting regulatory regions of the genome, a task that has long lagged behind that of coding region prediction and analysis.

      Despite technical constraints in working outside of well-developed model insect systems, the authors creatively draw on existing resources to scaffold a pipeline and independently assess likelihood of prediction validity.

      The established database will be a welcome community resource in its current state, and even more so as the authors continue to expand their annotations to more insect genomes as they indicate. Their available analysis pipeline itself will be useful to the community as well for research groups that may want to undertake their own regulatory genome annotation.

      Weaknesses:

      The work here is limited by the field-wide lack of an independently validated set of tissue-specific enhancers that could be used to directly benchmark this pipeline. The prediction of true positive enhancer identification rates and in vivo reporter gene assays offer some insight into the rates of successful prediction, but the output of SCRMshaw regulatory annotation should be regarded as another prediction-generating tool.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Strengths: 

      The paper clearly presents the resource, including the testing of candidate enhancers identified from various insects in Drosophila. This cross-species analysis, and the inherent suggestion that training datasets generated in flies can predict a cis-regulatory activity in distant insects, is interesting. While I can not be sure this approach will prevail in the future, for example with approaches that leverage the prediction of TF binding motifs, the SCRMShaw tool is certainly useful and worth consideration for the large community of genome scientists working on insects. 

      We thank the reviewer for the positive comments, and would just like to point out that we agree: while we cannot of course know if other methods will overtake SCRMshaw for enhancer prediction—we assume they will, at some point (although motif-based approaches have not fared as well in the past)—for now, SCRMshaw provides strong performance and is a useful part of the current toolkit.

      Weaknesses: 

      While the authors made the effort to provide access to the SCRMShaw annotations via the RedFly database, the usefulness of this resource is somewhat limited at the moment. First, it is possible to generate tables of annotated elements with coordinates, but it would be more useful to allow downloads of the 33 genome annotations in GFF (or equivalent) format, with SCRMshaw predictions appearing as a new feature. Also, I should note that unlike most species some annotations seem to have issues in the current RedFly implementation. For example, Vcar and Jcoen turn empty. 

      We have addressed these weaknesses in several ways:

      (1) We have created GFF versions of the SCRMshaw predictions and provide them standalone and also merged into the available annotation GFFs for each of the 33 species.

      (2) We have made these GFF files, and also the original SCRMshaw output files, available for download in a Dryad repository linked to the publication (https://doi.org/10.5061/dryad.3j9kd51t0).

      (3) We have added the inadvertently omitted species to the REDfly/SCRMshaw database.

      We agree that the database functions are still somewhat limited, but note that database development is ongoing and we expect functionality to increase over time. In the meantime, the Dryad repository ensures that all results reported in this paper are directly available.
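
      For readers unfamiliar with the format, the following is a minimal sketch of how a table of predicted regions could be written as GFF3 feature lines. It is an illustration only and does not reproduce the layout of the files in the Dryad repository: the source label, the `regulatory_region` feature type, the attribute keys, and the example record are all assumptions chosen for the example.

```python
def to_gff3_line(seqid, start, end, score, pred_id, training_set, method):
    """Format one predicted region as a GFF3 feature line.

    GFF3 has nine tab-separated columns and uses 1-based, inclusive
    coordinates. The source label, feature type, and attribute keys
    below are illustrative assumptions, not the authors' file layout.
    """
    if start < 1 or end < start:
        raise ValueError("GFF3 coordinates are 1-based with start <= end")
    # Attribute values containing ';', '=', ',' or tabs would need
    # percent-encoding; the hypothetical values here do not.
    attributes = f"ID={pred_id};training_set={training_set};method={method}"
    columns = [
        seqid,                # column 1: sequence (scaffold/chromosome) ID
        "SCRMshaw",           # column 2: source (assumed label)
        "regulatory_region",  # column 3: feature type (assumed choice)
        str(start),           # column 4: start
        str(end),             # column 5: end
        f"{score:.3f}",       # column 6: score
        ".",                  # column 7: strand (not applicable)
        ".",                  # column 8: phase (not applicable)
        attributes,           # column 9: attributes
    ]
    return "\t".join(columns)


def write_gff3(predictions):
    """Return GFF3 text (version header plus one line per prediction)."""
    lines = ["##gff-version 3"]
    lines += [to_gff3_line(**p) for p in predictions]
    return "\n".join(lines) + "\n"


# Hypothetical record, for illustration only.
example = [{
    "seqid": "scaffold_1", "start": 1201, "end": 1950, "score": 12.5,
    "pred_id": "pred0001", "training_set": "example_set", "method": "hexMCD",
}]
gff_text = write_gff3(example)
```

      Lines produced this way can be concatenated with an existing annotation GFF for the same assembly, provided the sequence IDs in column 1 match those of the annotation.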

      Reviewer #2 (Public Review): 

      Summary: 

      … Upon identification of predicted enhancer regions, the authors perform post-processing step filtering and identify the most likely predicted enhancer candidates based on the proximity of an orthologous target gene. …

      We respectfully point out a small misunderstanding here on the part of the reviewer. We stress that putative target gene assignments and identities have no impact at all on our prediction of regulatory sequences, i.e., they are not “based on the proximity of an orthologous target gene.” Predictions are solely based on sequence-dependent SCRMshaw scores, with no regard to the nature or identities of nearby annotated features. Putative target genes are mapped to Drosophila orthologs purely as a convenience to aid in interpreting and prioritizing the predicted regulatory elements. We have added language on page 8 (lines 189ff) to make this more clear in the text.

      Weaknesses:

      This work provides predicted enhancer annotations across many insect species, with reporter gene analysis being conducted on selected regions to test the predictions. However, the code for the SCRMshaw analysis pipeline used in this work is not made available, making reproducibility of this work difficult. Additionally, while the authors claim the predicted enhancers are available within the REDfly database, the predicted enhancer coordinates are currently not downloadable as Supplementary Material or from a linked resource. 

      We have placed all the code for this paper into a GitHub repository “Asma_etal_2024_eLife” (https://github.com/HalfonLab/Asma_etal_2024_eLife) to address this concern. As described in our response to Reviewer 1, above, all results are now available in multiple formats in a linked Dryad repository in addition to the REDfly/SCRMshaw database.

      The authors do not validate or benchmark the application of SCRMshaw against other published methods, nor do they seek to apply SCRMshaw under a variety of conditions to confirm the robustness of the returned predicted enhancers across species. Since SCRMshaw relies on an established k-mer enrichment of the training loci, its performance is presumably highly sensitive to the selection of training regions as well as the statistical power of the given k-mer counts. The authors do not justify their selection of training regions by which they perform predictions. 

      Our objective in this study was not to provide proof-of-principle for the SCRMshaw method, as we have established the efficacy of the approach at this point in several previous publications. Rather, the objective here was to make use of SCRMshaw to provide an annotation resource for insect regulatory genomics. Note that the training regions we used here are the same as those we have used in earlier work. Naturally, we performed various assessments to establish that the method was working here, but we make no claims in this work about SCRMshaw’s relative efficiency compared to other methods. Some of our prior publications include assessments of the sort the reviewer references, which suggest that SCRMshaw is at least comparable to other enhancer discovery approaches. We note that benchmarking of such methods is in fact extremely complicated due to the fact that there are no established true positive/true negative data sets against which to benchmark (we have explored this in Asma et al. 2019 BMC Bioinformatics).

      While there is an attempt made to report and validate the annotated predicted enhancers using previously published data and tools, the validation lacks the depth to conclude with confidence that the predicted set of regions across each species is of high quality. In vivo, reporter assays were conducted to anecdotally confirm the validity of a few selected regions experimentally, but even these results are difficult to interpret. There is no large-scale attempt to assess the conservation of enhancer function across all annotated species. 

      We respectfully disagree that there is insufficient validation. We bring several different lines of evidence to bear suggesting that our results fall into the accuracy range, roughly 75%, established both here and in previous work. We are also clear about the fact that these are predictions only and need to be viewed as such (e.g. line 638). Although “large-scale” in vivo validation assays would certainly be both interesting and worthwhile, the necessary resources for such an assessment place it beyond our present capability.

      Lastly, it is suggested that predicted regions are derived from the shared presence of sequence features such as transcription factor binding motifs, detected through k-mer enrichment via SCRMshaw. This assumption has not been examined, although there are public motif discovery tools that would be appropriate to discover whether SCRMshaw is assigning predicted regions based on previously understood motif grammar, or due to other sequence patterns captured by k-mer count distributions. Understanding the sequence-derived nature of what drives predictions is within the scope of this work and would boost confidence in the predicted enhancers, even if it is limited to a few training examples for the sake of clarity of interpretation. 

      Again, we respectfully disagree that “this assumption has not been examined.” Although we did not undertake this analysis here, we have in the past, where we have shown that known TFBS motifs can be recovered from sets of SCRMshaw predictions (e.g., Kazemian et al. 2014 Genome Biology and Evolution). We return to this point when we address the Comments to Authors, below.

      Reviewer #3 (Public Review): 

      Weaknesses:  

      The rates of predicted true positive enhancer identification vary widely across the genomes included here based on the simulations and comparison to datasets of accessible chromatin in a manner that doesn't map neatly onto phylogenetic distance. At this point, it is unclear why these patterns may arise, although this may become more clear as regulatory annotation is undertaken for more genomes. 

      We agree that we do not see clear patterns with respect to phylogenetic distance in our results. However, we note that this initial data set is still fairly small, and not carefully phylogenetically distributed. We are hoping that, as the reviewer suggests, some of these questions become more clear as we add more genomes to our analysis. Fortunately, the list of available genomes with chromosome-level assembly is growing rapidly, and as we move ahead we should have much greater ability to choose informative species.

      Functional assessment of predicted enhancers was performed through reporter gene assays primarily in Drosophila melanogaster imaginal discs, a system amenable to transgenics. Unfortunately, this mode of canonical imaginal disc development is only representative of a subset of all holometabolous insects; therefore, it is difficult to interpret reporter gene expression in a fly imaginal disc as evidence of a true positive enhancer that would be active in its native species whose adult appendages develop differently through the larval stage (for example, Coleopteran and Lepidopteran legs). However, the reporter gene assays from other tissues do offer strong evidence of true positive enhancer detection, and constraints on transgenic experiments in other systems mean that this approach is the best available. 

      Please see an extensive discussion of this point in our response to Reviewer 3, below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Major Concerns: 

      (1) While the GitHub source code for SCRMshaw is provided, the authors do not provide a repository of manuscript-specific code and scripts for readers. This is a barrier to reproducibility and the code used to perform the analysis should be made available. Additionally, links to available scripts do not work, see Line 690. Post-processing scripts point to a general lab folder, but again, no specific analysis or code is sourced for the work in this specific manuscript (e.g. Line 637).

      As noted above, we have corrected this oversight and established a specific GitHub repository for this manuscript “Asma_etal_2024_eLife” (https://github.com/HalfonLab/Asma_etal_2024_eLife). 

      (2) On lines 479-488, there is a discussion about the annotations being provided on REDfly, though no link is provided. 

      We have included a link in the text at this point (now line 515).

      Additionally, for transparency, it would be valuable to provide in Supplementary Table 1 the genomic coordinates of the original training sets in addition to their identity. 

      These coordinates have been added to Supplementary Table 1 as suggested.

      Also, it is suggested to provide genomic coordinates of the predicted enhancers for each training set across all species, perhaps with a column denoting a linked ID of one genomic coordinate in a species to another species (i.e. if there is a linked region found from D. melanogaster to J. coenia, labeling this column in both coordinate sets as blastoderm.mapping1_region1). Providing these annotations directly in the work enhances the transparency of the results. 

      We are unsure exactly what the reviewer means here by “a linked region.” It is critical to understanding our approach to recognize that the genome sequences have diverged to the point where there is no alignment of non-coding regions possible. Thus there is no way to directly “link” coordinates of a predicted enhancer from one species to those of a predicted enhancer in another species. The coordinates for each prediction are available on a per-species basis either through the database or in the files now available in the linked Dryad repository; these can be filtered for results from a specific training set. The database will allow users to select all results for a given orthologous locus, from any subset of species. More complex searches will continue to become available as we improve functionality of the database, an ongoing project in collaboration with the REDfly team.

      (3) Figure 2B: It is unclear what this figure shows. Are the No Fly Orthologs false positives, Orthology pipeline issues, or interesting biology? 

      We have clarified this in the Figure 2 legend. “No Mapped Fly Orthologs” indicates that our orthology mapping pipeline did not identify clear D. melanogaster orthologs. For any given gene, this could reflect either a true lack of a respective ortholog, or failure of our procedure to accurately identify an existing ortholog.

      (4) SCRMshaw appears to be a versatile tool, previously published in a variety of works. However, in this manuscript, there is little discussion of the sensitivity of SCRMshaw to different initial parameters, how the selection of training loci can impact outcomes, or how SCRMshaw k-mer discovery methods compare to other similar tools.

      - This paper would be strengthened by addressing this weakness. Some specific suggestions below: 

      In order to strengthen confidence that SCRMshaw is a reliable predictor of enhancer regions in other species, it is suggested that you benchmark against other k-mer-derived methods to assign enhancers, such as gkm-SVM developed by the Beer Lab in 2016 (https://www.beerlab.org/gkmsvm/, https://www.biorxiv.org/content/10.1101/2023.10.06.561128v1).

      We have established the effectiveness of SCRMshaw as an enhancer discovery method in previous work, and the main goal of this study was to make use of the established method to annotate numerous insect genomes as a community resource. Our claim here is that SCRMshaw works well for this purpose; we do not attempt a strong claim about whether other approaches may work equally well or marginally better (although we do not believe this is the case, based on prior work). Benchmarking enhancer discovery is challenging, as we point out in Asma et al. 2019 (BMC Bioinformatics), and, while important, best left for a dedicated comprehensive study. A major problem is that there are no independent objective “truth” sets for enhancers from the various species we interrogate here. Thus, while we could also run, e.g., gkm-SVM, what criteria would we use to establish which method had better accuracy for a given species? Note that the work from Beer’s lab took advantage of the ability to match human-mouse orthologous (or syntenic) regions and available open-chromatin data to assess whether conserved enhancers were discovered, but this is not possible given the degree of divergence, limited synteny, and relative lack of additional data for the insect genomes we are annotating.

      - In Table S1, we see that 7-146 regions are used as training sets, which is a huge variety. Does an increase in training set size provide a greater "rate of return" for predicted regions? Is the opposite true? Addressing this question would allow readers to understand if they wish to use SCRMshaw, a reasonable scope for their own training region selections. 

      - Within a training set, does subsampling provide the same outcomes in terms of prediction rates? There is no exploration of how "brittle" the training sets are, and whether the generalized k-mer count distributions that are established in a training set are consistent across randomly selected subgroups. Performing this analysis would raise confidence in the method applied and the resulting annotations. 

      These are interesting and important questions, but again we feel they are beyond the scope of this particular study, which is focused primarily on using SCRMshaw and not on optimizing various search parameters. That said, this is of course something we have investigated, although as with other aspects of enhancer discovery, the absence of a true gold standard enhancer set makes evaluation difficult. We have not found a clear correlation between training set size and performance beyond the very general finding that performance appears to be best when training set size is moderate, e.g. 20-40 initial enhancers. We suspect that larger training sets often contain too many members that don’t fit the core regulatory model and thus add noise, whereas sets that are too small may not contain enough signal for best performance (although small sets can still be useful, especially if used in an iterative cycle; see Weinstein et al. 2023 PLoS Genetics). However, establishing this rigorously is highly challenging given the limitations with assessing true and false positive rates at scale.
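
      The subsampling question raised by the reviewer can be made concrete with a small sketch. The code below is not SCRMshaw and does not use its scoring methods; it is a simplified stand-in, under the assumption that a training set can be summarized by raw k-mer counts, that measures how closely the k-mer profile of random subsets matches the profile of the full set. All function names, parameters, and sequences are hypothetical.

```python
import random
from collections import Counter
from math import sqrt


def kmer_profile(seqs, k=6):
    """Count every overlapping k-mer across a list of DNA sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def subsample_stability(seqs, fraction=0.5, n_draws=20, k=6, seed=0):
    """Similarity of random subsample profiles to the full-set profile.

    Consistently high values suggest the k-mer composition is not
    driven by a few members; widely varying values suggest a more
    "brittle" training set. This is a descriptive check only and says
    nothing about true or false positive rates of predictions.
    """
    rng = random.Random(seed)
    full = kmer_profile(seqs, k)
    size = max(1, int(len(seqs) * fraction))
    return [
        cosine(full, kmer_profile(rng.sample(seqs, size), k))
        for _ in range(n_draws)
    ]


# Toy sequences, for illustration only.
toy_set = ["ACGTACGTACGTAA", "TTGACGTACGTACC", "ACGTACGGGTACGT", "CCCACGTACGTTTT"]
similarities = subsample_stability(toy_set, fraction=0.5, n_draws=10, k=4)
```

      A check of this kind describes sequence composition only; as noted above, judging whether a given training set size improves prediction accuracy would still require a reference set of validated enhancers.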

      (5) In Figure 2C, when plotting hexMCD, IMM, pacRC, and then the merged set, it is unclear whether the score-specific bar allows coordinate redundancy, though this is implied. What might be more useful is a revision of this plot where the hexMCD-, IMM-, and pacRC-specific loci are plotted, with the merged set alongside as is currently reported. This would give the reader a clearer understanding of the variability between these scoring methods and why this variability occurs.

      We have added the breakdowns between IMM, hexMCD, and pacRC in Supplementary Table S2, and made more complete reference to this in the text (lines 682ff). Both the database and the data files in the Dryad repository allow exploration of the overlap between the different methods and contain both separate and merged (for overlap and redundancy) results.

      Additionally, there is no information in the Methods section of these three SCRMshaw scores and what they represent, even colloquially. While SCRMshaw has been applied in several papers previously, it would help with scientific clarity to describe in a sentence or two what each score is meant to represent and why one is different from another. 

      We had chosen to err on the side of brevity given prior publication of the SCRMshaw methodology, but we recognize now that we went too far in that direction. We have added more complete descriptions of the methods in both the Results (lines 164-167) and the Methods (lines 667-681) sections.

      (6) When describing results in Figure 2, an important question arises: "Is there an anti-correlation between the number of predicted regions and evolutionary distance?" This would be an expected result that could complement Figure 4's point that shared orthology across 16 species is rarer than across 10 species. Visualizing and adding this to Figure 2 or Figure 4 would be a powerful statement that would boost confidence in the returned predicted enhancers and/or orthologous regions. 

      This is an important question and one in which we are very interested. Unfortunately, we do not have sufficient data at this time to address this with proper statistical rigor. As we remarked above in response to Reviewer 3, “We agree that we do not see clear patterns with respect to phylogenetic distance in our results. However, we note that this initial data set is still fairly small, and not carefully phylogenetically distributed. We are hoping that, as the reviewer suggests, some of these questions become more clear as we add more genomes to our analysis. Fortunately, the list of available genomes with chromosome-level assembly is growing rapidly, and as we move ahead we should have much greater ability to choose informative species.”

      (7) In Figure 3, the authors seek to convey that SCRMshaw predicts enhancer regions that are mapped nearby one another, across different loci widths, and that this occurrence of nearby predicted regions occurs more than a randomly selected control. This is presumably meant to validate that SCRMshaw is not providing predictions with low specificity, but rather to highlight the possibility that SCRMshaw is identifying groups of shadow enhancers. However, these plots are extremely difficult to decipher and do not strongly support the claims due to the low resolution and difficult interpretability of the boxplot interquartile distributions.

      Additionally, as the majority of predicted regions are around ~750bp, how does that address loci groups of <1000bp? This suggests that predicted regions are overlapping, and therefore cannot be meaningfully interpreted as shadow enhancers. This plot should either be moved to the supplements or reworked to more effectively convey the point that "SCRMshaw is detecting predicted regions that are proximal to one another and that this proximity is not due to chance". 

      - A suggestion to rework this plot is to change this instead to a bar plot, where the y-axis instead represents "number of predictions with at least 2 predicted regions proximal to one another" divided by "total number of predictions", separating bar color by simulated/observed values. The x-axis grouping can remain the same. Because this plot is a broad generalization of the statement you're trying to make above, knowing whether a few loci have 2 versus 4 proximal predicted enhancers doesn't enhance your point. 

      We agree with the reviewer that these are not the clearest plots, and thank them for the suggestions regarding revision. We tried many variations on visualizing these complex data, including those suggested by the reviewer, and have concluded that despite their weaknesses, these plots are still the best visualization. The main problem is that the observed data cluster heavily around zero, so that the box plots are very squat and mainly only the outlier large values are observed. The key point, however, is that the expected values almost never give values much greater than one, so that the observed outlier points are the only points seen in the upper ranges of the y-axis. This is true across the three species, across the bins of locus sizes, and across training sets (averaged into the box plots). The reviewer is correct as well about the bins where locus size is < 1000. However, inspection of the data shows that this is not a large concern, as very few data points lie in this range and we never see multiple predicted enhancers there. Thus we believe while not the prettiest of graphs, Figure 3 does effectively support the claims made in the text. In keeping with our view that it is preferable to have data in the main paper whenever possible, we choose to keep the figure in place rather than move it to the Supplement.

      - Label the species for the reader's understanding of each subplot on the plot. 

      We apologize for this oversight and have now labeled each plot with its relevant species.

      (8) SCRMshaw operates on k-mer count distributions compared to a genomic background across different species, allowing it to assign predicted regions without prior knowledge of an organism's cis-regulatory sequences. This is powerful and boosts the versatility of the method. However, understanding the cis-regulatory origins of the kinds of k-mers that are driving the detection of orthologous regions across species is crucial and absolutely within the scope of the paper, particularly for the justification of the provided annotations. Is SCRMshaw making use of enriched motifs within the training region set to assign regions in other species? One would presume so, but it is necessary to show this. There are many motif discovery tools that are readily available and require little up-front knowledge and little to no use of a CLI, such as MEMESuite (https://meme-suite.org/meme/tools/meme). It is highly recommended that, even if only for a few training pairs that are well understood (e.g. mesoderm.mapping1, dorsal_ectoderm.mapping1), the authors assess the motif enrichment within the original sequence set, then see whether motif enrichments are reflected in the predicted enhancers. As evolutionary distance increases between D. melanogaster and the species of interest, is the assignment of enriched motifs more sparse? Is there a loss of a key motif? These are the kinds of questions that will allow readers to understand how these annotations are assigned as well as boost confidence in their usage. 

      This is a very important point and a subject of significant interest to us. We have demonstrated in earlier work (e.g., Kazemian et al. 2014 Genome Biol. Evol.) that SCRMshaw-predicted enhancers do contain expected TFBS motifs, across multiple species—and that even an overall arrangement of sites is sometimes conserved. Thus we have previously answered, in part, the reviewer’s question. 

      What we also learned from our previous work is that filtering out relevant motifs from the noise inherent in motif-finding is both arduous and challenging. As the reviewer is no doubt aware, while using motif discovery tools is simple, interpreting the output is much less so. In response to the reviewer’s comments, we revisited this issue with data from a small sample of training sets. We can discover motifs; we can see that the motif profiles are different between different training sets; and we can observe the presence of expected motifs based on the activity profile of the enhancers (e.g., Single-minded binding sites in our mesectoderm/midline training and result data). However, to do this cleanly and with appropriate statistical rigor is beyond what we feel would be practical for this paper. We hope to return to this important question in the future when we have a larger and phylogenetically more evenly-distributed set of species, and the time and resources to address it appropriately.

      (9) Figures 5-7 need to have better descriptions. 

      We have added to the Figure 6 and 7 legends in response to this comment; please note as well that there is substantial detail provided in the text. If there are specific aspects of the figures that are not clear or which lack sufficient description, we are happy to make additional changes.

      Minor Concerns 

      (1)  In Figure 1A, it is implied that "k-mer count distributions" are actually only "5-mer count distributions". However, in the published documentation of SCRMshaw, it is suggested that k-mers between 1-6 bp are involved in establishing sequence distributions. Please add a justification for the selection of these criteria. It would be helpful to understand the implications of using up to a 3-mer versus a 12-mer when assessing k-mer counts using SCRMshaw.

      We have clarified in the Figure 1 legend that this is just an example, and the k-mers of different sizes are used in the IMM method; we have also increased the description of the basic method in the Methods section. To be clear, the hexMCD sub-method is 6-mer based (5th-order Markov chain), as is pacRC, while the IMM method considers Markov chains of orders 0-5.
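      To make the link between k-mer size and Markov order concrete: a 5th-order Markov chain conditions each base on the five bases before it, so its parameters are estimated from 6-mer counts. The following is a minimal sketch of that idea only. It is not the SCRMshaw implementation; the function names, the add-one pseudocount, and the assumption of clean uppercase A/C/G/T input are all illustrative choices that do not come from the response above.

```python
from collections import Counter
from math import log

def kmer_counts(seq, k):
    """Count all overlapping k-mers in a DNA string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def markov_log_prob(seq, train, order=5, pseudocount=1.0):
    """Log-probability of `seq` under a Markov chain of the given order.

    Transition probabilities are estimated from (order + 1)-mer counts in
    `train`, smoothed with a pseudocount (an assumption of this sketch).
    The first `order` bases of `seq` are used only as context.
    """
    words = kmer_counts(train, order + 1)
    # Context counts: sum the word counts over the final base.
    contexts = Counter()
    for word, n in words.items():
        contexts[word[:-1]] += n
    total = 0.0
    for i in range(order, len(seq)):
        context = seq[i - order:i]
        word = seq[i - order:i + 1]
        total += log((words[word] + pseudocount)
                     / (contexts[context] + 4 * pseudocount))
    return total
```

      Under these assumptions, a window could be scored as the difference between `markov_log_prob` computed with a training set and with a background sequence; an interpolated model would combine orders 0 through 5 rather than using a single order.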

      (2) Control the y-axis to remove white space from Figure 2D. 

      We have amended the figure as suggested.

      Additionally, expand in the manuscript on expected results from SCRMshaw. Given training regions of 750 bp, is the expectation that you return predicted enhancers of the same length? This is not explicitly stated, only a description of outliers. 

      The scoring is not dependent on the length of the training sequences, and there is no direct expectation of predicted enhancer length. Scores are calculated on 10-bp intervals, and a peak-calling algorithm is used to determine the endpoints of each prediction based on where the scores drop below a cutoff value. Thus there is no explicit minimum prediction length beyond the smallest possible length of 10-bp. That said, the initial scoring takes place over a 500-bp sequence window (for reasons of computational efficiency), which does influence scores away from the smaller end of the possible range. We correct for this in part by reducing scores below a certain threshold to zero, to prevent multiple low-scoring regions from combining to give a low but positive score over a long interval. Indeed, we found that in the original version of SCRMshawHD (Asma et al. 2019), multiple low-scoring but above-threshold intervals would get concatenated together in broad peaks, leading to an unrealistically large average prediction length. In the version used here, described in Supplementary Figure S6, low-scoring windows are now first reset to zero and a new threshold is calculated before overlapping scores are summed. This helps to prevent the broad peak problem, and we find that it results in a median prediction length ~750 bp, more in line with expected enhancer sizes.
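      The two steps described above (resetting low scores to zero, then ending each prediction where the score drops below a cutoff) can be sketched as follows. This is a simplified illustration, not the SCRMshaw peak caller: the function names, the half-open coordinate convention, and the plain run-detection logic are assumptions made for the example, and the 500-bp windowing and score summation are omitted.

```python
def zero_low_scores(scores, floor):
    """Reset every score below `floor` to zero, leaving the rest unchanged."""
    return [s if s >= floor else 0.0 for s in scores]

def call_peaks(scores, cutoff, step=10):
    """Return (start, end) coordinates in bp of runs of consecutive
    intervals whose score exceeds `cutoff`.

    scores[i] is taken to cover the half-open interval
    [i * step, (i + 1) * step).
    """
    peaks = []
    start = None
    for i, s in enumerate(scores):
        if s > cutoff:
            if start is None:
                start = i                      # a new peak opens here
        elif start is not None:
            peaks.append((start * step, i * step))  # score dropped: close peak
            start = None
    if start is not None:                      # peak runs to the end
        peaks.append((start * step, len(scores) * step))
    return peaks
```

      With a 10-bp step, the shortest possible call is a single interval of 10 bp, which matches the minimum length noted in the response.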

      Reviewer #3 (Recommendations For The Authors): 

      Line 161: Given that the SCRMshaw HD method is the basis for the pipeline, the methodology deserves at least an "in brief" recapitulation in this manuscript. 

      As we remark in our response to Reviewer 2, above, “We had chosen to err on the side of brevity given prior publication of the SCRMshaw methodology, but we recognize now that we went too far in that direction. We have added more complete descriptions of the methods in both the Results (lines 164-167) and the Methods (lines 667-681) sections.” 

      Line 219: Throughout the reporting of the results, there appeared to be a bit of inconsistency/potential typos regarding whether threshold or exact P values were reported. In lines 219, 222, 265, 696, and 811, the reported values seem to clearly be thresholds (< a standard cutoff), while in lines 291, 293, 297, and 300, values appear to be exact but are reported as thresholds (<). 

      This is not an error but rather reflects two different types of analysis. The predictions per locus (originally lines 219, 222 etc) are evaluated using an empirical P-value based on 1000 permutations. As such, they are thresholded at 1/1000. The overlap with open chromatin regions, on the other hand, is based on a z-score, with the P-values taken from a standard conversion of z-scores to P-values.
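      The distinction between the two kinds of P-value can be illustrated with a short sketch. It assumes that the empirical P-value is the fraction of permuted values at least as large as the observed one, reported as a bound of 1/N when no permutation reaches it, and that the z-score conversion uses the upper tail of a standard normal distribution. Neither convention is stated in the response, so both are assumptions, as are the function names.

```python
from math import erfc, sqrt

def empirical_p(observed, null_values):
    """Empirical P-value from a list of permutation statistics.

    Returns ("=", p) when at least one permuted value reaches the observed
    value, and ("<", 1/N) otherwise, since N permutations cannot resolve
    a P-value smaller than 1/N.
    """
    n = len(null_values)
    hits = sum(1 for v in null_values if v >= observed)
    if hits == 0:
        return ("<", 1.0 / n)
    return ("=", hits / n)

def z_to_p_upper(z):
    """Upper-tail P-value of a standard normal z-score."""
    return 0.5 * erfc(z / sqrt(2))
```

      With 1000 permutations the first function can never report anything smaller than `("<", 0.001)`, whereas the second returns an exact value for any z-score.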

      Page 13/Table 2: At face value, it seems surprising that the overlap between Dmel SCRMshaw predictions with open chromatin is so much smaller than the overlap between predictions and open chromatin in other species, both in raw % (Tcas, D plexippus, H. himera) and fold enrichment (Tcas), given that the training sets for SCRMshaw are all derived from Dmel data. The discussion here does not touch on this aspect of the results, and the interpretation of this approach, in general, would be strengthened if the authors could comment on potential reasons why this pattern may be arising here, or at least acknowledge that this is an open question.

      There are many variables at play here, as the data are from different species, from different tissues, and from different methods. Thus we think it is difficult to read too much into the precise results from these comparisons—the main take-home is really just that there is a significant amount of overlap. In acknowledgment of this, we have slightly modified the text in this section so that it now notes (line 302ff): “These comparisons are imperfect, as the tissues used to obtain the chromatin data do not precisely correspond to the training sequences used for SCRMshaw, and the data were obtained using a variety of methods.”

      Line 318-329: The inferences from the reporter gene assay deserve a more nuanced treatment than they are given here. The important nuance that was not addressed by the discussion here is that the imaginal disc mode of development in Drosophila is not broadly representative of the development of larval/adult epithelial tissues across Holometabola; thus, inference of a true positive validation becomes complicated in cases where predicted enhancers from a species were tested and shown to drive expression in a fly imaginal disc that the native species have no direct disc counterpart to. For example, in line 388 a Tcas enhancer is reported to drive expression in the eye-antennal disc, and in lines 404 and 423 additional Tcas enhancers were reported to drive expression in the leg discs; however, Tribolium larvae do not possess antennal discs or leg discs set aside during embryogenesis in the sense that flies do - instead the homologous epithelial tissues form larval antennae and larval legs external to the body wall that are actively used at this life stage and are starkly different in morphology than an internally invaginated epithelial disc, that will directly give rise to adult tissues in subsequent molts. Is the interpretation of an expression pattern driven in a fly disc as a true positive really as straightforward as it was presented here, when in the native species the expression pattern driven by the enhancer in question would be in the context of an extremely different tissue morphology? That said, I understand and am deeply sympathetic to the constraints on the authors in performing transgenic experiments outside of the model fly; but these divergent modes of development across Holometabola deserve a mention and nuance in the interpretation here. 

      This is indeed a very important point, and we greatly appreciate Reviewer 3 pointing out this caveat when interpreting the outcomes of our cross-species reporter assay. Reviewer 3 is correct that the imaginal disc mode of adult tissue (i.e. imaginal) development found in Diptera does not represent the imaginal development across Holometabola. 

      In fact, imaginal development is quite diverse among Holometabola. For instance, larval leg and antennal cells appear to directly develop into the adult legs and antennae in Coleoptera (i.e. primordial imaginal cells function as larval appendage cells), while some cells within the larval legs and antennae are set aside during larval development specifically for adult appendages in Lepidopteran species (i.e. imaginal cells exist within the larval appendages but do not contribute to the formation of larval appendages). In contrast, an almost entire set of cells that develop into adult epithelia are set aside as imaginal discs during embryogenesis in Diptera. Furthermore, the imaginal disc mode of development appears to have evolved independently in Hymenoptera. Therefore, determining how imaginal primordial tissues correspond to each other among Holometabola has been a challenging task and a topic of high interest within the evo-devo and entomology communities.

      Nevertheless, despite these differences in mode of imaginal development, decades of evo-devo studies suggest that the gene regulatory networks (GRNs) operating in imaginal primordial tissues appear to be fairly well conserved among holometabolan species (for example, see Tomoyasu et al. 2009 regarding wing development and Angelini et al. 2012 regarding leg development between flies and beetles). These outcomes imply that a significant portion of the transcriptional landscape might be conserved across different modes of imaginal development. Therefore, an enhancer functioning in the Tribolium larval leg tissue (which also functions as adult leg primordium) could be active even in the leg imaginal disc of Drosophila, if the trans factors essential for the activation of the enhancer are conserved between the two imaginal tissues. 

      That being said, we fully expect there to be both false negative and false positive results in our cross-species reporter assay. We are optimistic about the biological relevance of the positive outcomes of our cross-species reporter assay, especially when the enhancer activity recapitulates the expression of the corresponding gene in Drosophila (for example, Am_ex Fig6B and Tc_hth Fig7B). Nonetheless, the biological relevance of these enhancer activities needs to be further verified in the native species through reporter assays, enhancer knock-outs, or similar experiments.

      In recognition of the Reviewer’s important point, we added the following caveat in our Discussion (lines 549-553): “Furthermore, the unique imaginal disc mode of adult epithelial development in D. melanogaster might have prevented some enhancers of other species from working properly in D. melanogaster imaginal discs, likely producing additional false negative results. Evaluating enhancer activities in the native species will allow us to address the degree of false negatives produced by the cross-species setting.” We moreover mention this caveat in the Results section when we first introduce the reporter assays (line 342).

      Line 580: This is the first time that the weakness of the closest-gene pairing approach is mentioned. This deserves mention earlier in the manuscript, as unfortunately, this is one of the major bottlenecks to this and any other approaches to investigating enhancer function. Could the authors address this earlier, perhaps pages 7-8, and provide citations for current understanding in the field of how often closest-gene pairing approaches correctly match enhancers to target genes? 

      We have added text as suggested on p.7-8 acknowledging the shortcomings of the closest-gene approach. We also clarify at the end of that section (lines 173-181) that target gene assignments, while useful for interpretation, have no bearing on the enhancer predictions themselves (which are generated prior to the target gene assignment steps).

    1. eLife assessment

      This manuscript compiles existing algorithms into an open-source software package that enables real-time (and offline) motor unit decomposition from muscle activity collected via grids of surface electrodes and indwelling electrode arrays. The package is valuable given that many motor neuroscience labs are using such algorithms and that there exists a host of potential applications for such data. Validation of the software package is compelling, suggesting that it can be successfully applied across a range of muscles and tasks.

    2. Reviewer #1 (Public review):

      Many labs worldwide now use the blind source deconvolution technique to identify the firing patterns of multiple motor units simultaneously in human subjects. This technique has had a truly transformative effect on our understanding of the structure of motor output in both normal subjects and, increasingly, in persons with neurological disorders. The key advance presented here is that the software provides real-time identification of these firing patterns.

      The main strengths are the clarity of the presentation and the great potential that real-time decoding will provide. Figures are especially effective and statistical analyses are excellent.

    3. Reviewer #3 (Public review):

      In this manuscript, Rossato and colleagues present a method for real-time decoding of EMG into putative single motor units. Their manuscript details a variety of decision points in their code and data collection pipeline that lead to a final result of recording on the order of ~10 putative motor units per muscle in human males. Overall the manuscript is highly restricted in its potential utility but may be of interest to aficionados. For those outside the field of human or nonhuman primate EMG, these methods will be of limited interest.

      Comment on revised version

      The revised manuscript has thoroughly and responsively addressed the concerns and suggestions raised in the first review. I think the method will be of use to the field and fits well within the purview of eLife's publications on methods development.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment  

      This manuscript compiles existing algorithms into an open-source software package that enables real-time motor unit decomposition from muscle activity collected via grids of surface electrodes and indwelling electrode arrays. The software package is valuable given that many motor neuroscience labs are using such algorithms and that there exist a host of potential real-time applications for such data. Validation of the software package is generally solid but incomplete in some important areas: the primary data is narrow in scope and only from male participants, and there is a lack of ground truth tests on synthetic data. The impact of the software package could be strengthened by making it less tied to specific electrode hardware and by expanding it to easily permit offline analysis.

      We thank the reviewers and editors for their comments and suggestions after reading the initial version of our manuscript. In this second iteration, we have performed a validation of the algorithm using synthetic EMG signals. We have also added experimental data collected in female participants. Finally, the new version of I-Spin is compatible with the Open Ephys GUI that can interface with devices such as the Open Ephys and Intan acquisition boards. Another version has been developed for interfacing with the devices provided by the TMSi company (https://info.tmsi.com/blog/ispin-saga-real-timemotor-unit-decomposition-tool). We believe that such changes will make I-Spin more accessible for a broad range of experimental setups and research teams. Please find below the specific answers to the reviewers’ comments.

      Reviewer #1 (Public Review):  

      Many labs worldwide now use the blind source deconvolution technique to identify the firing patterns of multiple motor units simultaneously in human subjects. This technique has had a truly transformative effect on our understanding of the structure of motor output in both normal subjects and, increasingly, in persons with neurological disorders. The key advance presented here is that the software provides real-time identification of these firing patterns. The main strengths are the clarity of the presentation and the great potential that real-time decoding will provide. Figures are especially effective and statistical analyses are excellent. 

      We thank the reviewer for this positive appreciation of our work. 

      The main limitation of the work is that only male subjects were included in the validation of the software. The reason given - that yield of number of motor units identified is generally larger in males than females - is reasonable in the sense that this is the first systematic test of this real-time approach. At a minimum, however, the authors should clearly commit to future work with female subjects and emphasize the importance of considering sex differences. 

      As emphasised by the reviewer, the number of identified motor units is typically higher in males than females when using surface EMG (Taylor et al., 2022), which is currently the main limitation to implementing offline EMG decomposition in a broad and representative sample of research participants. These sex-related differences are less pronounced when using intramuscular EMG, as the signals are less affected by the filtering effect of the volume conductor separating the motor units from the recording electrodes. Besides the different yields expected between males and females, we do not expect differences in terms of the accuracy of the motor unit identification algorithm, which is the main outcome of this paper. 

      Nevertheless, we acknowledge the importance of understanding the reasons for this difference, and the imperative to refine algorithms and/or surface electrode design to mitigate this major limitation of surface EMG. 

      To support this point, the discussion has been updated (P20; L480):

      ‘An important consideration regarding the implementation of offline or real-time surface EMG decomposition is the difference between individuals, with an overall lower yield in number of identified motor units in females (here: 9 ± 12) than in males (here: 30 ± 13). Typically, the number of identified motor units from surface EMG is twice as low in females than males (32, 49, 50). The cause for this difference remains unclear. It may be related to variations in properties of the tissues separating the motor units from the recording electrodes, or to differences in the morphological and physiological properties of muscle fibres, as well as to the innervation ratios of motor units. These sex-related differences have so far only been supported by data extracted from animal experiments (51). However, the recent developments of simulation frameworks capable of generating highly realistic EMG signals for anthropometrically diverse populations may help understanding the impact of sex-related differences in humans (52). Specifically, these simulations can account for diverse anatomical (e.g. muscle volume and architecture, thickness of subcutaneous tissues) and physiological characteristics (e.g. innervation ratio, number of motor units, fibre cross sectional area, fibre conduction velocity, contribution of rate coding vs. spatial recruitment). Generating such dataset could help identifying the primary factors affecting EMG decomposition performance, ultimately enabling the refinement of algorithms and/or surface electrode design.’

      Finally, we have completed new experiments including males and females in this new iteration (P.12; L.295):

      ‘Application of motor unit filters in experimental data

      We then asked eight participants (4 males and 4 females) to perform trapezoidal isometric contractions with plateaus of force set at 10% and 20% MVC during which surface EMG signals were recorded from the TA with 256 electrodes separated by 4 mm. The aim of this experiment was to confirm the results of the simulation; specifically, to test the accuracy of the online decomposition when the level of force was below, equal to, or above the level of force produced during the baseline contraction used to estimate the motor unit filters (Figure 4). We assessed the accuracy of the motor unit spike trains identified in real time using their manually edited version as reference. 144 motor units were identified at both 10 and 20% MVC. When the test signals were recorded at the same level of force as the baseline contraction, we obtained rates of agreement of 95.6 ± 6.8% (10% MVC) and 93.9 ± 5.9% (20% MVC). The sensitivity reached 95.9 ± 6.7% (10% MVC) and 94.4 ± 5.6% (20% MVC), and the precision reached 99.6 ± 1.3% (10% MVC) and 99.4 ± 1.9% (20% MVC). 

      When the filters identified at 20% MVC were applied on signals recorded at a lower level of force (10% MVC), the rates of agreement decreased to 87.9 ± 16.2%. The sensitivity also decreased to 88.0 ± 16.2%, but the precision remained high (99.4 ± 4.3%). Thus, the decrease in accuracy was mostly caused by missed discharge times rather than the false identification of artifacts or spikes from other motor units. When the filters identified at 10% MVC were applied to signals recorded at a higher level of force, the rates of agreement decreased to 83.3 ± 13.5%. The sensitivity decreased to 90.7 ± 8.1%, and the precision also decreased to 90.9 ± 12.6%. This result confirms what was observed with synthetic EMG, that is, motor units recruited between 10 and 20% MVC can substantially disrupt the accuracy of the decomposition in real-time, as highlighted in Figure 4 (lower panel). Importantly, this situation does not happen for all the motor units, as suggested by the distribution of the values in Figure 4.’
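      The three accuracy measures reported above can be related to one another with a short sketch. The quoted text does not give their formulas, so the definitions used here are assumptions based on common usage in the decomposition literature: rate of agreement = TP / (TP + FN + FP), sensitivity = TP / (TP + FN), and precision = TP / (TP + FP), where TP counts discharge times present in both the real-time and the manually edited spike train. The function names, the matching tolerance, and the greedy matching strategy are likewise hypothetical.

```python
def match_spikes(reference, detected, tol=0):
    """Greedy one-to-one matching of discharge times (in samples).

    Returns (true positives, false negatives, false positives).
    """
    ref, det = sorted(reference), sorted(detected)
    i = j = tp = 0
    while i < len(ref) and j < len(det):
        if abs(ref[i] - det[j]) <= tol:
            tp += 1          # matched pair: advance both trains
            i += 1
            j += 1
        elif det[j] < ref[i]:
            j += 1           # detected spike with no reference partner
        else:
            i += 1           # reference spike that was missed
    return tp, len(ref) - tp, len(det) - tp

def accuracy_metrics(reference, detected, tol=0):
    """Rate of agreement, sensitivity and precision, in percent."""
    tp, fn, fp = match_spikes(reference, detected, tol)

    def pct(num, den):
        return 100.0 * num / den if den else float("nan")

    return {
        "rate_of_agreement": pct(tp, tp + fn + fp),
        "sensitivity": pct(tp, tp + fn),
        "precision": pct(tp, tp + fp),
    }
```

      Under these definitions, missed discharges lower the sensitivity and the rate of agreement but leave the precision untouched, which is the pattern described for filters estimated at 20% MVC and applied at 10% MVC.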

      A second weakness is that the Introduction does a poor job of establishing the potential importance of the real-time approach. 

      The introduction has been modified to highlight the importance of identifying the spiking activity of motor units in real time. Specifically, the first paragraph has been rewritten to read (P3; L67): 

      ‘The activity of motor neurons – in the form of spike trains – represents the neural code of movement to muscles. Decoding this firing activity in real-time during various behaviours can thus substantially enhance our understanding of movement control (2-5). Real-time decoding is also essential for interfacing with external devices (6) or virtual limbs (7) when activity is present at the periphery of the nervous system. For example, individuals with a spinal cord injury can control a virtual hand with the residual firing activity of the motor units in their forearm (7). Furthermore, sampling the activity of motor units receiving a substantial portion of independent synaptic inputs may pave the way for movement augmentation – specifically, extending a person’s movement repertoire through the increase of controllable degrees of freedom (8). In this way, Formento et al. (3) showed that individuals can intuitively learn to independently control motor units within the same muscle using visual cues. Having access to open-source tools that perform the real-time decoding of motor units would allow an increasing number of researchers to improve and expand the range of these applications.’

      Reviewer #2 (Public Review):  

      Rossato et al present I-spin live, a software package to perform real-time blind-source separation-based sorting of motor unit activity. The core contribution of this manuscript is the development and validation of a software package to perform motor unit sorting, apply the resulting motor unit filters in real-time during muscle contractions, and provide real-time visual feedback of the motor unit activity. I have a few concerns with the work as presented: 

      I found it challenging to specifically understand the technical contributions of this manuscript. The authors do not appear to be claiming anything novel algorithmically (with respect to spike sorting) or methodologically (with respect to manual editing of spikes before the use of the algorithms in real-time). My takeaway is that the key contributions are C1) development of an open-source implementation of the Negro algorithm, C2) validating it for real-time application (evaluating its sorting efficacy, and closed-loop performance, etc), and developing a software package to run in closed-loop with visual feedback. I will comment on each of these items separately below. It would be great if the authors could more explicitly lay out the key contributions of this manuscript in the text. 

      The main objective of this work was to provide an open-source implementation of the real-time identification of motor units, together with a user interface that allows researchers to easily process the data and display the firing activity of motor units through several types of visual feedback. We have explicitly laid out these key contributions in the introduction: ‘Having access to open-source tools that perform the real-time decoding of motor units would allow an increasing number of researchers to improve and expand the range of these applications.’

      Related to the above, much of the validation of the algorithms in this manuscript has a "trust me" feel. The authors note that the Negro et al. algorithm has already been validated, so very few details or presentations of primary data showing the algorithm's performance are shown. Similarly, the efficacy of the decomposition approach is evaluated using manual editing of the sorting output as a reference, which is a subjective process, and users would greatly benefit from explicit guidance. There are very few details of manual editing shown in this manuscript (I believe the authors reference the Hug et al. 2021 paper for these details), and little discussion of the core challenges and variability of that process, even though it seems to be a critical step in the proposed workflow. So this is very hard to evaluate and would be challenging for readers to replicate. 

      To address the reviewer’s comment, we added a validation step using synthetic EMG data (P.10; L.235). 

      ‘Validation of the algorithm

      We first validated the accuracy of the algorithm using synthetic EMG signals generated with an anatomical model entailing a cylindrical muscle volume with parallel fibres [see Farina et al. (29) and Konstantin et al. (36) for a full description of the model]. In this model, subcutaneous and skin layers separate the muscle from a grid of 65 surface electrodes (5 columns, 13 rows), while an intramuscular array of electrodes is directly inserted in the muscle under the grid at an angle of 30 degrees. 150 motor units were distributed within the cross section of the muscle. Recruitment thresholds, firing rate/excitatory drive relations, and twitch parameters were assigned to each motor unit using the same procedure as Fuglevand et al. (37). During each simulation, a proportional-integral-derivative controller adjusted the level of excitatory drive to minimise the error between a predefined target of force and the force generated by the active motor units. 

      Figure 3A displays the raster plots of the active motor units during simulated trapezoidal isometric contractions with plateaus of force set at 10%, 20%, and 30% MVC. A sinusoidal isometric contraction ranging between 15 and 25% MVC at a frequency of 0.5 Hz was also simulated. We identified on average 10 ± 1 and 12 ± 2 motor units with surface and intramuscular arrays, respectively (Figure 3A). During the offline decomposition, the rate of agreement between the identified discharge times and the ground truth, that is, the simulated discharge times, reached 100.0 ± 0.0% for intramuscular EMG signals and 99.2 ± 1.8% for surface EMG signals (Figure 3B). The offline estimation of motor unit filters was therefore highly accurate, independently of the level of force or the pattern of the isometric contraction.

      Motor unit filters estimated during a baseline contraction at 20% MVC were then applied in real-time on signals simulated during a contraction with a different pattern (sinusoidal; Figure 3C). The rates of agreement between the online decomposition and the ground truth reached 96.3 ± 4.6% and 98.4 ± 2.3% for surface and intramuscular EMG signals, respectively. Finally, we tested whether the accuracy of the online decomposition changed when the level of force decreased or increased by 10% MVC when compared to the calibration performed at 20% MVC (Figure 3D). The rate of agreement remained high when applying the motor unit filters on signals recorded at 10% MVC: 99.8 ± 0.2% (surface EMG) and 99.5 ± 0.3% (intramuscular EMG). It is worth noting that only 3 out of 10 motor units identified from surface EMG at 20% MVC were active at 10% MVC, while 8 out of 12 motor units identified from intramuscular EMG were active at 10% MVC. This shows how the decomposition of EMG signals tends to identify the last recruited motor units, which often innervate a larger number of fibres than the early recruited motor units (38). On the contrary, the application of motor unit filters on signals simulated at 30% MVC led to a decrease in the rate of agreement, with values of 88.6 ± 14.0% (surface EMG) and 80.3 ± 19.2% (intramuscular EMG). This decrease in accuracy did not impact all the motor units, with 5 motor units keeping a rate of agreement above 95% in both signals. For the other motor units, we observed a decrease in precision, which estimates the ratio of true discharge times over the total number of identified discharge times. This was caused by the recruitment of two motor units sharing a similar space within the muscle, which resulted in a merge in the same pulse train (Figure 3D).’

      In addition, we added a new paragraph in the Method section to describe the manual editing process (P.26; L.658). 

      ‘There is a consensus among experts that automatic decomposition should be followed by visual inspection and manual editing (55). Manual editing involves the following steps: i) removing spikes that result in erroneous firing rates (outliers), ii) adding discharge times that are clearly distinguishable from the noise, iii) recalculating the separation vector, iv) reapplying the separation vector on the EMG signals (either a selected window or the entire signal), and v) repeating this procedure until no outliers are present and all clearly distinguishable spikes have been selected. Importantly, the manual editing of potentially missed or falsely identified discharge times should not be accepted before the application of the updated motor unit separation vector, thereby generating a new pulse train. Manual edits should be accepted only if the silhouette value improves following this operation or remains well above the preestablished threshold. A more extensive description of the manual editing of motor unit pulse trains can be found in (32). Even though some of the aforementioned steps involve subjective decision-making, evidence suggests that manual editing after EMG decomposition with blind source separation approaches remains highly reliable across operators (33). Specifically, the median rate of agreement calculated for 126 motor units over eight operators with various levels of experience in manual editing was 99.6%. All raw and processed data have been made available on a public data repository so that they can be used for training new operators (10.6084/m9.figshare.13695937).’

      I found the User Guide in the Github package to be easy to follow. Importantly, it seems heavily tied to the specific hardware (Quattrocento). I understand it may be difficult to make the full software package work with different hardware, but it seems important to at least make an offline analysis of recorded data possible for this package to be useful more broadly. 

      The software was updated to perform real-time decomposition with signals recorded from the Quattrocento and the Open Ephys GUI, which is compatible with Intan and Open Ephys acquisition boards. I-Spin has also been adapted by TMSi to perform real-time decomposition with their devices (https://info.tmsi.com/blog/ispin-saga-real-time-motor-unit-decomposition-tool). 

      Moreover, the manual editing panel of the software can now import any files from these devices and allow users to reformat data in mat files to perform offline analyses.

      While this may be a powerful platform, it is also very possible that without more details and careful guidance for users on potential pitfalls, many non-experts in sorting could use this as a platform for somewhat sloppy science. 

      We fully agree with the reviewer that real-time EMG decomposition - with a different approach here than spike sorting - may yield unreliable results if not applied properly. As outlined in the introduction of our initial manuscript, assessing the accuracy and limitations of real-time decomposition was a primary motivation for this study. Specifically, we compared accuracy between contraction intensities, muscles, and electrode types (see Results section). 

      We also demonstrated that manual editing of the decomposition outputs should be done after the training phase to improve the motor unit filters, thereby improving the accuracy of real-time decomposition. We also outlined the importance of never blindly accepting the result of the decomposition without visual inspection and manual editing (P8; L214).

      ‘These results show how manual editing can improve the accuracy of spike detection from the motor unit pulse trains. Moreover, a SIL value around 0.9 can be used as a threshold to automatically remove the motor unit pulse trains with a poor quality a priori. Thus, these two steps were performed in all the subsequent analyses. Importantly, it is worth noting that the motor unit pulse train must always be visually inspected after the session to check for errors in the automatic identification of discharge times.’

      We have also included more detailed information about the manual editing process (see above).

      The authors mention that data is included with the Github software package. I could not find any included data, or instructions on how to run the software offline on example data. 

      A link to the data on figshare was added to the GitHub repository.

      Given the centrality of the real-time visual feedback to their system, the authors should show some examples of the actual display etc. so readers can understand what the system in action actually looks like (I believe there is no presentation of the actual system in the manuscript, just in the User Guide). Similarly, it would be helpful to have a schematic figure outlining the full workflow that a user goes through when using this system. 

      A figure of the workflow is present in the user manual. Additionally, we now display traces of visual feedback in figure 5, and we added videos of the software during each type of visual feedback in the supplemental materials. 

      The authors note all data was collected with male subjects because more motor units can be decomposed from male subjects relative to females. But what is the long-term outlook for the field if studies avoid female subjects because their motor units may be harder to decompose? This should at least be discussed - it is an important challenge for the field to solve, and it is unacceptable if new methods just avoid this problem and are only tested on male subjects. 

      This point was rightly raised by each of the three reviewers. To solve this, we added data collected on four females, and discussed future developments to make the decomposition of surface EMG equally performant for everyone (P.20; L.480).

      ‘An important consideration regarding the implementation of offline or real-time surface EMG decomposition is the difference between individuals, with an overall lower yield in number of identified motor units in females (here: 9 ± 12) than in males (here: 30 ± 13). Typically, the number of motor units identified from surface EMG is half as high in females as in males (32, 49, 50). The cause for this difference remains unclear. It may be related to variations in properties of the tissues separating the motor units from the recording electrodes, or to differences in the morphological and physiological properties of muscle fibres, as well as to the innervation ratios of motor units. These sex-related differences have so far only been supported by data extracted from animal experiments (51). However, the recent development of simulation frameworks capable of generating highly realistic EMG signals for anthropometrically diverse populations may help in understanding the impact of sex-related differences in humans (52). Specifically, these simulations can account for diverse anatomical (e.g. muscle volume and architecture, thickness of subcutaneous tissues) and physiological characteristics (e.g. innervation ratio, number of motor units, fibre cross sectional area, fibre conduction velocity, contribution of rate coding vs. spatial recruitment). Generating such datasets could help identify the primary factors affecting EMG decomposition performance, ultimately enabling the refinement of algorithms and/or surface electrode design.’

      Specific comments on the core contributions of this paper:  

      C1. Development of an open-source implementation of the Negro algorithm 

      This seems an important contribution and useful for the community. There are very few figures showing any primary data, the efficacy of sorting, raw traces showing the waveforms that are identified, cluster shapes, etc. I realize the high-level algorithm has been outlined elsewhere, but the implementation in this package, and its efficacy, is a core component of the system and the claims being made in this paper. Much more presentation of data is needed to evaluate this. 

      It is worth noting that the approach used here is based on blind source separation, which is different than spike-sorting algorithms as it relies on the statistical properties of the spike trains (their sparseness) rather than the profiles of the action potentials. In short, we optimise separation vectors that are applied onto the whitened signal to generate a sparse motor unit pulse train. The discharge times are then directly estimated from the high peaks of this pulse train (Section 1 of the results; overview of the approach).

      We are thus displaying motor unit pulse trains in three figures with the automatically detected discharge times, with cases of successful separation in figure 1 and merged motor units in the same pulse train in figures 3 and 4.

      We also validated the algorithm with synthetic EMG to provide objective data on the accuracy of the algorithm. These results are shown in the section ‘Validation of the algorithm’ and displayed in figure 3.

      Similarly, more information on the offline manual editing process (e.g. showing before/after examples with primary data) would be important to gain confidence in the method. The current paper shows application to both surface EMG and intramuscular EMG, but I could not find IM EMG examples in the Hug paper (apologies if I missed them). Surface and IM data are very, very different, so one would imagine the considerations when working with them should also be different. 

      In response to another comment from the reviewer, we have included more detailed information about the manual editing process (see above). As stated above, the decomposition approach used in our software differs from a spike sorting approach. Therefore, even though intramuscular and surface EMG signals are different, the decomposition and manual editing process is the same. 

      All descriptions of math/algorithms are presented in text, without any actual math, variable definitions, etc. This presentation makes it difficult to understand what is done. I would strongly recommend writing out equations and defining variables where possible. 

      More details on how the level of sparseness is controlled during optimization would be helpful.

      And how this sparseness penalty is weighed against other optimization costs. 

      A mathematical description of the model has been added in the methods (P25; L620)

      ‘Mathematical modelling of the recorded spike trains.

      The spike train of a motor neuron recorded over time 𝑡 ∈ [0, 𝑇] can be described as the result of a convolution between a delta function (δ) representing the firing times (φ), and finite impulse responses (h) representing action potentials of duration L: $x(t) = \sum_{l=0}^{L-1} h(l)\, s(t-l)$, with $s(t) = \sum_{\varphi} \delta(t-\varphi)$. In practice, the nature of h and the duration L depend on the type of recordings. For electrophysiological measurements, h characterises the local electrical field generated by the spike and conducted through the surrounding tissues. 

      As the recorded volume of tissue comprises many active neurons, each recording can be considered as a convolutive mixture of multiple sources, and the previous equation can be expressed in the form of a matrix to also consider all the electrodes of an array: $\mathbf{x}(t) = \sum_{l=0}^{L-1} H(l)\, \mathbf{s}(t-l)$, where $\mathbf{x}(t) = [x_1(t), \ldots, x_m(t)]^{T}$ is a matrix of m electrophysiological signals, $\mathbf{s}(t) = [s_1(t), \ldots, s_n(t)]^{T}$ is a matrix of n motor neurons’ spike trains, and 𝐻(𝑙) is an m by n matrix containing the lth sample of action potentials from n neurons and m signals. In this situation, we can reformulate the model as an instantaneous mixture of an extended set of sources, that is, the motor neurons’ spike trains and their delayed versions. This allows us to simply write the previous equation as a multiplication of matrices, in which each source is delayed L times, L being the duration of the impulse response h. This model can be inverted for neural decoding with source-separation approaches.’
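
      As a rough illustration of the "extended" formulation described above, the sketch below stacks each recorded channel with delayed copies of itself, which is how the convolutive mixture is commonly turned into an instantaneous one before whitening. The function name `extend_signals`, the zero-padding at the start of each delayed copy, and the row ordering are assumptions made for this example; they are not taken from the I-Spin live code.

```python
def extend_signals(signals, extension_factor):
    """Stack every channel with its delayed copies (illustrative sketch).

    signals: list of m channels, each a list of T samples.
    extension_factor: number of copies per channel (delays 0 .. factor-1).

    Returns m * extension_factor rows of length T. Row
    (i * extension_factor + d) holds channel i delayed by d samples,
    zero-padded at the start (an assumption of this sketch).
    """
    extended = []
    for channel in signals:
        n_samples = len(channel)
        for delay in range(extension_factor):
            pad = [0.0] * min(delay, n_samples)
            kept = list(channel[: max(n_samples - delay, 0)])
            extended.append(pad + kept)
    return extended


# Two short toy channels, extended with delays 0, 1 and 2.
toy = [[1, 2, 3, 4], [5, 6, 7, 8]]
for row in extend_signals(toy, 3):
    print(row)
```

      With m channels and an extension factor R, the extended matrix has m × R rows, so the separation vectors estimated afterwards act on spatial and temporal information at once.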

      The rest of the decomposition approach was rewritten to make it clearer for the reader:

      ‘The monopolar EMG signals collected during the baseline contractions were extended with an extension factor of 1000/m (21), where m is the number of channels free of any noise or artifact. The signals were then demeaned and whitened. A contrast function was iteratively applied to estimate a separation vector that maximised the level of sparseness of the motor unit pulse train (Figure 1B). This loop stopped when the variation of the separation vector between two successive iterations reached a predefined lower bound. After the application of a peak detection algorithm, the motor unit pulse train contained high peaks (i.e., the spikes from the identified motor unit) and low peaks from other motor units and noise. High peaks were separated from low peaks and noise using K-means classification with two classes (Figure 1B). The peaks from the class with the highest centroid were considered as spikes of the identified motor unit. A second algorithm refined the estimation of the discharge times by iteratively recalculating the separation vector and repeating the steps with peak detection and K-means classification until the coefficient of variation of the inter-spike intervals was minimised. The accuracy of each estimated spike train was assessed by computing the silhouette (SIL) value between the two classes of peaks identified with K-means classification (24). When the SIL exceeded a predetermined threshold, the motor unit filter was saved for the real-time decomposition, together with the centroids of the ‘spikes’ and ‘noise’ classes (Figure 2A).’
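
      The peak detection, two-class classification and SIL steps of the passage above can be sketched as follows. This is a minimal illustration, not the implementation in the software: the function names are hypothetical, the K-means is a plain one-dimensional version initialised at the smallest and largest peak, and the SIL formula (between-class minus within-class summed squared distances of the ‘spikes’ class, divided by the larger of the two) is an assumed formulation that the text does not spell out.

```python
def detect_peaks(pulse_train):
    """Indices of local maxima (higher than the left neighbour, not lower than the right)."""
    return [i for i in range(1, len(pulse_train) - 1)
            if pulse_train[i - 1] < pulse_train[i] >= pulse_train[i + 1]]


def two_class_kmeans(values, n_iter=100):
    """1-D K-means with two classes; returns (labels, low centroid, high centroid).

    labels[k] is True when values[k] falls in the class with the highest centroid.
    Expects a non-empty list of peak heights.
    """
    low, high = min(values), max(values)  # assumed initialisation
    labels = [False] * len(values)
    for _ in range(n_iter):
        labels = [abs(v - high) < abs(v - low) for v in values]
        highs = [v for v, is_high in zip(values, labels) if is_high]
        lows = [v for v, is_high in zip(values, labels) if not is_high]
        if not highs or not lows:
            break
        new_low, new_high = sum(lows) / len(lows), sum(highs) / len(highs)
        if new_low == low and new_high == high:
            break  # centroids stopped moving
        low, high = new_low, new_high
    return labels, low, high


def silhouette(values, labels, low, high):
    """Assumed SIL formulation, computed on the 'spikes' class only."""
    spikes = [v for v, is_spike in zip(values, labels) if is_spike]
    within = sum((v - high) ** 2 for v in spikes)
    between = sum((v - low) ** 2 for v in spikes)
    largest = max(within, between)
    return (between - within) / largest if largest > 0 else 0.0


def classify_spikes(pulse_train):
    """Return (discharge indices, SIL) for one motor unit pulse train."""
    peaks = detect_peaks(pulse_train)
    if not peaks:
        return [], 0.0
    heights = [pulse_train[i] for i in peaks]
    labels, low, high = two_class_kmeans(heights)
    discharges = [i for i, is_spike in zip(peaks, labels) if is_spike]
    return discharges, silhouette(heights, labels, low, high)


# Toy pulse train: three high peaks (spikes) among three low peaks (noise).
toy_train = [0, 0.1, 0, 1.0, 0, 0.12, 0, 0.95, 0, 0.08, 0, 1.05, 0]
discharges, sil = classify_spikes(toy_train)
print(discharges, round(sil, 3))
```

      In a workflow like the one described, a pulse train would be kept only when its SIL exceeds the chosen threshold, and the two centroids would be stored alongside the filter for the real-time stage.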

      Overall the paper is not very rigorous about the accuracy of motor unit identification. For example, the authors note that SIL of 0.9 is generally used for offline evaluation (why is this acceptable?), but it was lowered to 0.8 for particular muscles in this study. But overall, it is unclear how sorting accuracy/inaccuracy affects performance in the target applications of this work. 

      In the section mentioned by the reviewer, we aimed to show how this metric can help to automatically select motor units that are likely to have a higher accuracy of spike detections as the peaks of their pulse train are easily separable from the noise. 

      We reformulated the conclusion of this section to make it clearer (P8; L214):

      ‘These results show how manual editing can improve the accuracy of spike detection from the motor unit pulse trains. Moreover, a SIL value around 0.9 can be used as a threshold to automatically remove the motor unit pulse trains with a poor quality a priori. Thus, these two steps were performed in all the subsequent analyses. Importantly, it is worth noting that the motor unit pulse train must always be visually inspected after the session to check for errors in the automatic identification of discharge times.’

      C2. For real-time experiments, variability/jitter is important to characterize. Fig. 4 seems to be presenting mean computational times, etc, but no presentation of variability is shown. It would be helpful to depict data distributions somehow, rather than just mean values. 

      The variability in computational time was added to this section (P.28; L.730):

      ‘The standard deviation of computational times across windows reached 5.4 ± 4.0 ms (raster plot), 4.0 ± 3.2 ms (smoothed firing rate), and 2.8 ± 2.5 ms (quadrant)’

      The computational time minimally varied between the successive windows, except when the labels of the x-axis were updated in real-time with scrolling feedback. It was overall always well below the duration of the window.

      Author response image 1.

      Computational time for each iteration of the algorithm in one participant. The top panels display the continuous computation time through the recording, while the bottom panels display the distribution of computational times. The dashed line represents the duration of a window of EMG signals.
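
      For readers who want to run the same kind of check on their own setup, the sketch below times a processing function over successive windows and summarises the spread against the window duration. It is only an illustration: `time_windows`, `summarise` and the 125 ms window used in the example are hypothetical names and values, not taken from the manuscript or the software.

```python
import statistics
import time


def time_windows(process_window, windows):
    """Return the processing time of each window, in milliseconds."""
    durations_ms = []
    for window in windows:
        start = time.perf_counter()
        process_window(window)
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def summarise(durations_ms, window_ms):
    """Mean, standard deviation, maximum, and number of windows that overran."""
    return {
        "mean_ms": statistics.mean(durations_ms),
        "sd_ms": statistics.stdev(durations_ms) if len(durations_ms) > 1 else 0.0,
        "max_ms": max(durations_ms),
        "overruns": sum(d > window_ms for d in durations_ms),
    }


# Hypothetical example: three measured times checked against a 125 ms window.
print(summarise([10.0, 12.0, 14.0], window_ms=125.0))
```

      Reporting the maximum and the number of overruns alongside the mean makes occasional slow iterations visible, such as those linked to axis updates in scrolling feedback.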

      There is some description about the difference between units identified during baseline contractions, and how they might be misidentified during online contractions ("Accuracy of the real-time identification..."). This should be described in more detail. 

      We added an additional section in the results to clarify the concept of motor unit filters, and the reapplication of motor unit filters on signals in real-time. We highlighted how each motor unit must have a unique spatio-temporal signature to be accurately identified by our algorithms, in opposition to merged motor units sharing the same spatio-temporal features. This section shows how motor units accurately identified during baseline contractions can be misidentified during online contractions (P12; L295).

      ‘Application of motor unit filters in experimental data

      We then asked eight participants (4 males and 4 females) to perform trapezoidal isometric contractions with plateaus of force set at 10% and 20% MVC during which surface EMG signals were recorded from the TA with 256 electrodes separated by 4 mm. The aim of this experiment was to confirm the results of the simulation; specifically, to test the accuracy of the online decomposition when the level of force was below, equal to, or above the level of force produced during the baseline contraction used to estimate the motor unit filters (Figure 4). We assessed the accuracy of the motor unit spike trains identified in real time using their manually edited version as reference. 144 motor units were identified at both 10 and 20% MVC. When the test signals were recorded at the same level of force as the baseline contraction, we obtained rates of agreement of 95.6 ± 6.8% (10% MVC) and 93.9 ± 5.9% (20% MVC). The sensitivity reached 95.9 ± 6.7% (10% MVC) and 94.4 ± 5.6% (20% MVC), and the precision reached 99.6 ± 1.3% (10% MVC) and 99.4 ± 1.9% (20% MVC).  

      When the filters identified at 20% MVC were applied on signals recorded at a lower level of force (10% MVC), the rates of agreement decreased to 87.9 ± 16.2%. The sensitivity also decreased to 88.0 ± 16.2%, but the precision remained high (99.4 ± 4.3%). Thus, the decrease in accuracy was mostly caused by missed discharge times rather than the false identification of artifacts or spikes from other motor units.

      When the filters identified at 10% MVC were applied to signals recorded at a higher level of force, the rates of agreement decreased to 83.3 ± 13.5%. The sensitivity decreased to 90.7 ± 8.1%, and the precision also decreased to 90.9 ± 12.6%. This result confirms what was observed with synthetic EMG, that is, motor units recruited between 10 and 20% MVC can substantially disrupt the accuracy of the decomposition in real-time, as highlighted in Figure 4 (lower panel). Importantly, this situation does not happen for all the motor units, as suggested by the distribution of the values in Figure 4.’
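
      The three accuracy metrics reported above can be computed from two lists of discharge times, as in the hedged sketch below. Precision follows the definition given in the manuscript (true discharge times over all identified discharge times). The rate of agreement as TP / (TP + FP + FN), the sensitivity as TP / (TP + FN), the greedy matching, and the ±1 sample tolerance are assumptions based on common usage, and the function names are hypothetical.

```python
def match_discharges(identified, reference, tolerance=1):
    """Greedy one-to-one matching of discharge times given in samples.

    Returns (true positives, false positives, false negatives).
    """
    identified, reference = sorted(identified), sorted(reference)
    i = j = tp = 0
    while i < len(identified) and j < len(reference):
        diff = identified[i] - reference[j]
        if abs(diff) <= tolerance:
            tp += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # identified spike with no reference counterpart
        else:
            j += 1  # reference spike that was missed
    return tp, len(identified) - tp, len(reference) - tp


def accuracy_metrics(identified, reference, tolerance=1):
    """Rate of agreement, sensitivity and precision, in percent."""
    tp, fp, fn = match_discharges(identified, reference, tolerance)

    def percent(numerator, denominator):
        return 100.0 * numerator / denominator if denominator else float("nan")

    return {
        "rate_of_agreement": percent(tp, tp + fp + fn),  # assumed definition
        "sensitivity": percent(tp, tp + fn),             # assumed definition
        "precision": percent(tp, tp + fp),               # as defined in the text
    }


# Toy example: one missed discharge (40) and two false identifications (45, 60).
reference = [10, 20, 30, 40, 50]
identified = [10, 21, 30, 45, 50, 60]
print(accuracy_metrics(identified, reference))
```

      A drop in sensitivity with stable precision points to missed discharge times, whereas a drop in precision points to false identifications, which is the distinction drawn in the two paragraphs above.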

      Fig. 6: Given that a key challenge in sorting should be that collisions occur during large contractions, much more primary data should be presented/visualized to show how the accuracy of sorting changes during larger contractions in online experiments. 

      As indicated above, the decomposition approach implemented in our software is not based on spike sorting, so it does not require separating overlapping profiles of action potentials (see Methods). 

      Fig.7: In presenting the accuracy of biofeedback, it is very hard to gain any intuition for performance by just looking at RMSE values. Showing the online decoded and edited trajectories would help readers understand the magnitude of errors. 

      We updated the figure to display examples of visual feedback before and after manual editing.

      Reviewer #3 (Public Review):  

      In this manuscript, Rossato and colleagues present a method for real-time decoding of EMG into putative single motor units. Their manuscript details a variety of decision points in their code and data collection pipeline that led to a final result of recording on the order of ~10 putative motor units per muscle in human males. Overall, the manuscript is highly restricted in its potential utility but may be of interest to aficionados. For those outside the field of human or nonhuman primate EMG, these methods will be of limited interest.

      We thank the reviewer for his/her thorough evaluation of our manuscript. We recognise that this tool/resource will immediately benefit groups working with humans or nonhuman primate models. However, the recent development of intramuscular thin films with various designs adapted to rodents and smaller animals could expand the range of future users (Chung et al., 2023, Elife). Nonetheless, decoding motor units in humans could be useful for many fields, e.g. in the domains of movement restoration and augmentation. The following paragraph has been added in the introduction section to highlight the importance of real-time decoding of motor unit activity (P3; L67):  

      ‘The activity of motor neurons – in the form of spike trains – represents the neural code of movement to muscles. Decoding this firing activity in real-time during various behaviours can thus substantially enhance our understanding of movement control (2-5). Real-time decoding is also essential for interfacing with external devices (6) or virtual limbs (7) when activity is present at the periphery of the nervous system. For example, individuals with a spinal cord injury can control a virtual hand with the residual firing activity of the motor units in their forearm (7). Furthermore, sampling the activity of motor units receiving a substantial portion of independent synaptic inputs may pave the way for movement augmentation – specifically, extending a person’s movement repertoire through the increase of controllable degrees of freedom (8). In this way, Formento et al. (3) showed that individuals can intuitively learn to independently control motor units within the same muscle using visual cues. Having access to open-source tools that perform the real-time decoding of motor units would allow an increasing number of researchers to improve and expand the range of these applications.’

      Notes 

      (1) Artificial data should be used with this method to provide ground truth performance evaluations. Without it, the study assumptions are unchallenged and could be seriously flawed.

      A new section on the validation of the algorithm has been added. We verified the accuracy of the algorithm by comparing the series of identified discharge times with the ground truth, i.e., the simulated discharge times. (P10; L235)

      ‘Validation of the algorithm

      We first validated the accuracy of the algorithm using synthetic EMG signals generated with an anatomical model entailing a cylindrical muscle volume with parallel fibres [see Farina et al. (29) and Konstantin et al. (36) for a full description of the model]. In this model, subcutaneous and skin layers separate the muscle from a grid of 65 surface electrodes (5 columns, 13 rows), while an intramuscular array of electrodes is directly inserted in the muscle under the grid at an angle of 30 degrees. 150 motor units were distributed within the cross section of the muscle. Recruitment thresholds, firing rate/excitatory drive relations, and twitch parameters were assigned to each motor unit using the same procedure as Fuglevand et al. (37). During each simulation, a proportional-integral-derivative controller adjusted the level of excitatory drive to minimise the error between a predefined target of force and the force generated by the active motor units. 

      Figure 3A displays the raster plots of the active motor units during simulated trapezoidal isometric contractions with plateaus of force set at 10%, 20%, and 30% MVC. A sinusoidal isometric contraction ranging between 15 and 25% MVC at a frequency of 0.5 Hz was also simulated. We identified on average 10 ± 1 and 12 ± 2 motor units with surface and intramuscular arrays, respectively (Figure 3A). During the offline decomposition, the rate of agreement between the identified discharge times and the ground truth, that is, the simulated discharge times, reached 100.0 ± 0.0% for intramuscular EMG signals and 99.2 ± 1.8% for surface EMG signals (Figure 3B). The offline estimation of motor unit filters was therefore highly accurate, independently of the level of force or the pattern of the isometric contraction.

      Motor unit filters estimated during a baseline contraction at 20% MVC were then applied in real-time on signals simulated during a contraction with a different pattern (sinusoidal; Figure 3C). The rates of agreement between the online decomposition and the ground truth reached 96.3 ± 4.6% and 98.4 ± 2.3% for surface and intramuscular EMG signals, respectively. Finally, we tested whether the accuracy of the online decomposition changed when the level of force decreased or increased by 10% MVC when compared to the calibration performed at 20% MVC (Figure 3D). The rate of agreement remained high when applying the motor unit filters on signals recorded at 10% MVC: 99.8 ± 0.2% (surface EMG) and 99.5 ± 0.3% (intramuscular EMG). It is worth noting that only 3 out of 10 motor units identified from surface EMG at 20% MVC were active at 10% MVC, while 8 out of 12 motor units identified from intramuscular EMG were active at 10 % MVC. This shows how the decomposition of EMG signals tends to identify the last recruited motor units, which often innervate a larger number of fibres than the early recruited motor units (38). On the contrary, the application of motor unit filters on signals simulated at 30% MVC led to a decrease in the rate of agreement, with values of 88.6 ± 14.0% (surface EMG) and 80.3 ± 19.2% (intramuscular EMG). This decrease in accuracy did not impact all the motor units, with 5 motor units keeping a rate of agreement above 95% in both signals. For the other motor units, we observed a decrease in precision, which estimates the ratio of true discharge times over the total number of identified discharge times. This was caused by the recruitment of two motor units sharing a similar space within the muscle, which resulted in a merge in the same pulse train (Figure 3D).’

      (2) From the point of view of a motor control neuroscientist studying movement in animals other than humans or non-human primates, the title was misleadingly hopeful. The use case presented in this study requires human participants to perform isometric contractions, facilitating spatially redundant recordings across the muscle for the algorithm to work. It is unclear whether these methods will be of utility to use cases under more physiological conditions (ie. dynamic movement). 

      We modified the title to read: “I-Spin live: An open-source software based on blind-source separation for real-time decoding of motor unit activity in humans”. 

      (3) The text states that "EMG signals recorded with an array of electrodes can be considered an instantaneous mixture of the original motor unit spike trains and their delayed versions." While this may be a true statement, it is not a complete statement, since motor units at distal sites may be shared, not shared, or novel. It was not clear to me whether the diversity of these scenarios would affect the performance of the software or introduce artifacts. In other words, if at site 1 you can pick up the bulk signal of units 1,2,3,4; at site 2 you pick up the signals of units 2,3,4,5; and at site 3 you pick up the signals of units 3,4,5,6, what does the algorithm assume is happening, and what does it report and why?

      This section has been rewritten to clarify this point. The EMG signal indeed represents the sum of the active motor units within the recorded muscle volume. In other words, it is possible that deep motor units or motor units with innervated fibres far away from the grid were not in this recorded muscle volume, and were thus non-identifiable. Another necessary condition to ensure the identifiability of a motor unit is its unique spatio-temporal signature within the signal. This means that two motor units close to each other within the muscle volume will be merged by the model. This point was clarified in the results during the validation and the application of filters on experimental data.

      (P5; L115)

      ‘An EMG signal represents the sum of trains of action potentials from all the active motor units within the recorded muscle volume (Figure 1A). During stationary conditions, e.g., isometric contractions, the train of motor unit action potentials can be modelled as the convolution of a series of discrete delta functions, representing the discharge times, and motor unit action potentials that have a consistent shape across time. When EMG signals are recorded with an array of electrodes, the shape of the recorded potential of each motor unit differs across electrodes. This is due to 1) the varying conduction velocity of action potentials among the muscle fibres, and 2) the location/depth of the muscle fibres that belong to each motor unit relative to the electrodes, which impacts the low-pass filtering effect of the tissue on the recorded potential. Increasing the number and density of recording electrodes increases the likelihood that each motor unit will have a unique motor unit action potential profile (shape), i.e., a temporal and spatial profile that differs from all the other active motor units within the recorded volume (16, 29). The uniqueness of motor unit action potential profiles is necessary for the blind source separation to accurately estimate the motor unit discharge times. Conversely, the spike trains of two motor units with similar action potential profiles will be merged by the model.

      Our software uses a fast independent component analysis (fastICA) to retrieve motor unit spike trains from the EMG signals. For this, it iteratively optimises a separation vector (i.e., the motor unit filter) for each motor unit [Figure 1B; (24-26)]. The projection of the EMG signals on this separation vector generates a sparse motor unit pulse train, with most of its samples close to zero and a smaller number of samples significantly greater than zero (Figure 1B). The discharge times are estimated from this motor unit pulse train using a peak detection function and a k-means classification with two classes to separate the high peaks (spikes) from the low peaks (noise and other motor units). During the decomposition in real time, short segments of EMG signals are projected on the saved separation vectors, and the peaks are classified as discharge times if they are closer to the centroid of the class ‘spikes’ than to the centroid of the class ‘noise’ (Figure 1C). The algorithm used to identify motor unit discharge activity is based on that proposed by Negro et al. (24) and Barsakcioglu et al. (26).’
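
      The classification step described in the quoted passage can be illustrated with a minimal Python sketch. It is not the I-Spin live implementation: the function names, the `min_distance` parameter, and the plain-list data layout are all hypothetical, and the extension and whitening of the EMG signals and the fastICA optimisation of the separation vector are omitted. It only shows peak detection, a two-class k-means on peak heights, and the centroid-distance rule applied to a new segment.

```python
from statistics import mean


def project(emg_window, separation_vector):
    """Project multichannel samples (one list of channel values per sample)
    on a saved separation vector to obtain a pulse train."""
    return [sum(w * x for w, x in zip(separation_vector, sample))
            for sample in emg_window]


def detect_peaks(pulse_train, min_distance):
    """Indices of local maxima separated by at least min_distance samples."""
    candidates = [
        i for i in range(1, len(pulse_train) - 1)
        if pulse_train[i - 1] < pulse_train[i] >= pulse_train[i + 1]
    ]
    kept = []
    # Visit the highest peaks first and drop neighbours that are too close.
    for i in sorted(candidates, key=lambda k: pulse_train[k], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


def two_class_kmeans(values, n_iter=50):
    """1-D k-means with two classes; returns (noise_centroid, spike_centroid)."""
    low, high = min(values), max(values)
    for _ in range(n_iter):
        low_class = [v for v in values if abs(v - low) <= abs(v - high)]
        high_class = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_class:  # all peaks fall in one class
            break
        new_low, new_high = mean(low_class), mean(high_class)
        if (new_low, new_high) == (low, high):  # converged
            break
        low, high = new_low, new_high
    return low, high


def classify_online(pulse_train, noise_centroid, spike_centroid, min_distance):
    """Keep the peaks that lie closer to the 'spikes' centroid than to 'noise'."""
    return [
        i for i in detect_peaks(pulse_train, min_distance)
        if abs(pulse_train[i] - spike_centroid) < abs(pulse_train[i] - noise_centroid)
    ]
```

      In this sketch the two centroids would be estimated once on the baseline contraction and then reused unchanged for every incoming segment, which mirrors the calibration-then-online split described above.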

      (4) I could not fully appreciate the performance gap solved by the current methods. What was not achievable before that is now achievable? The 125 ms speed of deconvolution? What was achievable before? Intro text around ln 85 states that "most of the current implementations of this approach rely on offline processing, which restricts its ability to be used..." but no reference is provided here about what the implementations outside this "most" can achieve.

      (8) The authors might try to add text to be more circumspect about the contributions of this method. I would recommend emphasizing the conceptual advances over the specifics of the performance of the algorithm since processor speed and implementation of the ideas in a faster environment (Matlab can be slow) will change those outcomes in a trivial way. Yet, much of the results section is very focused on these metrics. 

      The main contribution of this work, submitted to the ‘Tools and Resources’ section of eLife, is to provide a user interface that enables researchers to decompose EMG signals recorded with multichannel systems into motor unit activities, to perform this process in real time, and to translate it into visual feedback. The user interface is fully open source and does not require coding experience. If necessary, users can inspect the commented code and even modify it for their own experimental setup. The toolbox is now compatible with various acquisition boards, which can expand its use to novel surface and intramuscular arrays of electrodes.

      (5) Relatedly, it would have been nice to see a proof of concept using real-time feedback for some kind of biofeedback signal. If that is the objective here, why not show us this? I found the actual readout metrics of performance rather esoteric. They may be of interest to very close experts so I will defer to them for input.

      We agree with the reviewer. Videos were added to the supplemental materials to show the different forms of feedback, together with a case scenario where the participant tries to separate the activity of two motor units from the same muscle.

      (6) I was disappointed to see that only male participants are used because of some vague statement that 'it is widely known in the field' that more motor units can be resolved in males, without thorough referencing. It seems that the objective of the algorithm is the speed of analysis, not the number of units, which makes the elimination of female participants not justified. 

      The reviewer is right and that was corrected in the new version of the manuscript. We first performed additional experiments in both males and females focused on the accuracy of the approach, and further discussed the differences in yield between men and women in the discussion together with research perspectives to solve this issue.

      Results (P12; L296):

      ‘We then asked eight participants (4 males and 4 females) to perform trapezoidal isometric contractions with plateaus of force set at 10% and 20% MVC during which surface EMG signals were recorded from the TA with 256 electrodes separated by 4 mm. The aim of this experiment was to confirm the results of the simulation; specifically, to test the accuracy of the online decomposition when the level of force was below, equal to, or above the level of force produced during the baseline contraction used to estimate the motor unit filters (Figure 4). We assessed the accuracy of the motor unit spike trains identified in real time using their manually edited version as reference. 144 motor units were identified at both 10 and 20% MVC. When the test signals were recorded at the same level of force as the baseline contraction, we obtained rates of agreement of 95.6 ± 6.8% (10% MVC) and 93.9 ± 5.9% (20% MVC). The sensitivity reached 95.9 ± 6.7% (10% MVC) and 94.4 ± 5.6% (20% MVC), and the precision reached 99.6 ± 1.3% (10% MVC) and 99.4 ± 1.9% (20% MVC).  

      When the filters identified at 20% MVC were applied on signals recorded at a lower level of force (10% MVC), the rates of agreement decreased to 87.9 ± 16.2%. The sensitivity also decreased to 88.0 ± 16.2%, but the precision remained high (99.4 ± 4.3%). Thus, the decrease in accuracy was mostly caused by missed discharge times rather than the false identification of artifacts or spikes from other motor units. When the filters identified at 10% MVC were applied to signals recorded at a higher level of force, the rates of agreement decreased to 83.3 ± 13.5%. The sensitivity decreased to 90.7 ± 8.1%, and the precision also decreased to 90.9 ± 12.6%. This result confirms what was observed with synthetic EMG, that is, motor units recruited between 10 and 20% MVC can substantially disrupt the accuracy of the decomposition in real time, as highlighted in Figure 4 (lower panel). Importantly, this situation does not happen for all the motor units, as suggested by the distribution of the values in Figure 4.’
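
      For readers unfamiliar with the three accuracy metrics reported in this passage, the following Python sketch shows one way they can be computed from two lists of discharge times. The precision follows the definition given in the manuscript text (true discharge times over all identified discharge times). The sensitivity and rate-of-agreement formulas are the ones commonly used in the EMG decomposition literature and are an assumption here, as are the function names, the greedy matching, and the `tolerance` parameter.

```python
def match_discharges(identified, reference, tolerance=0):
    """Greedy one-to-one matching of discharge times (in samples).

    Returns (true positives, false positives, false negatives).
    """
    identified, reference = sorted(identified), sorted(reference)
    i = j = true_pos = 0
    while i < len(identified) and j < len(reference):
        delta = identified[i] - reference[j]
        if abs(delta) <= tolerance:
            true_pos += 1
            i += 1
            j += 1
        elif delta < 0:
            i += 1  # identified discharge without a reference match
        else:
            j += 1  # reference discharge that was missed
    return true_pos, len(identified) - true_pos, len(reference) - true_pos


def _percent(numerator, denominator):
    """Percentage, or NaN when the denominator is zero."""
    return 100 * numerator / denominator if denominator else float("nan")


def accuracy_metrics(identified, reference, tolerance=0):
    """Rate of agreement, sensitivity, and precision, all in percent."""
    tp, fp, fn = match_discharges(identified, reference, tolerance)
    return {
        "rate_of_agreement": _percent(tp, tp + fp + fn),  # assumed definition
        "sensitivity": _percent(tp, tp + fn),             # assumed definition
        "precision": _percent(tp, tp + fp),
    }
```

      Under these definitions, missed discharges lower the sensitivity and the rate of agreement but leave the precision untouched, which is the pattern reported when filters from 20% MVC were applied at 10% MVC.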

      Discussion (P20; L480):

      “An important consideration regarding the implementation of offline or real-time surface EMG decomposition is the difference between individuals, with an overall lower yield in number of identified motor units in females (here: 9 ± 12) than in males (here: 30 ± 13). Typically, the number of motor units identified from surface EMG in females is half that in males (32, 49, 50). The cause of this difference remains unclear. It may be related to variations in properties of the tissues separating the motor units from the recording electrodes, or to differences in the morphological and physiological properties of muscle fibres, as well as to the innervation ratios of motor units. These sex-related differences have so far only been supported by data extracted from animal experiments (51). However, the recent development of simulation frameworks capable of generating highly realistic EMG signals for anthropometrically diverse populations may help in understanding the impact of sex-related differences in humans (52). Specifically, these simulations can account for diverse anatomical (e.g. muscle volume and architecture, thickness of subcutaneous tissues) and physiological characteristics (e.g. innervation ratio, number of motor units, fibre cross-sectional area, fibre conduction velocity, contribution of rate coding vs. spatial recruitment). Generating such datasets could help identify the primary factors affecting EMG decomposition performance, ultimately enabling the refinement of algorithms and/or surface electrode design.”

      (7) Human curation is often used in spike sorting, but the description of criteria used in this step or how the human curation choices are documented is missing. 

      To address the reviewer’s comment, we added a new paragraph in the Methods section to describe the manual editing process: (P26; L657)

      “There is a consensus among experts that automatic decomposition should be followed by visual inspection and manual editing (55). Manual editing involves the following steps: i) removing spikes that result in erroneous firing rates (outliers), ii) adding discharge times that are clearly distinguishable from the noise, iii) recalculating the separation vector, iv) reapplying the separation vector on the EMG signals (either a selected window or the entire signal), and v) repeating this procedure until no outliers are present and all clearly distinguishable spikes have been selected. Importantly, the manual editing of potentially missed or falsely identified discharge times should not be accepted before the application of the updated motor unit separation vector, thereby generating a new pulse train. Manual edits should be accepted only if the silhouette value improves following this operation or remains well above the preestablished threshold. A more extensive description of the manual editing of motor unit pulse trains can be found in (32). Even though some of the aforementioned steps involve subjective decision-making, evidence suggests that manual editing after EMG decomposition with blind source separation approaches remains highly reliable across operators (33). Specifically, the median rate of agreement calculated for 126 motor units over eight operators with various levels of experience in manual editing was 99.6%. All raw and processed data have been made available on a public data repository so that they can be used for training new operators (10.6084/m9.figshare.13695937).”

      Minor 

      Ln 115, "inversing" is not a word. "inverse" is not a verb 

      Changed as suggested

      Ln 186, typo, bioadhesive 

      Changed as suggested

      MVC should be defined on first use. It is currently defined on 3rd use or so. 

      The term rate is used in a variety of places without units. Eg line 465 but not limited to that 

      Changed as suggested

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Two minor comments: Para 125: it is not clear what is meant by "spatial distribution" of recording electrodes. 

      ‘Density’ was used instead of ‘spatial distribution’ to now read:

      ‘Increasing the number and density of recording electrodes increases the likelihood that each motor unit will have a unique motor unit action potential profile (shape), i.e., a temporal and spatial profile that differs from all the other active motor units within the recorded volume (16, 29).’

      Para 545: perhaps a bit more explanation about why low spatial overlap is better would be appropriate. 

      We added a section in the results showing how motor units with similar spatial signatures are merged by our model, leading to a lower precision. We therefore changed this sentence to now read:

      ‘Therefore, the likelihood of having spatially overlapping motor unit action potentials - and thus merged motor units - is lower, which explains why the rate of agreement of motor units identified from intramuscular arrays of electrodes is much higher than that of motor units identified from grids of surface electrodes (12, 13).’

      Reviewer #2 (Recommendations For The Authors): 

      The authors mention that data is included with the Github software package. I could not find any included data, or instructions on how to run the software offline on example data. (Apologies if I missed this - it would be helpful to make it more prominent)

      The link to the data on figshare was added to the GitHub repository, together with data samples to run the algorithm offline and test manual editing.

      Minor comments: 

      Not sure what is meant by "boundary capabilities of online decomposition" 

      This was removed to only discuss the accuracy of online decomposition.

      CoV for ISIs is not formally defined or justified.

      This was added to the caption of figure 2:

      ‘The CoV of ISI estimates the regularity of spiking for each motor unit, an expected behaviour during isometric contractions at consistent levels of force.’
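
      As a worked illustration of this caption, the coefficient of variation (CoV) of the inter-spike intervals (ISI) is the standard deviation of the ISIs divided by their mean, usually expressed as a percentage. The short Python sketch below assumes discharge times given in samples or seconds and uses the sample standard deviation; both the function name and that choice are assumptions rather than details taken from the manuscript.

```python
from statistics import mean, stdev


def cov_isi(discharge_times):
    """Coefficient of variation of the inter-spike intervals, in percent."""
    times = sorted(discharge_times)
    isi = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(isi) < 2:
        raise ValueError("at least three discharge times are required")
    # Sample standard deviation (n - 1 denominator) is an assumption here.
    return 100 * stdev(isi) / mean(isi)
```

      A perfectly regular spike train gives a CoV of 0%, and larger values indicate less regular firing.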

      Fig. 4: slope units should be ms/motor unit, perhaps? 

      Changed as suggested.

      In some places, the manuscript uses "edition" to describe the editing process. I am not familiar with this usage, "editing" may be more common. 

      Editing is now used throughout the entire manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      I would recommend that the authors revise their manuscript to conform to eLife formatting guidelines, including moving the methods to the end of the manuscript. This change may entail substantial editing since many ideas are presented in order from the beginning of the methods. While this suggestion may seem superficial, the success of the new publishing model might benefit from general uniformity in manuscript style.

      We changed and edited the draft to follow the classic format of eLife papers.

    1. eLife assessment

      This study describes a useful antibody-free method to map both G-quadruplexes and R-loops in vertebrate cells independently of the BG4 and S9.6 antibodies. It also reveals that the helicase Dhx9 can affect the self-renewal and differentiation capacities of mESCs, perhaps by regulating co-localized G4s and R-loops. The datasets provided might constitute a good starting point for future functional studies, and although the strength of the evidence that DHX9 interferes with the ability of mESCs to differentiate by regulating directly the stability of either G4s or R-loops has been improved compared to a previous version, it is still incomplete.

    2. Reviewer #1 (Public review):

      This study describes a useful antibody-free method to map G-quadruplexes in vertebrate cells. The analysis of the data is solid but it remains primarily descriptive and does not substantially add to existing publications (such as PMID:34792172 for example). Nevertheless, the datasets generated here might constitute a good starting point for more functional studies.

      Comments on revised version:

      It is disappointing to see that the authors decided to brush aside most of the comments made by the three referees, even though these comments were largely consistent with each other. As a result, the revised manuscript is not substantially changed or improved. Legitimate concerns regarding the specificity of the CUT&Tag signals were not addressed and therefore remain. The sensitivity of the HBD-seq signals to a combination of RNase A and RNase H does not demonstrate that HBD-seq specifically reports the presence of RNA:DNA hybrids. The new Figure 9 comparing HepG4-seq to existing datasets does not unequivocally demonstrate the superiority of the Hemin-based strategy to map G4s.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Liu et al. explore the interplay between G-quadruplexes (G4s) and R-loops. The authors developed novel techniques, HepG4-seq and HBD-seq, to capture and map these nucleic acid structures genome-wide in human HEK293 cells and mouse embryonic stem cells (mESCs). They identified dynamic, cell-type-specific distributions of co-localized G4s and R-loops, which predominantly localize at active promoters and enhancers of transcriptionally active genes. Furthermore, they assessed the role of helicase Dhx9 in regulating these structures and their impact on gene expression and cellular functions.

      The manuscript provides a detailed catalogue of the genome-wide distribution of G4s and R-loops. However, the conceptual advance and the physiological relevance of the findings are not obvious. Overall, the impact of the work on the field is limited to the utility of the presented methods and datasets.

      Strengths:

      (1) The development and optimization of HepG4-seq and HBD-seq offer novel methods to map native G4s and R-loops.

      (2) The study provides extensive data on the distribution of G4s and R-loops, highlighting their co-localization in human and mouse cells.

      (3) The study consolidates the role of Dhx9 in modulating these structures and explores its impact on mESC self-renewal and differentiation.

      Comments on revised version:

      In this revised manuscript, Liu et al. address most of the previous concerns raised by this reviewer. Namely, the comparison between the novel methods and existing ones is an important addition.

    4. Reviewer #3 (Public review):

      Summary:

      The authors developed and optimized the methods for detecting G4s and R-loops independent of BG4 and S9.6 antibody, and mapped genomic native G4s and R-loops by HepG4-seq and HBD-seq, revealing that co-localized G4s and R-loops participate in regulating transcription and affecting the self-renewal and differentiation capabilities of mESCs.

      Strengths:

      By utilizing the peroxidase activity of the G4-hemin complex and combining it with proximity labeling technology, the authors developed HepG4-seq (high throughput sequencing of hemin-induced proximal labelled G4s), which can detect the dynamics of G4s in vivo. Meanwhile, the "GST-His6-2xHBD"-mediated CUT&Tag protocol (Wang et al., 2021) was optimized by replacing the fusion protein and tag; the optimized HBD-seq avoids the generation of GST fusion protein aggregates and can reflect the genome-wide distribution of R-loops in vivo.

      The authors employed HepG4-seq and HBD-seq to establish comprehensive maps of native co-localized G4s and R-loops in human HEK293 cells and mouse embryonic stem cells (mESCs). The data indicate that co-localized G4s and R-loops are dynamically altered in a cell type-dependent manner and are largely localized at active promoters and enhancers of transcriptionally active genes.

      Combined with Dhx9 ChIP-seq and co-localized G4s and R-loops data in wild-type and dhx9KO mESCs, the authors found that the helicase Dhx9, a major regulator of co-localized G4s and R-loops, affects the self-renewal and differentiation capacities of mESCs.

      In conclusion, the authors provide an approach to study the interplay between G4s and R-loops, shedding light on the important roles of co-localized G4s and R-loops in development and disease by regulating the transcription of related genes.

      Weaknesses:

      At least two structures of the S9.6 antibody have been reported very recently, so the questions about the specificity of the S9.6 antibody for RNA:DNA hybrids should be considered settled. The references cited by the authors (Hartono et al., 2018; Konig et al., 2017; Phillips et al., 2013) need to be updated, and the authors' bias against S9.6 antibodies also needs to change. In contrast to S9.6 CUT&Tag and other inactive ribonuclease H1-based methods, including MapR (inactive ribonuclease H1-mediated CUT&Run) (Yan et al., 2019) and GST-2xHBD CUT&Tag (Wang et al., 2021), HBD-seq did not perform satisfactorily and its binding specificity was questionable.

      Although HepG4-seq is an effective G4s detection technique, and the authors have also verified its reliability to some extent, given the strong link between ROS homeostasis and G4s formation, hemin's affinity for different types of G4s and their differences in peroxidase activities, whether HepG4-seq reflects the dynamics of G4s in vivo more accurately than existing detection techniques still needs to be more carefully corroborated.

      The authors focus on the interaction of the non-B DNA structures G4s and R-loops and their roles in development and disease by regulating the transcription of related genes. Compared to the complex regulatory network of G4s and R-loops, the authors provide limited mechanistic insight into the major regulator of co-localized G4s and R-loops, helicase Dhx9. However, the authors' proposal that "A degron system-mediated simultaneous and/or stepwise degradation system of multiple regulators will help us elucidate the interplaying effects between G4s and R-loops" is attractive. The main innovations of this article are the proposal of new antibody-independent methods for detecting G4s and the optimization of the GST-2xHBD CUT&Tag (Wang et al., 2021) method for detecting R-loops. Unfortunately, however, the reliability and accuracy of these methods are still debatable, and the reference value of the G4s and R-loops datasets based on these methods is relatively limited.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful study describes an antibody-free method to map G-quadruplexes (G4s) in vertebrate cells. While the method might have potential, the current analysis is primarily descriptive and does not add substantial new insights beyond existing data (e.g., PMID:34792172). While the datasets provided might constitute a good starting point for future functional studies, additional data and analyses would be needed to fully support the major conclusions and, at the same time, clarify the advantage of this method over other methods. Specifically, the strength of the evidence for DHX9 interfering with the ability of mESCs to differentiate by regulating directly the stability of either G4s or R-loops is still incomplete.

      We thank the editors for their helpful comments.

      Given that antibody-based methods have been reported to leave open the possibility of recognizing partially folded G4s and promoting their folding, we have employed the peroxidase activity of the G4-hemin complex to develop a new method for capturing endogenous G4s that significantly reduces the risk of capturing partially folded G4s. We have included a new Fig. 9 and a new section “Comparisons of HepG4-seq and HBD-seq with previous methods” to carefully compare our methods to other methods.

      In Fig. 7, we applied the Dhx9 CUT&Tag assay to identify the G4s and R-loops directly bound by Dhx9 and further characterized the differential Dhx9-bound G4s and R-loops in the absence of Dhx9. Dhx9 is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Furthermore, we showed that depletion of Dhx9 significantly altered the levels of G4s or R-loops around the TSS or gene bodies of several key regulators of mESC and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, and also their RNA levels (Fig. 7I). The above evidence is sufficient to support the transcriptional regulation of mESC cell fate by directly modulating the G4s or R-loops within the key regulators of mESCs.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Non-B DNA structures such as G4s and R-loops have the potential to impact genome stability, gene transcription, and cell differentiation. This study investigates the distribution of G4s and R-loops in human and mouse cells using some interesting technical modifications of existing Tn5-based approaches. This work confirms that the helicase DHX9 could regulate the formation and/or stability of both structures in mouse embryonic stem cells (mESCs). It also provides evidence that the lack of DHX9 in mESCs interferes with their ability to differentiate.

      Strengths:

      HepG4-seq, the new antibody-free strategy to map G4s based on the ability of Hemin to act as a peroxidase when complexed to G4s, is interesting. This study also provides more evidence that the distribution pattern of G4s and R-loops might vary substantially from one cell type to another.

      We appreciate your valuable points.

      Weaknesses:

      This study is essentially descriptive and does not provide conclusive evidence that lack of DHX9 does interfere with the ability of mESCs to differentiate by regulating directly the stability of either G4 or R-loops. In the end, it does not substantially improve our understanding of DHX9's mode of action.

      In this study, we aimed to report new methods for capturing endogenous G4s and R-loops in living cells. Dhx9 has been reported to directly unwind R-loops and G4s or promote R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). To identify the G4s and R-loops directly bound by Dhx9, we performed the Dhx9 CUT&Tag assay and analyzed the co-localization of Dhx9-binding sites and G4s or R-loops. We found that 47,857 co-localized G4s and R-loops are directly bound by Dhx9 in the wild-type mESCs and 4,060 of them display significantly differential signals in the absence of Dhx9, suggesting that redundant regulators exist as well. We showed that depletion of Dhx9 significantly altered the RNA levels of several key regulators of mESC and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, which coincides with the significantly differential levels of G4s or R-loops around the TSS or gene bodies of these genes (Fig. 7). The comprehensive molecular mechanism of Dhx9 action is indeed not the focus of this study. We will work on it in future studies. Thank you for the comments.

      There is no in-depth comparison of the newly generated data with existing datasets and no rigorous control was presented to test the specificity of the hemin-G4 interaction (a lot of the hemin-dependent signal seems to occur in the cytoplasm, which is unexpected).

      The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In Fig. 1A, we compared the hemin-G4-induced biotinylation levels in different conditions. Cells treated with hemin and Bio-An exhibited a robust fluorescence signal, while the absence of either hemin or Bio-An almost completely abolished the biotinylation signals, suggesting a specific and active biotinylation activity. To identify the specific signals, we have included the non-label control and used this control to call confident HepG4 peaks in all HepG4-seq assays.

      The hemin-RNA G4 complex has also been reported to have peroxidase-mimicking activity and to trigger self-biotinylation signals similar to those of DNA G4s (PMID: 32329781, 31257395, 27422869). Therefore, it is not surprising to observe hemin-dependent signals in the cytoplasm generated by cytoplasmic RNA G4s.

      In the revised version, we have included a new Fig. 9 and a new section “Comparisons of HepG4-seq and HBD-seq with previous methods” to carefully compare our methods to other methods.

      The authors talk about co-occurrence between G4 and R-loops but their data does not actually demonstrate co-occurrence in time. If the same loci could form alternatively either R-loops or G4 and if DHX9 was somehow involved in determining the balance between G4s and R-loops, the authors would probably obtain the same distribution pattern. Manipulating R-loop levels in vivo and testing how this affects HepG4-seq signals would have been helpful.

      Single-molecule fluorescence studies have shown the existence of a positive feedback mechanism of G4 and R-loop formation during transcription (PMID: 32810236, 32636376), suggesting that G4s and R-loops could co-localize on the same molecule. Dhx9 is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Although depletion of Dhx9 resulted in 6,171 Dhx9-bound co-localized G4s and R-loops with significantly altered levels of G4s or R-loops, only 276 of them (~4.5%) harbored altered G4s and R-loops, suggesting that the interacting G4s and R-loops are rare in living cells. Nowadays, the genome-wide co-occurrence of two factors is mainly obtained by bioinformatic intersection analysis. We agreed that F We will carefully discuss this point in the revised version. At the same time, we will make efforts to develop a new method to map co-localized G4s and R-loops on the same molecule in a future study.
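
      To make concrete what a bioinformatic intersection analysis of two peak sets involves, here is a minimal Python sketch. It is not the authors' pipeline: the function name, the `min_overlap` parameter, and the example coordinates in the usage note are hypothetical, and peaks are assumed to be `(chrom, start, end)` tuples with half-open coordinates as in BED files. Note that such an overlap only shows that both signals were detected at the same locus in a cell population, not that both structures exist on the same molecule at the same time, which is the limitation raised by the reviewer.

```python
def intersect_peaks(g4_peaks, rloop_peaks, min_overlap=1):
    """Return (g4_peak, rloop_peak) pairs that overlap by >= min_overlap bp."""
    # Index the R-loop peaks by chromosome for a simple lookup.
    by_chrom = {}
    for chrom, start, end in rloop_peaks:
        by_chrom.setdefault(chrom, []).append((start, end))

    pairs = []
    for chrom, start, end in g4_peaks:
        for r_start, r_end in by_chrom.get(chrom, []):
            # Length of the shared segment (zero or negative means no overlap).
            overlap = min(end, r_end) - max(start, r_start)
            if overlap >= min_overlap:
                pairs.append(((chrom, start, end), (chrom, r_start, r_end)))
    return pairs
```

      For example, with the made-up peaks `("chr1", 100, 200)` for G4s and `("chr1", 150, 300)` for R-loops, the function reports one co-localized pair sharing 50 bp. For genome-scale peak sets, a sorted sweep or an interval tree would be preferable to this quadratic per-chromosome scan.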

      This study relies exclusively on Tn5-based mapping strategies. This is a problem as global changes in DNA accessibility might strongly skew the results. It is unclear at this stage whether the lack of DHX9, BLM, or WRN has an impact on DNA accessibility, which might underlie the differences that were observed. Moreover, Tn5 cleaves DNA at a nearby accessible site, which might be at an unknown distance away from the site of interest. The spatial accuracy of Tn5-based methods is therefore debatable, which is a problem when trying to demonstrate spatial co-occurrence. Alternative mapping methods would have been helpful.

      In this study, we used the recombinant streptavidin monomer and anti-GP41 nanobody fusion protein (mSA-scFv) to specifically recognize hemin-G4-induced biotinylated G4s and then recruit the recombinant GP41-tagged Tn5 protein to these G4 sites. Similarly, the recombinant V5-tagged N-terminal hybrid-binding domain (HBD) of RNase H1 specifically recognizes R-loops and recruits the recombinant protein G-Tn5 (pG-Tn5) with the help of an anti-V5 antibody. Therefore, the spatial distance of Tn5 to the target sites is well controlled and very short, and the recruitment of Tn5 is specifically determined by the existence of G4s in HepG4-seq and R-loops in HBD-seq. In addition, RNase treatment markedly abolished the HBD-seq signals and the non-labeled controls exhibited an obvious reduction of HepG4-seq signals, demonstrating that the HBD-seq and HepG4-seq signals did not result from tagmentation of accessible DNA.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Liu et al. explore the interplay between G-quadruplexes (G4s) and R-loops. The authors developed novel techniques, HepG4-seq and HBD-seq, to capture and map these nucleic acid structures genome-wide in human HEK293 cells and mouse embryonic stem cells (mESCs). They identified dynamic, cell-type-specific distributions of co-localized G4s and R-loops, which predominantly localize at active promoters and enhancers of transcriptionally active genes. Furthermore, they assessed the role of helicase Dhx9 in regulating these structures and their impact on gene expression and cellular functions.

      The manuscript provides a detailed catalogue of the genome-wide distribution of G4s and R-loops. However, the conceptual advance and the physiological relevance of the findings are not obvious. Overall, the impact of the work on the field is limited to the utility of the presented methods and datasets.

      Strengths:

      (1) The development and optimization of HepG4-seq and HBD-seq offer novel methods to map native G4s and R-loops.

      (2) The study provides extensive data on the distribution of G4s and R-loops, highlighting their co-localization in human and mouse cells.

      (3) The study consolidates the role of Dhx9 in modulating these structures and explores its impact on mESC self-renewal and differentiation.

      We appreciate your valuable points.

      Weaknesses:

      (1) The specificity of the biotinylation process and potential off-target effects are not addressed. The authors should provide more data to validate the specificity of the G4-hemin.

      The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In Fig. 1A, we compared the hemin-G4-induced biotinylation levels under different conditions. Cells treated with hemin and Bio-An exhibited a robust fluorescence signal, while the absence of either hemin or Bio-An almost completely abolished the biotinylation signals, suggesting specific and active biotinylation.

      (2) Other methods exploring a catalytic dead RNAseH or the HBD to pull down R-loops have been described before. The superior quality of the presented methods in comparison to existing ones is not established. A clear comparison with other methods (BG4 CUT&Tag-seq, DRIP-seq, R-CHIP, etc) should be provided.

      Thank you for the suggestions. We have included a new Fig. 9 and a new section “Comparisons of HepG4-seq and HBD-seq with previous methods” to carefully compare our methods to other methods.

      (3) Although the study demonstrates Dhx9's role in regulating co-localized G4s and R-loops, additional functional experiments (e.g., rescue experiments) are needed to confirm these findings.

      Dhx9 has been demonstrated in previous studies to be a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). We believe that the current dataset, together with these previous studies, is sufficient to support the capability of Dhx9 to regulate co-localized G4s and R-loops.

      (4) The manuscript would benefit from a more detailed discussion of the broader implications of co-localized G4s and R-loops.

      Thank you for the suggestions. We have included the discussion in the revised version.

      (5) The manuscript lacks appropriate statistical analyses to support the major conclusions.

      We apologize for this omission. Although we applied careful statistical analyses in this study, the lack of some statistical details made certain conclusions hard to follow. We have now added the details of all statistical analyses.

      (6) The discussion could be expanded to address potential limitations and alternative explanations for the results.

      Thank you for the suggestions. We have included the discussion about this point in the revised version.

      Reviewer #3 (Public Review):

      Summary:

      The authors developed and optimized the methods for detecting G4s and R-loops independent of BG4 and S9.6 antibody, and mapped genomic native G4s and R-loops by HepG4-seq and HBD-seq, revealing that co-localized G4s and R-loops participate in regulating transcription and affecting the self-renewal and differentiation capabilities of mESCs.

      Strengths:

      By utilizing the peroxidase activity of G4-hemin complex and combining proximity labeling technology, the authors developed HepG4-seq (high throughput sequencing of hemin-induced proximal labelled G4s), which can detect the dynamics of G4s in vivo. Meanwhile, the "GST-His6-2xHBD"-mediated CUT&Tag protocol (Wang et al., 2021) was optimized by replacing fusion protein and tag, the optimized HBD-seq avoids the generation of GST fusion protein aggregates and can reflect the genome-wide distribution of R-loops in vivo.

      The authors employed HepG4-seq and HBD-seq to establish comprehensive maps of native co-localized G4s and R-loops in human HEK293 cells and mouse embryonic stem cells (mESCs). The data indicate that co-localized G4s and R-loops are dynamically altered in a cell type-dependent manner and are largely localized at active promoters and enhancers of transcriptionally active genes.

      Combined with Dhx9 ChIP-seq and co-localized G4s and R-loops data in wild-type and dhx9KO mESCs, the authors confirm that the helicase Dhx9 is a direct and major regulator that regulates the formation and resolution of co-localized G4s and R-loops.

      Depletion of Dhx9 impaired the self-renewal and differentiation capacities of mESCs by altering the transcription of co-localized G4s and R-loops-associated genes.

      In conclusion, the authors provide an approach to studying the interplay between G4s and R-loops, shedding light on the important roles of co-localized G4s and R-loops in development and disease by regulating the transcription of related genes.

      We appreciate your valuable points.

      Weaknesses:

      As we know, at least two structures of the S9.6 antibody have been reported very recently, and the questions about the specificity of the S9.6 antibody for RNA:DNA hybrids should be considered settled. The references cited by the authors (Hartono et al., 2018; Konig et al., 2017; Phillips et al., 2013) need to be updated, and the authors' bias against the S9.6 antibody also needs to change. However, since the authors have questioned the specificity of the S9.6 antibody, they should compare their data in parallel with data generated using the widely used S9.6 antibody.

      Thank you for the updated information about the structural data on the S9.6 antibody. We respectfully disagree regarding the specificity of the S9.6 antibody for RNA:DNA hybrids. The structural studies of S9.6 (PMID: 35347133, 35550870) used only one RNA:DNA hybrid to show that S9.6 is more specific for the RNA:DNA hybrid than for dsRNA and dsDNA. However, Konig et al. (2017) reported that the binding affinities of S9.6 for RNA:DNA hybrids exhibit an obvious sequence-dependent bias, ranging from null to the nanomolar range (PMID: 28594954). We have included the comparison between S9.6-derived data and our HBD-seq data in Fig. 9 and the section “Comparisons of HepG4-seq and HBD-seq with previous methods”.

      Although HepG4-seq is an effective G4s detection technique, and the authors have also verified its reliability to some extent, given the strong link between ROS homeostasis and G4s formation, and hemin's affinity for different types of G4s, whether HepG4-seq reflects the dynamics of G4s in vivo more accurately than existing detection techniques still needs to be more carefully corroborated.

      Thank you for pointing out this issue. In the in vitro hemin-G4-induced self-biotinylation assay, parallel G4s exhibit higher peroxidase activities than anti-parallel G4s. Thus, the dynamics of G4 conformation could affect the HepG4-seq signals (PMID: 32329781). In the future, it may be necessary to combine HepG4-seq and BG4-seq to carefully interpret endogenous G4s. We have discussed this point in the revised version.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Figures 1A&1G. Although no merge images were provided, it seems that the biotin signals are strongly enriched outside the nucleus. This suggests that hemin is not specific for G4s in DNA. Does it mean that Hemin can also recognise G4 on RNAs? How do the authors understand the cytoplasmic signal?

      Hemin can indeed interact with RNA G4s and acquire peroxidase activity, as in the DNA G4-hemin complex (PMID: 27422869, 32329781, 31257395). The cytoplasmic signals in Figures 1A and 1G were derived from RNA G4s.

      Figure 1A: The fact that there is no Alexa647 signal without hemin or Bio-An does not actually demonstrate that the signals are specific. These controls do not actually test for the specificity of the G4-Hemin interaction.

      The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In this study, we performed immunofluorescence (IF) to confirm this phenomenon.

      Figure 1C: It looks like the HepG4-seq signals are simply an amplification of the noise given by the Tn5 (the non-label ctrl has the same pattern, albeit weaker). It is unclear why this happens but it might happen if somehow hemin increased the probability that the Tn5 is close to chromatin in an unspecific manner (it would cut G-rich, nucleosome-poor, accessible sites in an unspecific manner). To discard this possibility, it would be interesting to investigate directly which loci are biotinylated. For this, the authors could extract and sonicate the genomic DNA and use streptavidin to enrich for biotinylated fragments. Strand-specific DNA sequencing could then be used to map the biotinylated loci.

      The cell culture medium contains a certain amount of hemin from serum and a low dose of biotin from the basal medium DMEM, which cannot be avoided. This residual hemin and biotin would generate the background signals observed in the non-label control samples. The biotinylated sites were specifically recognized by the recombinant streptavidin monomer, which then recruits Tn5 to the biotinylated sites with the help of the Moon-tag. In contrast to the HEK293 samples, much more robust HepG4-seq signals were observed in the mESC samples, and these signals were also abolished in the non-label control samples. Thus, the relatively low signal-to-noise ratio in the HEK293 samples suggests a low abundance of endogenous G4s in HEK293 cells. We therefore politely disagree that hemin increased the non-specific recruitment of Tn5. In addition, CUT&Tag technology has been widely demonstrated to have a much lower background, a high signal-to-noise ratio and high sensitivity. Thus, we also politely disagree with replacing CUT&Tag with the traditional DNA library preparation method.

      Figure 1H: No spike-in was added and the data are not quantitative. The number of replicates is unclear. 70000 extra peaks (10x) after inhibition of BLM or WRN seems enormous. These extra peaks should be better characterised: do they contain G4 motifs? Are they transcribed? etc...; again what kind of controls should be used here, in case the inhibition of BLM and WRN has a global impact on chromatin accessibility?

      To quantitatively compare different samples, we normalized all samples according to their numbers of de-duplicated, uniquely mapped reads. Given that the inhibitors were dissolved in DMSO, we used DMSO as the control. Since Tn5 was specifically recruited to the biotinylated G4 sites through the recombinant streptavidin monomer protein and the Moon-tag system, chromatin accessibility should not affect Tn5 targeting in the way that is normally observed in ATAC-seq.
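      A minimal sketch of depth normalization by usable read count is given below. The response states only that samples were normalized by their de-duplicated, uniquely mapped read numbers; the per-million scaling target, the function name `scale_to_depth` and the example counts are assumptions made for illustration.

```python
def scale_to_depth(raw_counts, usable_reads, target=1_000_000):
    """Rescale per-peak read counts to a common sequencing depth.

    usable_reads is the number of de-duplicated, uniquely mapped reads
    in the sample; target is an assumed common depth (reads per million).
    """
    if usable_reads <= 0:
        raise ValueError("usable_reads must be positive")
    factor = target / usable_reads
    return [count * factor for count in raw_counts]


# Hypothetical example: the same two peaks in a control and a treated
# sample that were sequenced to different depths.
control = scale_to_depth([10, 20], usable_reads=2_000_000)
treated = scale_to_depth([30, 20], usable_reads=4_000_000)
```

      Scaling of this kind corrects for sequencing depth only. It does not detect a global shift in signal between conditions, which is what a spike-in control would address.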

      As suggested, we have analyzed the enriched motifs of the extra peaks induced by BLM or WRN inhibition and show in Supplementary Fig. 1E that the top enriched motifs are also G-rich. In addition, we analyzed the RNA-seq levels of the genes associated with these extra peaks. As shown in the figure below, the majority of these genes are actively transcribed.

      Author response image 1.

      Figure 2: The mutated version of HBD should have been used as a control. As shown clearly in PMID: 37819055, the HBD domain does interact in an unspecific manner with chromatin at low levels. As above, this might be enough to increase the local concentration of the Tn5 close to chromatin in the Cut&Tag approach and to cleave accessible sites close to TSS in an unspecific manner.

      As shown in Fig. 2B and Fig. 4A, we included RNase treatment as the control and showed that the R-loop signals identified by HBD-seq are dramatically attenuated (Fig. 2B) or almost completely abolished (Fig. 4A) after RNase treatment. These data demonstrate the specificity of HBD-seq.

      Figure 2: What fraction of the HEPG4-seq signal is sensitive to RNase treatment? The authors used a combination of RNase A and RNase H but previous data have shown that the RNase A treatment is sufficient to remove the HBD-seq signal (which means that it is not actually possible on this sole basis to claim or disclaim that the signals do correspond to genuine R-loops). Do the authors have evidence that the RNase H treatment alone does impact their HBD-seq or HEPG4-seq signals?

      As shown in Fig. 2B and Fig. 4A, the R-loop signals identified by HBD-seq are all dramatically attenuated (Fig. 2B) or almost completely abolished (Fig. 4A) after RNase treatment. The specificity of HBD in recognizing R-loops has been carefully demonstrated in a previous study (PMID: 33597247). In this study, we used the same two copies of HBD (2xHBD) and replaced the GST tag with EGFP-V5 to reduce the possibility of variable high-molecular-weight aggregates caused by the GST tag. In addition, RNase H treatment has been shown to fail to completely abolish CUT&Tag signals, since a subset of DNA-RNA hybrids with high GC skew are partially resistant to RNase H (PMID: 32544226, 33597247). In consideration of the high GC skew of co-localized G4s and R-loops, we combined RNase A and RNase H. We currently do not have samples treated with RNase H alone.

      Figure 3A: "RNA-seq analysis revealed that the RNA levels of co-localized G4s and R-loops-associated genes are significantly higher": the differences are not very convincing.

      For Figure 3A, we have performed the Mann-Whitney test to examine significance in the revised manuscript. The RNA levels of genes associated with co-localized G4s and R-loops are indeed significantly higher than those of all genes, G4-associated genes or R-loop-associated genes (Mann-Whitney test, p < 2.2E-16).
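      For readers unfamiliar with the test, the sketch below shows how a two-sided Mann-Whitney U test can be computed from ranks using the normal approximation with a tie correction and no continuity correction. It is an illustrative implementation under those assumptions, not the authors' code; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used. The expression values are made up.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, approximate p-value).
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Made-up expression values for two gene sets.
colocalized_genes = [5, 6, 7, 8]
all_genes = [1, 2, 3, 4]
u_stat, p = mann_whitney_u(colocalized_genes, all_genes)
```

      With thousands of genes per group, as in a genome-wide comparison, even a modest shift in the distribution can give a very small p-value, so reporting an effect size alongside the p-value helps address the reviewer's concern about how convincing the difference is.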

      Figure 3B: the patterns for "G4" and "co-localised G4 and R-loop" are extremely similar, suggesting that nearly all G4s mapped here could also form R-loops. If this is the case, most of the HEPG4-seq signals should be sensitive to exogenous RNase H treatment or to the in vivo over-expression of RNase H1. This should be tested (see above).

      The percentage of G4 peaks that co-localize with R-loops is 80.3% (5,459 out of 6,799) in HEK293 cells and 72.0% (68,482 out of 95,128) in mESCs. Co-localization does not mean that the G4 and the R-loop interact with each other. We have shown that only a small proportion of co-localized G4s and R-loops displayed differential G4s and R-loops at the same time in the dhx9KO mESCs (Fig. 6D, Supplementary Fig. 3B), suggesting that the majority of co-localized G4s and R-loops do not interact with each other. Thus, we do not think it is necessary to perform the RNase H test.
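      The two percentages quoted above follow directly from the peak counts. The short check below reproduces them; the helper name `percent` is hypothetical and the rounding to one decimal place is an assumption about how the figures were reported.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Peak counts quoted in the response.
hek293_colocalized = percent(5459, 6799)    # co-localized / all G4 peaks, HEK293
mesc_colocalized = percent(68482, 95128)    # co-localized / all G4 peaks, mESCs
```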

      Figure 3C: there is no correlation between the FC of G4 and the FC of RNA; this is not really consistent with the idea that the stabilisation of G4 is the driver rather than a consequence of the transcriptional changes.

      Given that WRN or BLM inhibition induced a large amount of G4 accumulation (Fig. 1H-I), we examined the effect on transcription of the genes associated with these accumulated G4s in Fig. 3C. We indeed observed an effect of G4 accumulation on the transcription of G4-associated genes. Even if G4 stabilization triggers transcriptional changes, this does not mean that the transcriptional changes should be highly correlated with the increase in G4 levels. To our knowledge, this type of correlation has not been reported in previous studies.

      l279: the overlap with H3K4me1 is really not convincing.

      For all G4 peaks, the H3K4me1 signals indeed exhibit a high background around the center of the G4 peaks, but a clear peak can still be observed at the center.

      Figure 5C: it should be clearly indicated here that the authors compare Cut&Tag and ChIP data. The origin of the ChIP-seq data is also unclear and should be indicated.

      Thank you for the suggestions. We have clarified this point.

      For the ChIP data, we have described the origin of ChIP-seq data in the “Data availability” section as below: “The ChIP-seq data of histone markers and RNAP are openly available in GNomEx database (accession number 44R) (Wamstad et al., 2012).”

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1A. An experimental condition lacking H2O2 (-H2O2) should be included.

      We have added this control in Fig. 1A.

      (2) Does RNAse H affect G4 profiles?

      We have not tested the effect of RNase H on G4 formation. However, we have shown that only a small proportion of co-localized G4s and R-loops displayed differential G4s and R-loops at the same time in the dhx9KO mESCs (Fig. 6D, Supplementary Fig. 3B), suggesting that the majority of co-localized G4s and R-loops do not interact with each other. Thus, we do not think it is necessary to perform the RNase H test on G4s. In addition, to treat cells with RNase H, we would first have to permeabilize the cells to let RNase H enter the nuclei, and we would then lose the picture of endogenous G4s.

      (3) Figure 2G. R-loops are detected upstream of the KPNB1 gene. What is this region? Is it transcribed?

      We apologize for the mistake we made when preparing this figure. We have replaced it with the correct version in Fig. 2G. The R-loop is located around the TSS of KPNB1. We also show the RNA-seq data for this region in Author response image 2 below. This region is indeed transcribed.

      Author response image 2.

      (4) Did BLM and WRN inhibition specifically affect the expression of genes containing colocalized G4s and R-loops? Was the effect seen in other genes as well? Appropriate statistical analyses are needed.

      In Fig. 3, we have shown that the accumulation of co-localized G4s and R-loops induced by the inhibition of BLM or WRN caused significant expression changes in genes containing these structures (480 with BLM inhibition, 566 with WRN inhibition), most of which are localized at promoter-TSS regions. We did detect the effect in other genes as well. In total, 918 and 1,020 genes changed significantly (adjusted p < 0.05 and FC >= 2 or FC <= 0.5) with BLM and WRN inhibition, respectively.
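      The significance criterion quoted above (adjusted p < 0.05 together with FC >= 2 or FC <= 0.5) can be written as a simple filter. The sketch below assumes the fold change is given on a linear scale, not a log2 scale; the function name `is_significant` and the gene names and values are hypothetical.

```python
def is_significant(padj, fold_change, alpha=0.05, fc_cutoff=2.0):
    """Apply the criterion: padj < alpha and (FC >= cutoff or FC <= 1/cutoff).

    fold_change is on a linear scale (treated / control).
    """
    return padj < alpha and (fold_change >= fc_cutoff
                             or fold_change <= 1 / fc_cutoff)


# Hypothetical differential expression results: gene -> (padj, fold change).
results = {
    "geneA": (0.01, 2.0),   # up-regulated, passes
    "geneB": (0.01, 0.5),   # down-regulated, passes
    "geneC": (0.01, 1.5),   # significant p-value but small change, fails
    "geneD": (0.20, 4.0),   # large change but not significant, fails
}
changed = sorted(g for g, (padj, fc) in results.items()
                 if is_significant(padj, fc))
```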

      (5) The claim that "The co-localized G4s and R-loops-mediated transcriptional regulation in HEK293 cells" (title of Figure 3) is not supported by the presented data. A causality link is not established in this study, which only reports correlations between G4s/R-loops and transcription regulation.

      We politely disagree with this point. BLM and WRN are the best-characterized DNA G4-resolving helicases (Fry and Loeb, 1999; Mendoza et al., 2016; Mohaghegh et al., 2001). Here, we used selective small molecules to specifically inhibit their ATPase activity and observed a dramatic induction of G4 accumulation. Notably, the accumulated G4s that trigger the transcriptional changes are mainly located at promoter-TSS regions. If the transcriptional changes triggered the G4 accumulation, we should not observe such a biased distribution, and more accumulated G4s should be detected in gene bodies.

      (6) The effect of Dhx9 KO on colocalized G4s/R-loops and transcription is not clear. The suggestion that Dhx9 could regulate transcription by modulating G4s, R-loops, and co-localized G4s and R-loops is not supported by the presented data. Additional experiments and statistical analyses are needed to conclude the role of Dhx9 on colocalized G4s/R-loops and transcription.

      Dhx9 has been extensively studied and reported to directly unwind R-loops and G4s or promote R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Thus, we do not think it is necessary to repeat these assays. To identify the G4s and R-loops directly bound by Dhx9, we performed the Dhx9 CUT&Tag assay and analyzed the co-localization of Dhx9-binding sites with G4s or R-loops. In wild-type mESCs, 47,857 co-localized G4s and R-loops are directly bound by Dhx9, and 4,060 of them display significantly differential signals in the absence of Dhx9, suggesting that redundant regulators exist as well. These data clearly show the role of Dhx9 in directly modulating the stability of G4s and R-loops. Furthermore, we showed that loss of Dhx9 caused significant differential expression of 816 genes associated with co-localized G4s and R-loops directly bound by Dhx9, supporting transcriptional regulation by Dhx9. We performed the differential analysis following the standard pipelines: DESeq2 for RNA-seq and DiffBind for HepG4-seq and HBD-seq. The statistical details have been described in the figure legends.

      (7) The conclusion that Dhx9 regulates the self-renewal and differentiation capacities of mESCs is vague. Additional experiments are needed to elucidate the exact contribution of Dhx9.

      In this study, we aimed to report new methods for capturing endogenous G4s and R-loops in living cells. We have shown that depletion of Dhx9 significantly attenuated the proliferation of mESCs and also influenced their capacity to differentiate into the three germ layer lineages in the EB assay. In addition, we showed that depletion of Dhx9 significantly reduced the protein levels of the mESC pluripotency markers Nanog and Lin28a. The comprehensive molecular mechanism of Dhx9 action is indeed not the focus of this study. We will work on it in future studies. Thank you for the comments.

      Reviewer #3 (Recommendations For The Authors):

      The study on the involvement of native co-localized G4s and R-loops in transcriptional regulation further enriches the readers' understanding of genomic regulatory networks, and the functional dissection of Dhx9 also lays a good foundation for the study of the dynamic regulatory mechanisms of co-localized G4s and R-loops. Unfortunately, however, the authors lack a strong basis for questioning the widely used BG4 and S9.6 antibodies, and the co-localized G4s and R-loops sequencing data obtained by the developed and optimized method also lack parallel comparison with existing sequencing technologies, which cannot indicate that HepG4-seq and HBD-seq are more reliable and superior than BG4 and S9.6 antibody-based sequencing technologies. There are also some minor errors in the manuscript that need to be corrected.

      Thank you for the constructive comments. We have added a new section (“Comparisons of HepG4-seq and HBD-seq with previous methods”) and a new Fig. 9 to compare our methods in parallel with other widely used methods.

      (1) This work mainly focuses on co-localized G4s and R-loops, but in the introduction section, the interplay between G4s and R-loops is only briefly mentioned. It is suggested that the importance of the interplay of G4s and R-loops for gene regulation should be further expanded to help readers better understand the significance of studying co-localized G4s and R-loops.

      Thank you for the comments. Current studies on the interplay between G4s and R-loops are limited. We have summarized all that we could find in the literature.

      (2) The authors mentioned that "a steady state equilibrium is generally set at low levels in living cells under physiological conditions (Miglietta et al., 2020) and thus the addition of high-affinity antibodies may pull the equilibrium towards folded states", in my understanding this is one of the important reasons why the authors optimized the G4s and R-loops detection assays, I wonder if there is a reliable basis for this statement. If there is, I suggest that the authors can supplement it in the manuscript.

      The main reason we developed the new method was to obtain an antibody-free way to label endogenous G4s in living cells. We previously tried to capture endogenous G4s using tet-on controlled BG4. Unfortunately, we found that even a short induction of BG4 in living cells was toxic. The traditional antibody-based methods rely on first permeabilizing cells to let the antibodies enter the nuclei. In this case, it is easy to lose the physiological picture of endogenous G4s. We will add more discussion about this point. For R-loops, we simply further optimized the GST-2xHBD-mediated method to avoid the problems of the GST tag. GST-fusion proteins are prone to form variable high-molecular-weight aggregates, and these aggregates often undermine the reliability of the fusion proteins.

      (3) Some questions about HepG4-seq:

      Is there a difference in hemin affinity for intramolecular G quadruplexes, interstrand G quadruplexes, and their different topologies? If so, does this bias affect the accuracy of sequencing results based on G4-hemin complexes?

      Thank you for pointing out this issue. In the in vitro hemin-G4-induced self-biotinylation assay, parallel G4s exhibit higher peroxidase activities than anti-parallel G4s (PMID: 32329781). Thus, the dynamics of G4 conformation possibly affect the HepG4-seq signals. In the future, it may be necessary to combine HepG4-seq and BG4-seq to carefully interpret endogenous G4s. We have discussed this point in the revised version.

      HepG4-seq is based on proximity labeling and peroxidase activity of the G4-hemin complex. The authors tested and confirmed that the addition of hemin and Bio-An in the experiment had no significant influences on sequencing results, but the effect of exogenous H2O2 treatment may also need to be taken into account since ROS can mediate the formation of G4s.

      In the HepG4-seq protocol, we treat cells with H2O2 for only one minute. Thus, we expect the side effects of H2O2 treatment to be limited over such a short time.

      (4) As we know, at least two structures of the S9.6 antibody have been reported very recently, and the questions about the specificity of the S9.6 antibody for RNA:DNA hybrids should be considered settled. The references cited by the authors (Hartono et al., 2018; Konig et al., 2017; Phillips et al., 2013) need to be updated, and the authors' bias against the S9.6 antibody also needs to change. However, since the authors have questioned the specificity of the S9.6 antibody, they should compare their data in parallel with data generated using the widely used S9.6 antibody.

      Thank you for the updated information about the structural data on the S9.6 antibody. We respectfully disagree regarding the specificity of the S9.6 antibody for RNA:DNA hybrids. The structural studies of S9.6 (PMID: 35347133, 35550870) used only one RNA:DNA hybrid to show that S9.6 is more specific for the RNA:DNA hybrid than for dsRNA and dsDNA. However, Konig et al. (2017) reported that the binding affinities of S9.6 for RNA:DNA hybrids exhibit an obvious sequence-dependent bias, ranging from null to the nanomolar range (PMID: 28594954). We have included the comparison between S9.6-derived data and our HBD-seq data in Fig. 9 and the section “Comparisons of HepG4-seq and HBD-seq with previous methods”.

      (5) It is hoped that the results of immunofluorescence experiments can be statistically analyzed.

      We have performed the statistical analysis and included the data in the new figure.

      (6) Some minor errors:

      Line 168, "G4-froming" should be "G4-forming";

      Figure 5E, the color of the "Repressed" average signal at the top of the HepG4-seq heatmap should be blue;

      Figure 7C, the abbreviation "Gloop" should be indicated in the text or in the figure caption.

      Thank you for pointing out these issues. We are sorry for these mistakes. We have corrected them in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In this useful study, a solid machine learning approach based on a broad set of systems to predict the R2 relaxation rates of residues in intrinsically disordered proteins (IDPs) is described. The ability to predict the patterns of R2 will be helpful to guide experimental studies of IDPs. A potential weakness is that the predicted R2 values may include both fast and slow motions, thus the predictions provide only limited new physical insights into the nature of the relevant protein dynamics.

      Fast motions are less sequence-dependent (e.g., as shown by R1). Hence the sequence-dependent part of R2 singles out slow motion.

      Public Reviews:

      Reviewer #1 (Public Review):

      Solution state 15N backbone NMR relaxation from proteins reports on the reorientational properties of the N-H bonds distributed throughout the peptide chain. This information is crucial to understanding the motions of intrinsically disordered proteins and as such has focussed the attention of many researchers over the last 20-30 years, both experimentally, analytically and using numerical simulation.

      This manuscript proposes an empirical approach to the prediction of transverse 15N relaxation rates, using a simple formula that is parameterised against a set of 45 proteins. Relaxation rates measured under a wide range of experimental conditions are combined to optimize residue-specific parameters such that they reproduce the overall shape of the relaxation profile. The purely empirical study essentially ignores NMR relaxation theory, which is unfortunate, because it is likely that more insight could have been derived if theoretical aspects had been considered at any level of detail.

      NMR relaxation theory is very valuable in particular regarding motions on different timescales. However, it has very little to say about the sequence dependence of slow motions, which is the focus of our work.

      Despite some novel aspects, in particular the diversity of the relaxation data sets, the residue-specific parameters do not provide much new insight beyond earlier work that has also noted that sidechain bulkiness correlated with the profile of R2 in disordered proteins.

      The novel insight from our work is that R2 can mostly be predicted based on the local sequence.

      Nevertheless, the manuscript provides an interesting statistical analysis of a diverse set of deposited transverse relaxation rates that could be useful to the community.

      Thank you!

      Crucially, and somewhat in contradiction to the authors' stated aims in the introduction, I do not feel that the article delivers real insight into the nature of IDP dynamics. Related to this, I have difficulty understanding how an approximate prediction of the overall trend of expected transverse relaxation rates will be of further use to scientists working on IDPs. We already know where the secondary structural elements are (from 13C chemical shifts which are essential for backbone assignment) and the necessary 'scaling' of the profile to match experimental data actually contains a lot of the information that researchers seek.

      Again, the novel insight is that slow motions that dictate the sequence dependence of R2 can mostly be predicted based on the local sequence. The scaling factor may contain useful information but does not tell us anything about the sequence dependence of IDP dynamics.

      This reviewer brings up a lot of valuable points, clearly from an NMR spectroscopist’s perspective. The emphasis of our paper is somewhat different from that perspective. For example, we were interested in whether tertiary contacts make significant contributions to R2, as sometimes claimed. Our results show that, in general, they do not; instead local contacts dominate the sequence dependence of R2.

      (1) The introduction is confusing, mixing different contributions to R2 as if they emanated from the same physics, which is not necessarily true. 15N transverse relaxation is said to report on 'slower' dynamics from 10s of nanoseconds up to 1 microsecond. Semi-classical Redfield theory shows that transverse relaxation is sensitive to both adiabatic and non-adiabatic terms, due to spin state transitions induced by stochastic motions, and dephasing of coherence due to local field changes, again induced by stochastic motions. These are faster than the relaxation limit dictated by the angular correlation function. Beyond this, exchange effects can also contribute to measured R2. The extent and timescale limit of this contribution depends on the particular pulse sequence used to measure the relaxation. The differences in the pulse sequences used could be presented, and the implications of these differences for the accuracy of the predictive algorithm discussed.

      Indeed pulse sequences affect the measured R2 values. We make the modest assumption that such experimental idiosyncrasy would not corrupt the sequence dependence of IDP dynamics. As for exchange effects, our expectation is that the current SeqDYN may not do well for R2s where slow exchange plays a dominant role in generating sequence dependence, as tertiary contacts would be prominent in those cases; we now present one such case (new Fig. S5).

      (2) Previous authors have noted the correlation between observed transverse relaxation rates and amino acid sidechain bulkiness. Apart from repeating this observation and optimizing an apparently bulkiness-related parameter on the basis of R2 profiles, I am not clear what more we learn, or what can be derived from such an analysis. If one can possibly identify a motif of secondary structure because raised R2 values in a helix, for example, are missed from the prediction, surely the authors would know about the helix anyway, because they will have assigned the 13C backbone resonances, from which helical propensity can be readily calculated.

      We think that a sequence-based method that is demonstrated to predict well R2 values from expensive NMR experiments is significant. That pi-pi and cation-pi interactions are prominent features of local contacts and may seed tertiary contacts and mediate inter-chain contacts that drive phase separation is a valuable insight.

      (3) Transverse relaxation rates in IDPs are often measured to a precision of 0.1 s–1 or less. This level of precision is achieved because the line-shapes of the resonances are very narrow and high resolution and sensitivity are commonly measurable. The predictions of relaxation rates, even when applying uniform scaling to optimize agreement, often differ from experimental measurement by 10 or 20 times the measured accuracy. There are no experimental errors in the figures. These are essential and should be shown for ease of comparison between experiment and prediction.

      Again, our focus is not the precision of the absolute R2 values, but rather the sequence dependence of R2.

      (4) The impact of structured elements on the dynamic properties of IDPs tethered to them is very well studied in the literature. Slower motions are also increased when, for example the unfolded domain binds a partner, because of the increased slow correlation time. The ad hoc 'helical boosting' proposed by the authors seems to have the opposite effect. When the helical rates are higher, the other rates are significantly reduced. I guess that this is simply a scaling problem. This highlights the limitation of scaling the rates in the secondary structural element by the same value as the rest of the protein, because the timescales of the motion are very different in these regions. In fact the scaling applied by the authors contains very important information. It is also not correct to compare the RMSD of the proposed method with MD, when MD has not applied a 'scaling'. This scaling contains all the information about relative importance of different components to the motion and their timescales, and here it is simply applied and not further analysed.

      Actually, applying the boost factor achieves the effect of a different scaling factor for the secondary structure element than for the rest of the protein.

      Regarding comparing RMSEs of SeqDYN and MD, it is true that SeqDYN applies a scaling factor whereas MD does not. However, even if we apply scaling to MD results it will not change the basic conclusion that “SeqDYN is very competitive against MD in predicting _R_2, but without the significant computational cost.”

      (5) Generally, the uniform scaling of all values by the same number is a serious oversimplification. Motions are happening on all timescales, and they give rise to different transverse relaxation. It is not possible to describe IDP relaxation in terms of one single motion. Detailed studies over more than 30 years have demonstrated that more than one component to the autocorrelation function is essential in order to account for motions on different timescales in denatured, partially disordered or intrinsically unfolded states. If one could 'scale' everything by the same number, this would imply that only one timescale of motion were important and that all others could be neglected, and this at every site in the protein. This is not expected to be the case, and in fact in the examples shown by the authors it is also never the case. There are always regions where the predicted rates are very different from experiment (with respect to experimental error), presumably because local dynamics are occurring on different timescales to the majority of the molecule. These observations contain useful information, and the observation that a single scaling works quite well probably tells us that one component of the motion is dominant, but not universally. This could be discussed.

      The reviewer appears to equate a single scaling factor with a single type of motion -- this is not correct. A single scaling factor just means that we factor out effects (e.g., temperature or magnetic field) that are uniform across the IDP sequence.

      (6) With respect to the accuracy of the prediction, discussion about molecular detail such as pi-pi interactions and phase separation propensity is possibly a little speculative.

      It is speculative; we now add more support to this speculation (p. 18 and new Fig. S6).

      (7) The authors often declare that the prediction reproduces the experimental data. The comparisons with experimental data need to be presented in terms of the chi2 per residue, using the experimentally measured precision which as mentioned, is often very high.

      Again, our interest is the sequence dependence of R2, not the absolute R2 value and its measurement precision.

      Reviewer #2 (Public Review):

      Qin, Sanbo and Zhou, Huan-Xiang created a model, SeqDYN, to predict nuclear magnetic resonance (NMR) spin relaxation spectra of intrinsically disordered proteins (IDPs), based primarily on amino acid sequence. To fit NMR data, SeqDYN uses 21 parameters, 20 that correspond to each amino acid, and a sequence correlation length for interactions. The model demonstrates that local sequence features impact the dynamics of the IDP, as SeqDYN performs better than a one residue predictor, despite having similar numbers of parameters. SeqDYN is trained using 45 IDP sequences and is retrained using both leave-one-out cross validation and five-fold cross validation, ensuring the model's robustness. While SeqDYN can provide reasonably accurate predictions in many cases, the authors note that improvements can be made by incorporating secondary structure predictions, especially for alpha-helices that exceed the correlation length of the model. The authors apply SeqDYN to study nine IDPs and a denatured ordered protein, demonstrating its predictive power. The model can be easily accessed via the website mentioned in the text.
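A minimal sketch of how a 21-parameter model of this kind could be evaluated. It assumes, and this is not stated in the text above, that every residue i multiplies the profile value at position n by a Lorentzian factor 1 + (q_i - 1)/(1 + (|i - n|/b)^2), where b is the correlation length, and that the residue at n itself is included in the product. The q values and b used here are hypothetical placeholders, not the fitted SeqDYN parameters, and all function names are illustrative.

```python
from typing import Dict, List


def lorentzian_factor(q: float, distance: int, b: float) -> float:
    """Contribution of one residue with parameter q at a given sequence distance.

    Assumed form: decays from q (distance 0) toward 1 (far away) as a Lorentzian.
    """
    return 1.0 + (q - 1.0) / (1.0 + (distance / b) ** 2)


def predict_profile(sequence: str, q: Dict[str, float], b: float) -> List[float]:
    """Unscaled relaxation profile: product of per-residue factors at each position."""
    profile = []
    for n in range(len(sequence)):
        value = 1.0
        for i, amino_acid in enumerate(sequence):
            value *= lorentzian_factor(q[amino_acid], abs(i - n), b)
        profile.append(value)
    return profile


if __name__ == "__main__":
    # Hypothetical parameters: a "neutral" Gly and a Trp that raises nearby values.
    q_demo = {"G": 1.0, "W": 1.5}
    print(predict_profile("GGWGG", q_demo, b=2.0))
```

Under these assumptions a residue with q above 1 raises the profile around itself over roughly b residues, which is one way to read the statement that local sequence features shape the profile.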

      While the conclusions of the paper are primarily supported by the data, there are some points that could be extended or clarified.

      (1) The authors state that the model includes 21 parameters. However, they exclude a free parameter that acts as a scaling factor and is necessary to fit the experimental data (lambda). As a result, SeqDYN does not predict the spectrum from the sequence de-novo, but requires a one parameter fitting. The authors mention that this factor is necessary due to non-sequence dependent factors such as the temperature and magnetic field strength used in the experiment.

      Given these considerations, would it be possible to predict what this scaling factor should be based on such factors?

      There are still too few data to make such a prediction.
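The one-parameter fit discussed in this exchange can be sketched as follows, assuming that the scaling factor is chosen by ordinary least squares between the unscaled prediction and the measured rates (the text does not say how the factor is actually obtained). For a model measured ≈ scale × predicted, the least-squares solution is scale = Σ(p·m) / Σ(p²). The function names and numbers are illustrative only.

```python
import math
from typing import Sequence


def fit_scale(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Least-squares scale factor for measured ≈ scale * predicted."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    denominator = sum(p * p for p in predicted)
    if denominator == 0.0:
        raise ValueError("predicted profile is identically zero")
    return sum(p * m for p, m in zip(predicted, measured)) / denominator


def rmse(predicted: Sequence[float], measured: Sequence[float], scale: float) -> float:
    """Root-mean-square error after applying the scale factor."""
    squared = [(m - scale * p) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up values purely for illustration.
    unscaled = [1.0, 2.0, 3.0]
    experiment = [2.1, 3.9, 6.0]
    s = fit_scale(unscaled, experiment)
    print(s, rmse(unscaled, experiment, s))
```

Because the same factor multiplies every residue, it changes the overall level of the profile but not its shape along the sequence.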

      (2) The authors mention that the Lorentzian functional form fits the data better than a Gaussian functional form, but do not present these results.

      We tested the different functional forms at the early stage of the method development. The improvement of the Lorentzian over the Gaussian was slight and we simply decided on the Lorentzian and did not go back and do a systematic analysis.

      (3) The authors mention that they conducted five-fold cross validation to determine if differences between amino acid parameters are statistically significant. While two pairs are mentioned in the text, there are 190 possible pairs, and it would be informative to more rigorously examine the differences between all such pairs.

      We now present t-test results for other pairs in new Fig. S3.

      Reviewer #3 (Public Review):

      The manuscript by Qin and Zhou presents an approach to predict dynamical properties of an intrinsically disordered protein (IDP) from sequence alone. In particular, the authors train a simple (but useful) machine learning model to predict (rescaled) NMR R2 values from sequence. Although these R2 rates only probe some aspects of IDR dynamics and the method does not provide insight into the molecular aspects of processes that lead to perturbed dynamics, the method can be useful to guide experiments.

      A strength of the work is that the authors train their model on an observable that directly relates to protein dynamics. They also analyse a relatively broad set of proteins which means that one can see actual variation in accuracy across the proteins.

      A weakness of the work is that it is not always clear what the measured R2 rates mean. In some cases, these may include both fast and slow motions (intrinsic R2 rates and exchange contributions). This in turn means that it is actually not clear what the authors are predicting. The work would also be strengthened by making the code available (in addition to the webservice), and by making it easier to compare the accuracy on the training and testing data.

      Our method predicts the sequence dependence of R2, which is dominated by slower dynamics.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Should make sure to define abbreviations such as NMR and SeqDYN.

      We now spell out NMR at first use. SeqDYN is the name of our method and is not an abbreviation.

      (2) The authors do not mention how the curves in Figure 2A are calculated.

      As we stated in the figure caption, these curves are drawn to guide the eye.

      (3) May be interesting to explore how the model parameters (q) correlate with different measures of hydrophobicity (especially those derived for IDPs like Urry). This may point to a relationship between amino acid interactions and amino acid dynamics

      We now present the correlation between q and a stickiness parameter refined by Tesei et al. (new ref 45) and used for predicting phase separation equilibrium (new Fig. S6).

      (4) The authors demonstrate that secondary structure cannot be fully accounted for by their model. They make a correction for extended alpha-helices, but the strength of this correction seems to only be based on one sequence. Would a more rigorous secondary structure correction further improve the model and perhaps allow its transferability to ordered proteins?

      We have 4 test cases (Figs. 4E, F and 5H, I). However, we doubt that the SeqDYN method will be transferable to ordered proteins.

      Reviewer #3 (Recommendations For The Authors):

      Changes that could strengthen the manuscript substantially.

      (1) The authors do not really define what they mean by dynamics, but given that they train and benchmark on R2 measurements, they directly probe whatever goes into the measured R2. Using a direct measurement is a strength since it makes it clear what they are predicting. It also, however, makes it difficult to interpret. This is made clear in the text when the authors, for example write "𝑅2 is the one most affected by slower dynamics (10s of ns to 1 μs and beyond)." First, with the "and beyond" it could literally mean anything. Second, the "normal" R2 rate is limited up to motions up to the (local) "tumbling/reorganization" time (which is much faster), so any slow motions that go into R2 would be what one would normally call "exchange". The authors should thus make it clearer what exactly it is they are probing. In the end, this also depends on the origin of the experimental data, and whether the "R2" measurements are exchange-free or not. This may be a mixture, which hampers interpretations and which may also explain some of the rescaling that needs to be done.

      We now remove “and beyond”, and also raise the possibility that R2 measurements based on 15N relaxation may have relatively small exchange contributions (p. 17).

      (2) Related to the above, the authors might consider comparing their predictions to the relaxation experiments from Kriwacki and colleagues on a fragment of p27. In that work, the authors used dispersion experiments to probe the dynamics on different timescales. The authors would here be able to compare both to the intrinsic R2 rates (when slow motions are pulsed away) as well as the effective R2 rates (which would be the most common measurement). This would help shed light on (at least in one case) which type of R2 the prediction model captures. https://doi.org/10.1021/jacs.7b01380

      We now report this comparison in new Fig. S5 and discuss its implications (p. 17-18).

      (3) In some cases, disagreement between prediction and experiments is suggested to be due to differences in temperature, and hence is used as an argument for the rescaling done. Here, the authors use a factor of 2.0 to explain a difference between 278K and 298K, and a factor of 2.4 to explain the difference between 288K and 298K. It would be surprising if the temperature effect from 288K->298K is larger than from 278K->298K. Does this not suggest that the differences come as much from other sources?

      Note that the scaling factors 2.0 and 2.4 were obtained on two different IDPs. It is most likely that different IDPs have different scaling factors for temperature change. As a simple model, the tumbling time for a spherical particle scales with viscosity and the particle volume; correspondingly the scaling factor for temperature change should be greater for a larger particle than for a smaller particle.

      (4) The authors find (as have others before) aromatic residues to be common at/near R2 peaks. They suggest this to be indicative for Pi-Pi interactions. Could this not be other types of interactions since these residues are also "just" more hydrophobic? Also, can the authors rule out that the increased R2 rates near aromatic residues is not due to increased dynamics, but simply due to increased Rex-terms due to greater fluctuations in the chemical shifts near these residues (due to the large ring current effects).

      We noted both pi-pi and cation-pi as possible interactions that raise R2. There can be other interactions involving aromatic residues, but it’s unlikely to be only hydrophobic as Arg is also in the high-q end. For the same reason, a ring-current based explanation would be inadequate.

      (5) The authors write: "We found that, by filtering PsiPred (http://bioinf.cs.ucl.ac.uk/psipred) (35) helix propensity scores (𝑝,-.) with a very high cutoff of 0.99, the surviving helix predictions usually correspond well with residues identified by NMR as having high helix propensities." It would be good to show the evidence for this in the paper, and quantify this statement.

      The cases of most interest are the ones with long predicted helices, of which there are only 3 in the training set. For Sev-NT and CBP-ID4, we already summarize the NMR data for helix identification in the first paragraph of Results; the third case is KRS-NT, which we elaborate in p. 14.

      (6) When analysing the nine test proteins, it would be very useful for the reader to get a number for the average accuracy on the nine proteins and a corresponding number for the training proteins. The numbers are maybe there, but hard to find/compare. This would be important so that one can understand how well the model works on the training vs testing data.

      We now present the mean RMSE comparison in p. 14.

      (7) The authors write: "The 𝑞 parameters, while introduced here to characterize the propensities of amino acids to participate in local interactions, appear to correlate with the tendencies of amino acids to drive liquid-liquid phase separation." It would be good to show this data and quantify this.

      We now list supporting data in p. 18 and present new Fig. S6 for further support.

      (8) It is great that the authors have made a webservice available for easy access to the work. They should in my opinion also make the training code and data available, as well as the final trained model. Here it would also be useful to show the results from the use of a Gaussian that was also tested, and also state whether this model was discarded before or after examining the testing data.

      We have listed the IDP characteristics and sequences in Tables S1 and S2. We’re unsure whether we can disseminate the experimental R2 data without the permission of the original authors. As for the Gaussian function, as stated above, it was abandoned at an early stage, before examining the testing data.

      Changes that would also be useful

      (1) The authors should make it clearer what they predict and what they don't. They mention transient helix formation and various contacts, but there isn't a one-to-one relationship between these structural features and R2 rates. Hence, they should make it clearer that they don't predict secondary structure and that an increased R2 rate may be indicative of many different structural/dynamical features on many different time scales.

      We clearly state that we apply a helix boost after the regular SeqDYN prediction.

      (2) The authors write "Instead, dynamics has emerged as a crucial link between sequence and function for IDPs" and cite their own work (reference 1) as reference for this statement. As far as I can see, that work does not study function of IDPs. Maybe the authors could cite additional work showing that the dynamics (time scales) affects function of IDPs beyond "just" structure? Otherwise, the functional consequences are not clear. Maybe the authors mean that R2 rates are indicative of (residual) structure, but that is not quite the same. Also, even in that case, there are likely more appropriate references.

      Ref. 1 summarized a number of scenarios where dynamics is related to function.

      (3) The authors might want to look at some of the older literature on interpreting NMR relaxation rates and consider whether some of it is worth citing.

      Fitting/understanding R2 profiles https://doi.org/10.1021/bi020381o https://doi.org/10.1007/s10858-006-9026-9

      MD simulations and comparisons to R2 rates without ad hoc reweighting (in addition to the papers from the authors themselves). https://doi.org/10.1021/ja710366c https://doi.org/10.1021/ja209931w

      The R2 data for the two unfolded proteins are very helpful! We now present the comparison of these data to SeqDYN prediction in Fig. 6C, D. The MD papers are superseded by more recent studies (e.g., refs. 1 and 14).

      There are more like these.

      (4) In the analysis of unfolded lysozyme, I assume that the authors are treating the methylated cysteines (which are used in the experiments) simply as cysteine. If that is the case, the authors should ideally mention this specifically.

      Treatment of methylated cysteines is now stated in the Fig. 6 caption.

      (5) The authors write "Pro has an excessively low ms𝑅2 [with data from only two IDPs (32, 33)], but that is due to the absence of an amide proton." It would be useful with an explanation why lacking a proton gives rise to low 15N R2 rates.

      That assertion originated from ref. 32.

      (6) When applying the model, the authors predict msR2 and then compare to experimental R2 by rescaling with a factor gamma. It would be good to make it clearer whether this parameter is always fitted to the experiments in all the comparisons. It would be useful to list the fitted gamma values for all the proteins (e.g. in Table S1).

      We already give a summary of the scaling factors (“For 39 of the 45 IDPs, Υ values fall in the range of 0.8 to 2.0 s–1”, p. 10).

      (7) p. 14 "nineth" -> "ninth"

      Corrected

    2. eLife assessment

      In this useful study, a solid machine learning approach based on a broad set of systems to predict the R2 relaxation rates of residues in intrinsically disordered proteins (IDPs) is described. The ability to predict the patterns of R2 will be helpful to guide experimental studies of IDPs. A potential weakness is that the predicted R2 values may include both fast and slow motions, thus the predictions provide only limited new physical insights into the nature of the underlying protein dynamics, such as the most relevant timescale.

    3. Reviewer #2 (Public review):

      Qin, Sanbo and Zhou, Huan-Xiang created a model, SeqDYN, to predict nuclear magnetic resonance (NMR) spin relaxation spectra of intrinsically disordered proteins (IDPs), based primarily on amino acid sequence. To fit NMR data, SeqDYN uses 21 parameters, 20 that correspond to each amino acid, and a sequence correlation length for interactions. The model demonstrates that local sequence features impact the dynamics of the IDP, as SeqDYN performs better than a one residue predictor, despite having similar numbers of parameters. SeqDYN is trained using 45 IDP sequences and is retrained using both leave-one-out cross validation and five-fold cross validation, ensuring the model's robustness. While SeqDYN can provide reasonably accurate predictions in many cases, the authors note that improvements can be made by incorporating secondary structure predictions, especially for alpha-helices that exceed the correlation length of the model. The authors apply SeqDYN to study nine IDPs and a denatured ordered protein, demonstrating its predictive power. The model can be easily accessed via the website mentioned in the text.

      The authors have adequately addressed the majority of my previous concerns. However, I still wonder if an attempt to fit the individual protein fitting parameter based on temperature and magnetic field strength would be possible. The authors would have 45 data points on which to fit such a parameter, which would only depend on two variables.

    4. Reviewer #3 (Public review):

      The revised manuscript adds some new relevant analyses. It still, however, is unclear which timescales of motions the method refers to and there is confusion about whether the model can predict "slower motions". While the authors answer some of my points, others are left unanswered. That is of course the authors' prerogative, and readers will in any case be able to read the reviewer comments. I am not sure it is productive to add further comments at this point.

      Below are my comments from the first round of review:

      The manuscript by Qin and Zhou presents an approach to predict dynamical properties of an intrinsically disordered protein (IDP) from sequence alone. In particular, the authors train a simple (but useful) machine learning model to predict (rescaled) NMR R2 values from sequence. Although these R2 rates only probe some aspects of IDR dynamics and the method does not provide insight into the molecular aspects of processes that lead to perturbed dynamics, the method can be useful to guide experiments.

      A strength of the work is that the authors train their model on an observable that directly relates to protein dynamics. They also analyse a relatively broad set of proteins which means that one can see actual variation in accuracy across the proteins.

      A weakness of the work is that it is not always clear what the measured R2 rates mean. In some cases, these may include both fast and slow motions (intrinsic R2 rates and exchange contributions). This in turn means that it is actually not clear what the authors are predicting. The work would also be strengthened by making the code available (in addition to the webservice), and by making it easier to compare the accuracy on the training and testing data.

    1. eLife assessment

      The manuscript proposes an alternative method by SDS-PAGE calibration of Halo-Myo10 signals to quantify myosin molecules in filopodia and discusses different scenarios regarding myosin 10 working models to explain intracellular diffusion and targeting to filopodia. Overall, the paper is elegantly written and the methodology is valuable in its descriptive potential as these are key numbers to know to ultimately decipher the cellular mechanism of Myo10 action as well as understand the molecular composition of a Myo10-generated filopodium. The evidence for the conclusions is compelling, but there are limitations to this study which should be kept in mind when applying this method to other systems.

    2. Joint Public Review:

      The paper sought to determine the number of myosin 10 molecules per cell and the number localized to filopodia, where they are known to be involved in formation, transport within, and dynamics of these important actin-based protrusions. The authors used a novel method to determine the number of molecules per cell. First, they expressed HALO-tagged Myo10 in U2OS cells, generated cell lysates from a known number of cells, and detected Myo10 after SDS-PAGE with fluorescence and a stain-free method. They used a purified HALO-tagged standard protein to generate a standard curve, which allowed for determining Myo10 concentration in cell lysates and thus an estimate of the number of Myo10 molecules per cell. They also examined the fluorescence intensity in fixed cell images to determine the average fluorescence intensity per Myo10 molecule, which allowed the number of Myo10 molecules per region of the cell to be determined. They found a relatively small fraction of Myo10 (6%) localizes to filopodia. There are hundreds of Myo10 in each filopodium, which suggests some filopodia have more Myo10 than actin binding sites. Thus, there may be crowding of Myo10 at the tips, which could impact transport, the morphology at the tips, and dynamics of the protrusions themselves. Overall, the study forms the basis for a novel technique to estimate the number of molecules per cell and their localization to actin-based structures. The implications are also broad for understanding the role of myosins in actin protrusions, which is important for cancer metastasis and wound healing.
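The standard-curve step summarized above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: it assumes a linear relation between in-gel band intensity and the amount of standard loaded, ignores any correction for incomplete labeling, and uses made-up numbers. The function names, units (femtomoles) and values are hypothetical.

```python
from typing import Sequence, Tuple

AVOGADRO = 6.02214076e23  # molecules per mole


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def molecules_per_cell(band_intensity: float, slope: float,
                       intercept: float, cells_loaded: float) -> float:
    """Invert the standard curve, then convert femtomoles to molecules per cell."""
    femtomoles = (band_intensity - intercept) / slope
    return femtomoles * 1e-15 * AVOGADRO / cells_loaded


if __name__ == "__main__":
    # Hypothetical standards: femtomoles loaded vs. measured band intensity.
    standard_fmol = [1.0, 2.0, 4.0]
    standard_intensity = [10.0, 20.0, 40.0]
    slope, intercept = fit_line(standard_fmol, standard_intensity)
    # Hypothetical lysate lane: intensity 30 from 1000 cells.
    print(molecules_per_cell(30.0, slope, intercept, cells_loaded=1000))
```

Dividing the total fluorescence of an imaged cell by this per-cell molecule count would then give an intensity per molecule, which is the quantity the review describes being used to count molecules in each region.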

      Comments on latest version (from the Reviewing Editor):

      One of the main critiques that still remains is that the results were derived from experiments with overexpressed Myo10 and therefore are hard to extrapolate to physiological conditions. Measurements were also only performed in a single cell line. The authors counter this critique with the argument that their results provide insight into a system in which Myo10 is a limiting factor for controlling filopodia formation. They demonstrate that U2OS cells do not express detectable levels of Myo10 and thus introducing Myo10 expression demonstrates how triggering Myo10 expression impacts filopodia. An example is given of how melanoma cells often heavily upregulate Myo10.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The manuscript proposes an alternative method by SDS-PAGE calibration of Halo-Myo10 signals to quantify myosin molecules at specific subcellular locations, in this specific case filopodia, in epifluorescence datasets compared to the more laborious and troublesome single molecule approaches. Based on these preliminary estimates, the authors developed further their analysis and discussed different scenarios regarding myosin 10 working models to explain intracellular diffusion and targeting to filopodia. 

      Strengths: 

      I confirm my previous assessment. Overall, the paper is elegantly written and the data analysis is appropriately presented. Moreover, the novel experimental approach offers advantages to labs with limited access to high-end microscopy setups (super-resolution and/or EM in particular), and the authors proved its applicability to both fixed and live samples. 

      Weaknesses: 

      Myself and the other two reviewers pointed to the same weakness, the use of protein overexpression in U2OS. The authors claim that Myosin10 is not expressed by U2OS, based on Western blot analysis. Does this completely rule out the possibility that what they observed (the polarity of filopodia and the bulge accumulation of Myo10) could be an artefact of overexpression? I am afraid this still remains the main weakness of the paper, despite being properly acknowledged in the Limitations.

      Respectfully, our observations do not capture an “artefact” of overexpression but rather the “response” to overexpression. Our goal in this project was to overexpress Myo10 in a situation where it is the limiting reagent for generating filopodia. As Reviewer 3 notes below, overexpression shows that filopodial tips “can accommodate a surprisingly (shockingly) large number of motors.” This is exactly the point. Reviewer 2 considered our handling of this issue to be a strength of the paper. As far as whether bulges occur in endogenous Myo10 systems, please see our comments to Reviewer 3. 

      I consider all the remaining issues I expressed during the first revision solved. 

      Reviewer #2 (Public Review): 

      Summary: 

      The paper sought to determine the number of myosin 10 molecules per cell and the number localized to filopodia, where they are known to be involved in formation, transport within, and dynamics of these important actin-based protrusions. The authors used a novel method to determine the number of molecules per cell. First, they expressed HALO-tagged Myo10 in U2OS cells, generated cell lysates from a known number of cells, and detected Myo10 after SDS-PAGE with fluorescence and a stain-free method. They used a purified HALO-tagged standard protein to generate a standard curve, which allowed for determining Myo10 concentration in cell lysates and thus an estimate of the number of Myo10 molecules per cell. They also examined the fluorescence intensity in fixed cell images to determine the average fluorescence intensity per Myo10 molecule, which allowed the number of Myo10 molecules per region of the cell to be determined. They found a relatively small fraction of Myo10 (6%) localizes to filopodia. There are hundreds of Myo10 in each filopodium, which suggests some filopodia have more Myo10 than actin binding sites. Thus, there may be crowding of Myo10 at the tips, which could impact transport, the morphology at the tips, and dynamics of the protrusions themselves. Overall, the study forms the basis for a novel technique to estimate the number of molecules per cell and their localization to actin-based structures. The implications are also broad for understanding the role of myosins in actin protrusions, which is important for cancer metastasis and wound healing.

      Strengths: 

      The paper addresses an important fundamental biological question about how many molecular motors are localized to a specific cellular compartment and how that may relate to other aspects of the compartment such as the actin cytoskeleton and the membrane. The paper demonstrates a method of estimating the number of myosin molecules per cell using the fluorescently labeled HALO tag and SDS-PAGE analysis. There are several important conclusions from this work in that it estimates the number of Myo10 molecules localized to different regions of the filopodia and the minimum number required for filopodia formation. The authors also establish a correlation between the number of filopodia-localized Myo10 molecules and the number of filopodia in the cell. Only a small percentage of Myo10 is tip-localized relative to the total amount in the cell, suggesting Myo10 has to be activated to enter the filopodia compartment. The localization of Myo10 is log-normal, which suggests that clustering of Myo10 is a feature of this motor.

      One of the main critiques of the manuscript was that the results were derived from experiments with overexpressed Myo10 and therefore are hard to extrapolate to physiological conditions. The authors counter this critique with the argument that their results provide insight into a system in which Myo10 is a limiting factor for controlling filopodia formation. They demonstrate that U2OS cells do not express detectable levels of Myo10 (supplementary Figure 1E) and thus introducing Myo10 expression demonstrates how triggering Myo10 expression impacts filopodia. An example is given of how melanoma cells often heavily upregulate Myo10.

      In addition, the revised manuscript addresses the concerns about the method to quantitate the number of Myo10 molecules per cell and therefore puncta in the cell. The authors have now made a good faith effort to correct for incomplete labeling of the HALO tag (Figure 2A-C, supplementary Figure 2D-E). The authors also address the concerns about variability in transfection efficiency (Figure 1D-E). 

      A very interesting addition to the revised manuscript was the quantitation of the number of Myo10 molecules present during an initiation event when a newly formed filopodium just starts to elongate from the plasma membrane. They conclude that 100s of Myo10 molecules are present during an initiation event. They also examined other live cell imaging events in which growth occurs from a stable filopodium tip and correlated these with elongation rates.

      Weaknesses: 

      The authors acknowledge that a limitation of the study is that all of the experiments were performed with overexpressed Myo10. They address this limitation in the discussion but also provide important comparisons for how their work relates to physiological conditions, such as melanoma cells that only express large amounts of Myo10 when they are metastatic. Also, the speculation about how fascin can outcompete Myo10 should include a mechanism for how the physiological levels of fascin can compete with the overabundance of Myo10 (page 10, lines 401-408).

      We have expanded the discussion about fascin competing with high concentrations of Myo10 in filopodial tips on pg. 15. The key feature is that fascin binding in a bundle is essentially irreversible, so it wins if any space opens up and it manages to bind before the next Myo10 arrives.

      Reviewer #3 (Public Review): 

      Summary 

      The work represents progress in quantifying the number of Myo10 molecules present in the filopodia tip. It reveals that, in cells overexpressing fluorescently labeled Myo10, the tip can accommodate a wide range of Myo10 motors, up to hundreds of molecules per tip.

      The revised, expanded manuscript addresses all of this reviewer's original comments. The new data, analysis and writing strengthen the paper. Given the importance of filopodia in many cellular/developmental processes and the pivotal, as yet not fully understood role of Myo10 in their formation and extension, this work provides a new look at the nature of the filopodial tip and its ability to accommodate a large number of Myo10 motor proteins through interactions with the actin core and surrounding membrane. 

      Specific comments - 

      (1) One of the comments on the original work was that the analysis here is done using cells ectopically expressing HaloTag-Myo10. The authors' response is that cells express a range of Myo10 levels and some metastatic cancer cells, such as breast cancer, have significantly increased levels of Myo10 compared to non-transformed cell lines. It is not really clear how much excess Myo10 is present in those cells compared to what is seen here for ectopic expression in U2OS cells, making a direct correspondence difficult.

      We agree, a direct correspondence is difficult, and is further complicated by other variables (e.g., expression levels of Myo10 activators, cargoes, fascin, or other filopodial components) that may differ among cell lines. Properly sorting this out will require additional work in a few key cellular systems.

      However, there are two points to keep in mind that somewhat mitigate this concern. First, because ectopic expression of Myo10 causes an ~30x increase in the number of filopodia, the activated Myo10 population is divided over that larger filopodial population. Second, the log-normal distribution of Myo10 across filopodia has a long tail, which means that some cells with low levels of Myo10 will concentrate that Myo10 in a few filopodia. 

      In response to comments about the bulbous nature of many filopodia tips the authors point out that similar-looking tips are seen when cells are immunostained for Myo10, citing Berg & Cheney (2002). In looking at those images as well as images from papers examining Myo10 immunostaining in metastatic cancer cells (Arjonen et al, 2014, JCI; Summerbell et al, 2020, Sci Adv) the majority of the filopodia tips appear almost uniformly dot-like or circular. There is not too much evidence of the elongated, bulbous filopodial tips seen here.

      Yes, the tips in Berg and Cheney are circular, but their size varies considerably (just as a balloon is roughly circular, its size varies with the amount of air it contains). Non-bulbous filopodial tips have a theoretical radius of ~100 nm, which is below the diffraction limit. However, many of the filopodial tips are larger than the diffraction limit in Berg and Cheney, Fig. 1a. We cropped and zoomed in the images to show each fully visible filopodial tip.

      We attempted to perform a similar analysis of the images in Arjonen and Summerbell. Unfortunately, their images are too small to do so. 

      However, in reconsidering the approach and results, it is the case that the findings here do establish the plasticity of filopodia tips that can accommodate a surprisingly (shockingly) large number of motors. The authors discuss that their results show that targeting molecules to the filopodia tip is a relatively permissive process (lines 262 - 274). That could be an important property that cells might be able to use to their advantage in certain contexts.

      (2) The method for arriving at the intensity of an individual filopodium puncta (starting on line 532 and provided in the Response), and how this is corrected for transfection efficiency and the cell-to-cell variation in expression level is still not clear to this reviewer. The first part of the description makes sense - the authors obtain total molecules/cell based on the estimation on SDS-PAGE using the signal from bound Halo ligand. It then seems that the total fluorescence intensity of each expressing cell analyzed is measured, then summed to get the average intensity/cell. The 'total pool' is then arrived at by multiplying the number of molecules/cell (from SDS-PAGE) by the total number of cells analyzed. After that, then: 'to get the number of molecules within a Myo10 filopodium, the filopodium intensity was divided by the bioreplicate signal intensity and multiplied by 'total pool.' ' The meaning of this may seem simple or straightforward to the authors, but it's a bit confusing to understand what the 'bioreplicate signal intensity' is and then why it would be multiplied by the 'total pool'. This part is rather puzzling at first read.

      We agree, such information is critical. We have now revised this description with more precise terms and have included a formula on pg. 20.

      Since the approach described here leads the authors to their numerical estimates every effort should be made to have it be readily understood by all readers. A flow chart or diagram might be helpful. 

      We have added a diagram of the calculations to the supplemental material (Figure 1—figure supplement 3). We hope that both changes will make it easier for others to follow our work.
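      The calculation discussed in comment (2) can be sketched as follows, using only the steps quoted by the reviewer. The sketch assumes that the 'bioreplicate signal intensity' is the summed fluorescence intensity of all analyzed cells in a bioreplicate; the function name and all numbers are hypothetical.

```python
def molecules_in_punctum(punctum_intensity, cell_intensities, molecules_per_cell):
    """Estimate the number of Myo10 molecules in one filopodial punctum.

    punctum_intensity  : integrated intensity of the punctum (image units)
    cell_intensities   : total intensity of each analyzed cell in the
                         bioreplicate (same image units)
    molecules_per_cell : average molecules per cell from the SDS-PAGE estimate
    """
    n_cells = len(cell_intensities)
    # 'Total pool': all molecules contained in the analyzed cells.
    total_pool = molecules_per_cell * n_cells
    # Assumed meaning of 'bioreplicate signal intensity': summed cell intensity.
    bioreplicate_intensity = sum(cell_intensities)
    # Fraction of the total signal in this punctum, scaled by the total pool.
    return punctum_intensity / bioreplicate_intensity * total_pool


# Hypothetical bioreplicate of three cells with unequal expression.
cell_intensities = [2.0e6, 1.0e6, 3.0e6]
estimate = molecules_in_punctum(1200.0, cell_intensities, molecules_per_cell=500_000)
print(f"{estimate:.0f} molecules in the punctum")
```

      Under this reading, dividing the total pool by the bioreplicate intensity gives a single molecules-per-intensity-unit factor for the bioreplicate, so cell-to-cell variation in expression enters only through each cell's measured intensity.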

      (3) The distribution of Myo10 punctae around the cell are analyzed (Fig 2E, F) and the authors state that they detect 'periodic stretches of higher Myo10 density along the plasma membrane' (line 123) and also that there is correlation and anti-correlation of molecules and punctae at opposite ends of the cells. 

      In the first case, it is hard to know what the authors really mean by the phrase 'periodic stretches'. It's not easy to see a periodicity in the distribution of the punctae in the many cells shown in Supp Fig 3. Also, the correlation/anti-correlation is not so easily seen in the quantification shown in Fig 2F. Can the authors provide some support or clarification for what they are stating? 

      The periodic pattern that we refer to is most apparent in the middle panels of Fig. 2E, F. These panels show the density of Myo10 puncta. These puncta numbers closely correspond to filopodia counts, with the caveat that some filopodia might have multiple puncta. This periodic density might not be as apparent in the raw data shown in Supp. Fig. 3. We have therefore rewritten this paragraph to clarify our observations (pg. 6).

      (4) The authors are no doubt aware that a paper from the Tyska lab that employs a completely different method of counting molecules arrives at a much lower number of Myo10 molecules at the filopodial tip than is reported here was just posted (Fitz & Tyska, 2024, bioRxiv, DOI: 10.1101/2024.05.14.593924). 

      While it is not absolutely necessary for the authors to provide a detailed discussion of this new work given the timing, they may wish to consider adding a note briefly addressing it. 

      We are aware of this manuscript and that it uses a different approach for calibrating the fluorescence signal in microscopy. However, we are not comfortable commenting on that manuscript at this time, given that it has not yet been peer reviewed with the chance for author revisions.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      The manuscript the authors are now presenting does not comply with the formatting limits of a Short report, but it is instead presented as a full article type. I believe the authors could shorten the Discussion, and meet the criteria for a more appropriate Short Report format. 

      For instance, I continue to believe that the study of truncation variants could sustain the claim that membrane binding represents the driving force that leads to Myo10 accumulation. I understand the authors want to address these mechanisms in a follow-up story; for this reason, I encourage them to shorten the discussion, which seems unnecessarily long for a technique-based manuscript.

      In the first round of review, Reviewer 3 asked us to expand the discussion. Given that, we are happy with where we have landed on the length of the discussion.

      Figure 2 could include some images to help readers grasp the different messages of the two rose plots (E and F), for example by picking one of the examples from Supplementary Figure 3.

      We have now added a supplemental figure showing an example cell (Fig. 2 figure supplement 2). But please note that the averaging of ~150 cells (Fig. 2E, F) should be more reliable to show these overall trends.

      Reviewer #2 (Recommendations For The Authors): 

      Also, the speculation about how fascin can outcompete Myo10 should include a mechanism for how the physiological levels of fascin can compete with the overabundance of Myo10 (page 10, lines 401-408).

      As noted above, we have now clarified this point. 

      Reviewer #3 (Recommendations For The Authors): 

      line 495 - what is GOC? 

      We have now defined this oxygen scavenger system in the main text.

      lines 603/604 - it is stated that 'velocity analysis does not only account for Myo10 punctum that moved away from the starting point of the trajectory.' It's not clear what this really means. 

      The sentence now reads: "For Figure 4 parts G-H, note that velocity analysis includes a few Myo10 puncta that switch direction within a single trajectory (e.g., a retracting punctum that then elongates)."

      References #4 and #14 are the same. 

      Thank you for catching that; it has now been corrected.

      Fig 1C - the plot for signal intensity versus fmol of protein has numbers for the standard and then live and fixed cells. While the R2 value is quite good, it seems a bit odd that the three (?) data points for live cells are all quite small relative to the fixed cells and all bunched together at the left side of the plot. 

      As mentioned in the main text, the time post-transfection has a noticeable effect on the level of Myo10 expression. The three fixed-cell bioreplicates had higher Myo10 expression because they were analyzed 48 hours post-transfection compared to the three live-cell bioreplicates (24 hours). Therefore, the fixed cell data points are larger in value because they represent more molecules, and the live cell data points are on the left side of the plot because they represent fewer molecules.

    1. eLife assessment

      This manuscript reveals an important mechanism of KCNQ1/IKs channel gating and PUFA modulation of this mechanism. This mechanism is supported by convincing single channel recordings, macroscopic current recordings and mutational analyses. These findings are of importance to the ion channel field and possibly future therapeutic applications.

    2. Reviewer #1 (Public review):

      This study comes to an interesting conclusion: a polyunsaturated fatty acid, Lin-Glycine, increases the conductance of KCNQ1/KCNE1 channels by stabilizing a state of the selectivity filter that allows K+ conduction. The stabilization of a conducting state is well supported by single channel analysis, which shows that normally infrequent opening bursts occur more often in the presence of the PUFA. The linkage to PUFA action through the selectivity filter is supported by disruption of PUFA effects by mutation of residues which change conformation in two KCNQ1 structures from the literature. A definitive functional experiment is conducted by single channel recordings with selectivity filter domain mutation Y315F which ablates the Lin-Glycine effect on Gmax. The computational exploration of two selectivity filter structures proposed to interact distinctly with Lin-Glycine is informative. Both mutation results and simulations converge on the proposed selectivity filter mechanism, although other possibilities for Lin-Glycine binding and action might be possible. Overall, the major claim of the abstract is well-supported: "... that the selectivity filter in KCNQ1 is normally unstable ... and that the PUFA-induced increase in Gmax is caused by a stabilization of the selectivity filter in an open-conductive state."

    3. Reviewer #2 (Public review):

      Golluscio et al. address one of the mechanisms of IKs (KCNQ1/KCNE1) channel upregulation by polyunsaturated fatty acids (PUFAs). PUFAs are known to upregulate KCNQ1 and KCNQ1/KCNE1 channels through two mechanisms: one shifts the voltage dependence in a negative direction, and the other increases the maximum conductance (Gmax). While the first mechanism is known to affect the voltage sensor equilibrium through a charge effect, the second mechanism is less understood. Using single-channel recordings and mutagenesis at putative PUFA binding sites, they successfully demonstrate that the selectivity filter is stabilized in a conducting state by PUFA binding, and that this is the mechanism by which PUFAs increase Gmax. Their single-channel recordings are straightforward and clearly show that the selectivity filter tends to become conductive upon PUFA binding. Since PUFAs are potential therapeutic reagents for cardiac arrhythmias such as long QT syndrome, their findings are beneficial for future research and applications of these compounds.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript reveals an important mechanism of KCNQ1/IKs channel gating such that the open state of the pore is unstable and undergoes intermittent closed and open conformations. PUFA enhances the maximum open probability of IKs by binding to a crevice adjacent to the pore and stabilizing the open conformation. This mechanism is supported by convincing single channel recordings that show empty and open channel traces and the ratio of such traces is affected by PUFA. In addition, mutations of the pore residues alter PUFA effects, convincingly supporting that PUFA alters the interactions among these pore residues.

      Strengths:

      The data are of high quality and the description is clear.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      The additional data included in this revision nicely strengthens the major claim.

      I apologize that my comment about K+ concentration in the prior review was unclear. The cryoEM structure of KCNQ1 with S4 in the resting state was obtained with lowered K+ relative to the active state. Throughout the results and discussion it seems implied that the change in voltage sensor state is somehow causative of the change in selectivity filter state while the paper that identified the structures attributes the change in selectivity filter state not to voltage sensors, but to the change in [K+] between the 2 structures. Unless there is a flaw in my understanding of the conditions in which the selectivity filter structures used in modeling were generated, it seems misleading to ignore the change in [K+] when referring to the activated vs resting or up vs down structures. My understanding is that the closed conformation adopted in the resting/low [K+] is similar to that observed in low [K+] previously and is more commonly associated with [K+]-dependent inactivation, not resulting from voltage sensor deactivation as implied here. The original article presenting the low [K+] structure also suggests this. When discussing conformational changes in the selectivity filter, I strongly suggest referring to these structures as activated/high [K+] vs resting/low [K+] or something similar, as the [K+] concentration is a salient variable.

      There seems to be some major confusion here, and we will try to explain our thinking. Note that in the Mandala and MacKinnon paper, there is no significant difference in the amino acid positions in the selectivity filter between low and high K+ when S4 is in the activated position (see Mandala and MacKinnon, PNAS Suppl. Fig S5 C and D). There are only fewer K+ in the selectivity filter in low K+. So, the structure with the distorted selectivity filter is not due to low K+ by itself. Note that there is no real difference between macroscopic currents recorded in low and high K+ solutions (except what is expected from changes in driving force) for KCNQ1/KCNE1 channels (Larsen et al., Bioph J 2011), suggesting that low K+ does not promote the non-conductive state (Figure 1). We now include a section in the Discussion about high/low K+ in the structures and the absence of effects of K+ on the function of KCNQ1/KCNE1 channels.

      Author response image 1.

      Macroscopic KCNQ1/KCNE1 currents recorded in different K+ conditions. Note that there is no difference between currents recorded in low K+ (2 mM) conditions and high K+ (96 mM) conditions (n=3 oocytes). Currents were normalized with respect to high K+.

      Note also that, in the previous version of the manuscript, we did not propose that the position of S4 is what determines the state of the selectivity filter. We only reported that the CryoEM structure with S4 resting shows a distorted selectivity filter. It seems like our text confused the reviewer to think that we proposed that S4 determines the state of the selectivity filter, when we did not propose this earlier. We previously did not want to speculate too much about this, but we have now included a section in the Discussion to make our view clear in light of the confusion of the reviewers.

      It is clear from our data that the majority of sweeps are empty (which we assume is with S4 up), suggesting that the selectivity filter can be (and is in the majority of sweeps) in the non-conducting state even with S4 up.  We think that the selectivity filter switches between a non-conductive and a conductive conformation both with S4 down and with S4 up. The cryoEM structure in low K+ and S4 down just happened to catch the non-conductive state of the selectivity filter.  We have now added a section in the Discussion to clarify all this and explain how we think it works.

      However, S4 in the active conformation seems to stabilize the conductive conformation of the selectivity filter, because during long pulses the channel seems to stay open once opened (See Suppl Fig S2). So, one possibility is that the selectivity filter goes more readily into the non-conductive state when S4 is down (and maybe, or not, low K+ plays a role) and then when S4 moves up the selectivity filter sometimes recovers into the conductive state and stays there. We now have included a section in the Discussion to present our view. Since this whole discussion was initiated and pushed by the reviewer, we hope that the reviewers will not demand more data to support these ideas. We think that this addition makes sense since other readers might have the same questions and ideas as the reviewer, and we would like to prevent any confusion about this topic.

      Figure 1

      It remains unclear in the manuscript itself what "control" refers to. Are control patches the same patches that later receive LG?

      Yes, the control means the same patch before LG. We now indicate that in legends and text throughout.

      Supplementary Figure S1

      Unclear if any changes occur after addition of LG in left panel and if the LG data on right is paired in any way to data on left.

      Yes, in all cases the left and right panel in all figures are from the same patch. We now indicate that in legends and text throughout.

      The letter p is used both to represent open probability from the all-point amplitude histogram and as a p-value statistical probability indicator, sometimes lower case, sometimes upper case. This was confusing.

      We now exclusively use lower case p for statistical probability and Po for open probability.

      "This indicates that mutations of residues in the more intracellular region of the selectivity filter do not affect the Gmax increases and that the interactions that stabilize the channel involve only residues located near the external region part of the selectivity filter. "

      Seems too strongly worded, it remains possible that mutations of other residues in the more intracellular region of the selectivity filter could affect the Gmax increases.

      We have changed the text to: "Mutations of residues in the more intracellular region of the selectivity filter do not affect the Gmax increases, as if the interactions that stabilize the channel involve residues located near the external region part of the selectivity filter. "

      Supplementary Figure S7

      Please report Boltzmann fit parameters. What are "normalized" uA?

      We removed the uA, which was mistakenly inserted. The lines in the graphs are just lines connecting the dots and not Boltzmann fits, since we don’t have saturating curves in all panels to make unique fits.

      "We have previously shown that the effects of PUFAs on IKs channels involve the binding of PUFAs to two independent sites." Was binding to the sites actually shown? Suggest changing to: "We have previously proposed models in which the effects of PUFAs..."

      We have now changed this as the Reviewer suggested: " We have previously proposed models in which the effects of PUFAs on IKs channels involve the binding of PUFAs to two independent sites."

      Statistics used not always clear. Methods refer to multiple statistical tests but it is not clear which is used when.

      We use two different tests and it is now explained in figure legends when either was used.

      n values confusing. Sometimes # of sweeps used as n. Sometimes # patches used as n. In one instance "The average current during the single channel sweeps was increased by 2.3 ± 0.33 times (n = 4 patches, p =0.0006)" ...this seems a low p value for this n=4 sample?

      We have now more clearly indicated what n stands for in each case. There was an extra 0 in the p value, so now it is p = 0.006. Thanks for catching that error.

      Reviewer #2 (Recommendations For The Authors):

      I still have some comments for the revised manuscript.

      (1) (From the previous minor point #6) Since D317E and T309S did not show statistical significance in Figure 5A, the sentences such as "This data shows that Y315 and D317 are necessary for the ability of Lin-Glycine to increase Gmax" or "the effect of Lin-Glycine on Gmax of the KCNQ1/KCNE1 mutant was noticeably reduced compared to the WT channel showing the this residue contributes to the Gmax effect (Figure 5A)." may need to be toned down. Alternatively, I suggest the authors refer to Supplementary Figure S7 to confirm that Y315 and D317 are critical for increasing Gmax.

      We have redone the analysis and statistical evaluation in Fig 5. We now use the more appropriate value of the fitted Gmax (which uses the whole dose response curve instead of only the 20 mM value) in the statistical evaluation, and now Y315F and D317E are statistically different from wt.

      (2) Supplementary Fig. S1. All control diary plots include the green arrows to indicate the timing of lin-glycine (LG) application. It is a bit confusing why they are included. Is it to show that LG application did not have an immediate effect? Are the LG-free plots not available?

      We are not sure what the Reviewer is asking about. In the previous review round the Reviewers asked specifically for this. The arrow shows when LG was applied and the plot on the right shows the effect of LG from the same patch.

      (3) The legend to Supplementary Figure S4, "The side chain of residues ... are highlighted as sticks and colored based on the atomic displacement values, from white to blue to red on a scale of 0 to 9 Å." They look mostly blue (or light blue). Which one is colored white? It might be better to use a different color code. It would also be nice to link the color code to the colors of Supplementary Figure S5, which currently uses a single color.

      We have removed “from white to blue to red on a scale of 0 to 9 Å” and instead now include a color scale directly in Fig S4 to show how much each atom moved based on the color.

      We feel it is not necessary to include color in Fig S5 since the scale of how much each atom moves is shown on the y axis.

      (4) Add unit (pA) to the y-axis of Supplementary Figure S2.

      pA has been added.

      Reviewer #3 (Recommendations For The Authors):

      Some issues on how data support conclusions are identified. Further justifications are suggested.

      186: "The decrease in first latency is most likely due to an effect of Lin-Glycine on Site I in the VSD and related to the shift in voltage dependence caused by Lin-Glycine." The results in Fig S1B do not seem to support this statement since the mutation Y315F in the pore helix seemed to have eliminated the effect of Lin-Glycine in reducing first latency. The authors may want to show that a mutation eliminating Site I would eliminate the effect of Lin-Glycine on first latency. On the other hand, it will also be interesting to examine whether another pore mutation, such as P320L (Fig 5), also reduces the effect of Lin-Glycine on first latency.

      These experiments are very hard and laborious, and we feel these are outside the scope of this paper which focuses on Site II and the mechanism of increasing Gmax. Further studies of the voltage shift and latency will have to be for a future study.

      The mutation D317E did not affect the effect of Lin-Glycine on Gmax significantly (Fig 5A, and Fig S7F comparing with Fig S7A), but the authors conclude that D317 is important for Lin-Glycine association. This conclusion needs a better justification.

      We have redone the analysis and statistical evaluation in Fig 5. We now use the more appropriate value of the fitted Gmax (which uses the whole dose response curve instead of only the 20 mM value) in the statistical evaluation, and now D317E is statistically different from wt.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Summary - This study was designed to investigate changes in gene expression and associated chromatin accessibility patterns in spermatogonia in mice at different postnatal stages from pups to adults. The objective was to describe dynamic changes in these patterns that potentially correlate with functional changes in spermatogonia as a function of development and reproductive maturation. The potential utility of this information is to serve as a reference against which similar data from animals subjected to various disruptive environmental influences can be compared.

      Major Strengths and Weaknesses of the Methods and Results - A strength of the study is that it reviews previously published datasets describing gene expression and chromatin accessibility patterns in mouse spermatogonia. A weakness of the study is that it is not clear what new information is provided by the data provided that was not already known from previously published studies (see below). Specific weaknesses include the following:

      • Terminology - in the Abstract and first part of the Introduction the authors use the generic term "spermatogonial cells" in a manner that seems to be referring primarily to spermatogonial stem cells (SSCs) but initially ignores the well-known heterogeneity among spermatogonia - particularly the fact that only a small proportion of developing spermatogonia become SSCs - and ONLY those SSCs and NOT other developing spermatogonia - support steady-state spermatogenesis by retaining the capacity to either self-renew or contribute to the differentiating spermatogenic lineage throughout the male reproductive lifespan. The authors eventually mention other types of developing male germ cells, but their description of prospermatogonial stages that precede spermatogonial stages is deficient in that M-prospermatogonia - which occur after PGCs but before T1-prospermatogonia - are not mentioned. This description also seems to imply that all T2-prospermatogonia give rise to SSCs which is far from the case. It is the case that prospermatogonia give rise to spermatogonia, but only a very small proportion of undifferentiated spermatogonia form the foundational SSCs and ONLY SSCs possess the capacity to either self-renew or give rise to sequential waves of spermatogenesis.

      We thank Reviewer 1 for the comments and clarifications. As suggested in the previous revision, we use the term spermatogonial cells (SPGs) to make it clear that our cell preparations do not exclusively contain SSCs but all SPGs, since they derive from a FACS enrichment strategy. This is explained in the manuscript. Further, we conducted deconvolution analyses on the datasets to examine the composition of the enriched SPG preparations and provide new sequencing information confirming the presence of SSCs and differentiating SPGs.

      • Introduction - Statements regarding distinguishing transcriptional signatures in spermatogonia at different postnatal stages appear to refer to ALL subtypes of spermatogonia present at each stage collectively, thereby ignoring the well-known fact that there are distinct spermatogonial subtypes present at each postnatal stage and that some of those occur at certain stages but not at others. This brings into question the usefulness of the authors' discussion of what types of genes are expressed and/or what types of changes in chromatin accessibility are detected in spermatogonia at each stage.

      We agree that our data do not provide information about the transcriptional program of each subtype of SPGs. Rather they provide information about the dynamics of transcriptional programs in the transition from postnatal stage to adulthood in an enriched population of SPGs. The datasets are comprehensive and contain mRNA and non-coding RNA (with and without a polyA+ tail), which provides more precise transcriptomic information than classical single cell methods.

      • Methodology - The authors based recovery (enrichment) of spermatogonia from male pups on FACS sorting for THY1 and RMV-1. While sorting total testis cells for THY1+ cells does enrich for spermatogonia, this approach is now known to not be highly specific for spermatogonia (somatic cells are also recovered) and definitely not for SSCs. There are more effective means for isolating SSCs from total testis cells that have been validated by transplantation experiments (e.g. use of the Id4/eGFP transgene marker).

      We acknowledge the technical limitations of our enrichment strategy and made them clear in our revised manuscript.

      The authors then used "deconvolution" of bulk RNA-seq data in an attempt to discern spermatogonial subtype-specific transcriptomes. It is not clear why this is necessary or how it is beneficial given the availability of multiple single-cell RNA-seq datasets already published that accomplish this objective quite nicely - as the authors essentially acknowledge. Beyond this concern, a potential flaw with the deconvolution of bulk RNA-seq data is that this is a derivative approach that requires assumptions/computational manipulations of apparent mRNA abundance estimates that may confound interpretation of the relative abundance of different cellular subtypes within the heterogeneous cell population from which the bulk RNA-seq data is derived. Bottom line, it is not clear that this approach affords any experimental advantage over use of the publicly available scRNA-seq datasets and it is possible that attempts to employ this approach may be flawed, yielding misleading data.

      The deconvolution analyses were necessary to address the question of the cell composition of our preparations raised by reviewers. These analyses were highly beneficial because they clarify the presence of different SPGs including SSCs in the samples. They are also advantageous because the datasets they are conducted upon have significantly higher sequencing coverage than published single cell datasets. They contain the full transcriptome and not just polyA+ transcripts, as 10x datasets do, and thus provide considerably richer and more comprehensive transcriptomic information. This is very important to correctly interpret the results and to gain additional biological information. For the deconvolution analyses, we used state-of-the-art methods with proper computational controls for calibration. We selected published single-cell RNA-seq datasets of the highest quality. These analyses are extremely useful because they confirm the predominance of SSCs in the postnatal and adult cell samples and a minimal contamination by somatic cells. Our approach also provides a useful workflow that can easily be used by other researchers who cannot afford single-cell RNA-seq and allows them to gain more information about the cellular composition of their samples. Finally, the execution of any computational analysis, including analyses of single-cell RNA-seq datasets, requires making assumptions during the development and use of a method. The assumptions made for deconvolution analyses are not special in this respect and do not introduce more confounds than other methods. What is critical for such analyses is to include proper controls for calibration, which we carefully did and validated using our own previously published datasets for Sertoli cells.

      • Results & Discussion - In general, much of the information reported in this study is not novel. The authors' discussion of the makeup of various spermatogonial subtypes in the testis at various ages does not really add anything to what has been known for many years on the basis of classic morphological studies. Further, as noted above, the gene expression data provided by the authors on the basis of their deconvolution of bulk RNA-seq data does not add any novel information to what has been shown in recent years by multiple elegant scRNA-seq studies - and, in fact, as also noted above - represents an approach fraught with potential for misleading results. The potential value of the authors' report of "other cell types" not corresponding to major somatic cell types identified in earlier published studies seems quite limited given that they provide no follow-up data that might indicate the nature of these alternative cell types. Beyond this, much of the gene expression and chromatin accessibility data reported by the authors - by their own admission given the references they cite - is largely confirmatory of previously published results. Similarly, results of the authors' analyses of putative factor binding sites within regions of differentially accessible chromatin also appear to confirm previously reported results. Ultimately, it is not at all novel to note that changes in gene expression patterns are accompanied by changes in patterns of chromatin accessibility in either related promoters or enhancers. The discussion of these observations provided by the authors takes on more of a review nature than that of any sort of truly novel results. As a result, it is difficult to discern how the data reported in this manuscript advance the field in any sort of novel or useful way beyond providing a review of previously published studies on these topics.

      • Likely impact - The likely impact of this work is relatively low because, other than the value it provides as a review of previously published datasets, the new datasets provided are not novel and so do not advance the field in any significant manner.

      We acknowledge that much of the reported information is not novel but this is not necessarily a drawback as sequencing datasets on the same tissues or cells produced by different groups using comparable methods are common. This does not diminish the validity and usefulness of the datasets but rather enriches the respective fields as omics methods and data analyses can deliver different findings. Thus, our study cannot be criticized and disqualified because other datasets have been published but instead it should be acknowledged for providing high resolution full transcriptome information from different postnatal stages and adult SPGs that other studies do not provide. In this respect, the subjective nature of Reviewer 1’s statements is of concern. For instance, the statement: “…represents an approach fraught with potential for misleading results”. Such a declaration suggests that all studies that previously used enrichment strategies are “fraught with potential for misleading results”, which disqualifies the work of many colleagues. Further, this wrongly assumes that newer technologies are exempt from “potential for misleading results”, which is not the case. Single-cell RNA-seq methods, extensively used to study SPGs, have been questioned for their limitations and potential biases due to low sequencing coverage, issues with transcript detection, low capture efficiency and a higher degree of noise than bulk RNA datasets. Thus, caution is needed to interpret single-cell datasets on SPGs and these datasets also have their biases. For our datasets, we made major efforts to address the criticisms raised by the reviewer and reduce any potential misleading information by conducting additional analyses, by providing more details on the methods and enrichment strategy and by being careful with data interpretation. We would be grateful if these efforts could be acknowledged and the improvements on the manuscript and the value of the datasets be evaluated with objectivity.

      Reviewer #2 (Public Review):

      This revised manuscript attempts to explore the underlying chromatin accessibility landscape of spermatogonia from the developing and adult mouse testis. The key criticism of the first version of this manuscript was that bulk preparations of mixed populations of spermatogonia were used to generate the data that form the basis of the entire manuscript. To address this concern, the authors applied a deconvolution strategy (CIBERSORTx (Newman et al., 2019)) in an attempt to demonstrate that their multi-parameter FACS isolation (from Kubota 2004) of spermatogonia enriched for PLZF+ cells recovered spermatogonial stem cells (SSCs). PLZF (ZBTB16) protein is a transcription factor known to mark all or nearly all undifferentiated spermatogonia and some differentiating spermatogonia (KIT+ at the protein level) - see Niedenberger et al., 2015 (PMID: 25737569). The authors' deconvolution using single-cell transcriptomes produced at postnatal day 6 (P6) argues that 99% of the PLZF+ spermatogonia at P8 are SSCs, 85% at P15 and 93% in adults. Quite frankly, given the established overlap between PLZF and KIT and the known identity of spermatogonia at these developmental stages, this is impossible. Indeed - the authors' own analysis of the reference dataset demonstrates abundant PLZF mRNA in P6 progenitor spermatogonia - what is the authors' explanation for this observation? The same is essentially true in the use of adult references for cell type assignment. The authors found 63-82% of SSCs using this different definition of types (from a different dataset), begging the question of which of these results is true.

      For full transparency, we provided information about the deconvolution analyses for all libraries that use cell-type specific matrices generated from PND6 and adult single-cell RNA-seq reference datasets in our previous response (Fig1-3, response to reviewer 1). However, we don’t claim “that 99% of the PLZF+ spermatogonia at P8 are SSCs, 85% at P15 and 93% in adults”. Of these percentages, the ones that correspond to our postnatal libraries are the ones reported in our updated manuscript (please see FigS2). Importantly, we never claimed that these percentages correspond exclusively to “PLZF+ spermatogonia”. Rather, they were inferred using gene expression-specific signature matrices (Fig1-c response to Reviewer 1 as example). As clearly evident in feature maps in FigS2 of our updated manuscript, the cellular population identified as SSCs using the dataset from Hermann et al., 2018 shows overlap for the expression of Ddx4, Zbtb16 (PLZF), Gfra1 and Id4 but minimal Kit. In agreement with the reviewer’s observation, progenitors also show a signal for Zbtb16 but have a different gene expression signature matrix (see Fig.1c and 2c for an example of gene signature matrices from PND6 and adult samples from the same publication).

      Regarding the question of which of these results are true, we observed that deconvolution analyses of our postnatal libraries using two different single-cell postnatal RNA-seq reference datasets consistently suggest a high contribution (>90%) by SSCs (defined using cell-specific expression matrices following identification of cell types that match the closest ones reported by each study; see FigS2 updated manuscript). The analyses of our adult libraries using published adult datasets from the same group (Hermann et al., 2018; Fig1 response to Reviewer 1 and FigS2 updated manuscript) suggest that the contribution of adult SSCs to the cell population is lower than at postnatal stages, but SSCs still are the most abundant cell stage identified in our libraries (FigS2g). We reported these analyses and acknowledge that in our adult samples, we also likely have differentiating SPGs.

      In their rebuttal, the authors also raise a fair point about the precision of differential gene expression among spermatogonial subsets. At the mRNA level, Kit is definitely detectable in undifferentiated spermatogonia, but it is never observed at the protein level until progenitors respond to retinoic acid (see Hermann et al., 2015). I agree with the authors that the mRNAs for "cell type markers" are rarely differentially abundant at absolute levels (0 or 1), but instead, there are a multitude of shades of grey in mRNA abundance that "separate" cell types, particularly in the male germline and among the highly related spermatogonial subtypes of interest (SSCs, progenitor spermatogonia and differentiating spermatogonia). That is, spermatogonial biology should be considered as a continuous variable (not categorical), so examining specific cell populations with defined phenotypes (markers, function) likely oversimplifies the underlying heterogeneity in the male germ lineage. But, here, the authors have ignored this heterogeneity entirely by selecting complex populations and examining them in aggregate. We already know that PLZF protein marks a wide range of spermatogonia, complicating the interpretation of aggregate results emerging from such samples. In their rebuttal, the authors nicely demonstrate the existence of these mixtures using deconvolution estimation. What remains a mystery is why the authors did not choose to perform single-cell multiome (RNA-seq + ATAC-seq) to validate their results and provide high-confidence outcomes. This is an accessible technique and was requested after the initial version, but essentially ignored by the authors.

      We agree with the reviewer that the male germ lineage should be considered as a continuous variable and that examining specific cell populations with defined features oversimplifies its heterogeneity. Regarding the use of single-cell multiome (RNA-seq + ATAC-seq), we also agree that this technology can provide additional insight by integrating RNA and chromatin accessibility in the same cells. However, it is a refined method that is expensive, time consuming and requires human resources that are beyond our capacity for this project.

      A separate question is whether these data are novel. A prior publication by the Griswold lab (Schleif et al., 2023; PMID: 36983846) already performed ATAC-seq (and prior data exist for RNA-seq) from germ cells isolated from synchronized testes. These existing data are higher resolution than those provided in the current manuscript because they examine germ cells before and after RA-induced differentiation, which the authors do not, given their selection methods. Another prior publication from the Namekawa lab extensively examined the transcriptome and epigenome in adult testes (Maezawa et al., 2020; PMID: 32895557; and several prior papers). The authors should explain how their results extend our knowledge of spermatogonial biology in light of the preceding reports.

      Our data do extend previous studies because they provide high-resolution transcriptomic (full transcriptome) and chromatin accessibility profiling in postnatal and adult stages. They now also provide an approach for deconvolution analyses of bulk RNA datasets that can be of use to the community. Novelty in the field of omics is usually not a prime feature and it is common that datasets on the same tissues or cells be published by different groups using comparable methods and analyses.

      The authors are also encouraged to improve their use of terminology to describe the samples of interest. The mitotic male germ cells in the testis are called spermatogonia (not spermatogonial cells, because spermatogonia are cells). Spermatogonia arise from Prospermatogonia. Spermatogonia are divisible into two broad groups: undifferentiated spermatogonia (comprised of few spermatogonial stem cells or SSCs and many more progenitor spermatogonia - at roughly 1:10 ratio) and differentiating spermatogonia that have responded to RA. The authors also improperly indicate that SSCs directly produce differentiating spermatogonia - indeed, SSCs produce transit-amplifying progenitor spermatogonia, which subsequently differentiate in response to retinoic acid stimulation. Further, the use of Spermatogonial cells (and SPGs) is imprecise because these terms do not indicate which spermatogonia are in question. Moreover, there have been studies in the literature which have used similar terms inappropriately to refer to SSCs, including in culture. A correct description of the lineage and disambiguation by careful definition and rigorous cell type identification would benefit the reader.

      Overall, my concern from the initial version of this manuscript stands - critical methodological flaws prevent interpretation of the results and the data are not novel. Readers should take note that results in essentially all Figures do not reflect the biology of any one type of spermatogonium.

      We revised and improved the terminology wherever possible and also considering requests from other reviewers about terminology.

      Reviewer #3 (Public Review):

      In this study, Lazar-Contes and colleagues aimed to determine whether chromatin accessibility changes in the spermatogonial population during different phases of postnatal mammalian testis development. Because actions of the spermatogonial population set the foundation for continual and robust spermatogenesis and the gene networks regulating their biology are undefined, the goal of the study has merit. To advance knowledge, the authors used mice as a model and isolated spermatogonia from three different postnatal developmental age points using cell sorting methodology that was based on cell surface markers reported in previous studies and then performed bulk RNA-sequencing and ATAC-sequencing. Overall, the technical aspects of the sequencing analyses and computational/bioinformatics seem sound but there are several concerns with the cell population isolated from testes and a lack of acknowledgement of previous studies that have also performed ATAC-sequencing on spermatogonia of mouse and human testes. The limitations, described below, call into question the validity of the interpretations and reduce the potential merit of the findings.

      I suggest changing the acronym for spermatogonial cells from SC to SPG for two reasons. First, SPG is the commonly used acronym in the field of mammalian spermatogenesis. Second, SC is commonly used for Sertoli Cells.

      This was suggested in the previous review by Reviewer 1 and was modified in the revised version of the manuscript.

      The authors should provide a rationale for why they used postnatal day 8 and 15 mice. The FACS sorting approach used was based on cell surface proteins that are not germline specific so there were undoubtedly somatic cells in the samples used for both RNA and ATAC sequencing. Thus, it is essential to demonstrate the level of both germ cell and undifferentiated spermatogonial enrichment in the isolated and profiled cell populations. To achieve this, the authors used PLZF as a biomarker of undifferentiated spermatogonia. Although PLZF is indeed expressed by undifferentiated spermatogonia, there have been several studies demonstrating that expression extends into differentiating spermatogonia. In addition, PLZF is not germ cell specific and single cell RNA-seq analyses of testicular tissue have revealed that there are somatic cell populations that express Plzf, at least at the mRNA level. For these reasons, I suggest that the authors assess the isolated cell populations using a germ cell specific biomarker such as DDX4 in combination with PLZF to get a more accurate assessment of the undifferentiated spermatogonial composition. This assessment is essential for interpretation of the RNA-seq and ATAC-seq data that was generated.

      A previous study by the Namekawa lab (PMID: 29126117) performed ATAC-seq on a similar cell population (THY1+ FACS sorted) that was isolated from pre-pubertal mouse testes. It was surprising to not see this study referenced in the current manuscript. In addition, it seems prudent to cross-reference the two ATAC-seq datasets for commonalities and differences. There are also several published studies on scATAC-seq of human spermatogonia that might be of interest to cross-reference with the ATAC-seq data presented in the current study to provide an understanding of translational merit for the findings.

      These points have been addressed in our previous response and in the revised manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Weaknesses:

      There appears to be a lack of basic knowledge of the process of spermatogenesis. For instance, the statement that "During the first week of postnatal life, a population of SCs continues to proliferate to give rise to undifferentiated Asingle (As), Apaired (Apr) and Aaligned (Aal) cells. The remaining SCs differentiate to form chains of daughter cells that become primary and secondary spermatocytes around postnatal day (PND) 10 to 12." is inaccurate. The Aal cells are the spermatogonial chains, the two are not distinct from one another. In addition, the authors fail to mention spermatogonial stem cells which form the basis for steady-state spermatogenesis. The authors also do not acknowledge the well-known fact that, in the mouse, the first wave of spermatogenesis is distinct from subsequent waves. Finally, the authors do not mention the presence of both undifferentiated spermatogonia (aka - type A) and differentiating spermatogonia (aka - type B). The premise for the study they present appears to be the implication that little is known about the dynamics of chromatin during the development of spermatogonia. However, there are published studies on this topic that have already provided much of the information that is presented in the current manuscript.

      Regarding the inaccuracy and incompleteness of some of the statements about spermatogonial cells and spermatogenesis: in the Introduction, we replaced the following statement: "During the first week of postnatal life, a population of SCs continues to proliferate to give rise to undifferentiated Asingle (As), Apaired (Apr) and Aaligned (Aal) cells. The remaining SCs differentiate to form chains of daughter cells that become primary and secondary spermatocytes around postnatal day (PND) 10 to 12." with: “Spermatogonial cells (SPGs) are the initiators and supporting cellular foundation of spermatogenesis in testis in many species, including mammals. In the mammalian testis, the founding germ cells are primordial germ cells (PGCs), which give rise sequentially to different populations of SPGs: primary transitional (T1)-prospermatogonia (ProSG), secondary transitional (T2)-ProSG, and then spermatogonial stem cells (SSCs) (McCarrey, 2013; Rabbani et al., 2022; Tan et al., 2020). The ProSG population is exhausted by postnatal day (PND) 5 (Drumond et al., 2011) and by PND6-8, distinct SPG subtypes can be distinguished on the basis of specific marker proteins and regenerative capacity (Cheng et al., 2020; Ernst et al., 2019; Green et al., 2018; Hermann et al., 2018; Tan et al., 2020).

      SSCs represent an undifferentiated population of SPGs that retain regenerative capacity and divide to either self-renew or generate progenitors that initiate spermatogenic differentiation, giving rise to differentiating SPGs (diff-SPGs). Diff-SPGs form chains of daughter cells that become primary and secondary spermatocytes around PND10 to 12. Spermatocytes then undergo meiosis and give rise to haploid spermatids that develop into spermatozoa. Spermatozoa are then released into the lumen of seminiferous tubules and continue to mature in the epididymis until becoming capable of fertilization by PND42-48 in mice (Kubota and Brinster, 2018; Rooij, 2017).”

      Regarding the premise and implications of our findings, we clarified the premise in the revised manuscript. The following statement was included in the Discussion: "our findings complement existing datasets on spermatogonial cells by providing parallel transcriptomic and chromatin accessibility maps at high resolution from the same cell populations at early postnatal, late postnatal and adult stages collected from single individuals (for adults)".

      It is not clear which spermatogonial subtype the authors intended to profile with their analyses. On the one hand, they used PLZF to FACS sort cells. This typically enriches for undifferentiated spermatogonia. On the other hand, they report detection in the sorted population of markers such as c-KIT which is a well-known marker of differentiating spermatogonia, and that is in the same population in which ID4, a well-known marker of spermatogonial stem cells, was detected. The authors cite multiple previously published studies of gene expression during spermatogenesis, including studies of gene expression in spermatogonia. It is not at all clear what the authors' data adds to the previously available data on this subject.

      The authors analyzed cells recovered at PND 8 and 15 and compared those to cells recovered from the adult testis. The PND 8 and 15 cells would be from the initial wave of spermatogenesis whereas those from the adult testis would represent steady-state spermatogenesis. However, as noted above, there appears to be a lack of awareness of the well-established differences between spermatogenesis occurring at each of these stages.

      We applied computational deconvolution to our bulk RNA-seq datasets, employing publicly available single-cell RNA-seq datasets, to estimate and identify cellular composition. Trained on high-quality RNA-seq datasets from pure or single-cell populations, deconvolution algorithms create expression matrices reflecting the cellular diversity in reference datasets. These cell-type-specific expression matrices are subsequently used to determine the cellular composition of bulk RNA-seq samples with unknown cellular components (Cobos et al., 2023).
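
      The principle described above can be illustrated with a minimal sketch. It is not the CIBERSORTx procedure, which relies on support vector regression and batch correction; here a plain non-negative least-squares fit (solved by multiplicative updates) stands in for it. The signature matrix, the gene rows, the cell-type labels and the mixing fractions are all made-up values chosen only to show how a bulk profile is decomposed into cell-type fractions.

```python
# Toy reference-based deconvolution (illustrative only, not CIBERSORTx).
# Assumption: bulk expression = signature matrix x cell-type fractions.

def deconvolve(signature, bulk, n_iter=2000):
    """Estimate cell-type fractions by non-negative least squares.

    signature: list of gene rows, each holding one expression value per cell type
    bulk:      list of bulk expression values, one per gene
    Returns one fraction per cell type, normalised to sum to 1.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # S^T b does not change between iterations, so compute it once.
    stb = [sum(signature[g][t] * bulk[g] for g in range(n_genes))
           for t in range(n_types)]
    x = [1.0 / n_types] * n_types  # start from equal fractions
    for _ in range(n_iter):
        # Current reconstruction of the bulk profile, S x.
        fitted = [sum(signature[g][t] * x[t] for t in range(n_types))
                  for g in range(n_genes)]
        # S^T S x, the denominator of the multiplicative update.
        sts_x = [sum(signature[g][t] * fitted[g] for g in range(n_genes))
                 for t in range(n_types)]
        # The update keeps every fraction non-negative.
        x = [x[t] * stb[t] / sts_x[t] if sts_x[t] > 0 else 0.0
             for t in range(n_types)]
    total = sum(x)
    return [v / total for v in x]


# Hypothetical signature matrix: 4 marker genes (rows) x 3 cell types (columns).
cell_types = ["SSC", "differentiating SPG", "somatic"]
signature = [
    [10.0, 2.0, 0.5],   # gene enriched in SSCs
    [1.0, 9.0, 0.5],    # gene enriched in differentiating SPGs
    [0.5, 0.5, 12.0],   # gene enriched in somatic cells
    [8.0, 8.0, 0.2],    # pan-germ-cell gene
]

# Simulate a bulk sample with known (made-up) composition.
true_fractions = [0.80, 0.15, 0.05]
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]

estimated = deconvolve(signature, bulk)
for name, frac in zip(cell_types, estimated):
    print(f"{name}: {frac:.3f}")
```

      Because the simulated bulk profile is an exact, noise-free mixture, the fit should return fractions close to the ones used to build it. Real bulk libraries add noise and platform differences, which is what the calibration controls mentioned in the response are meant to address.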

      For our analysis, we chose CIBERSORTx (Newman et al., 2019), recognized as the most advanced deconvolution algorithm to date, employing it with three high-quality, publicly available single-cell RNA-seq datasets. First, we assessed the cellular composition of all our RNA-seq libraries, using datasets generated by Hermann et al. (2018), which characterized the single-cell transcriptomes of testicular cells and various populations of spermatogonial progenitor cells (SPGs) in early postnatal (PND6) and adult stages. This enabled us to not only address potential somatic cell contamination but also to analyse the composition of isolated SPGs using a unified dataset source.

      Author response image 1.

      Deconvolution analysis of bulk RNA-seq samples using PND6 single-cell RNA-seq from Hermann et al., 2018. a. Seurat clusters from PND6 single-cell RNA-seq. b. Feature maps of gene expression for markers of SPGs and somatic cells. c. Gene expression signature matrix from PND6 single-cell RNA-seq datasets. d. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. e. Dotplot of the average estimated proportion of SSCs in all bulk RNA-seq libraries reported in this study.

      By re-analyzing the single-cell RNA-seq datasets, we identified distinct cell-type clusters, marked by specific cellular markers as reported in the original and subsequent studies (Author response image 1a,b and Author response image 2a,b). Then, CIBERSORTx generated gene-expression signature matrices and estimated the cell-type proportions within our 18 bulk RNA-seq libraries. Evaluation of our postnatal libraries (PND8 and 15) against a PND6 signature matrix revealed a predominant derivation from SPGs, with average estimated proportions of spermatogonial stem cells (SSCs) being 0.99 and 0.85 for PND8 and PND15 samples, respectively (Author response image 1c-e). Notably, the analysis of PND15 libraries also suggested the presence of additional SPGs types, including progenitors and differentiating SPGs (Author response image 1d), albeit at lower frequency. 

      Similarly, evaluation of our adult RNA-seq libraries, using an adult signature matrix, showed an average SSC proportion of 0.82, indicating a primary derivation from SSCs. Consistent with the findings from PND15 libraries, our deconvolution analysis also suggests the presence of additional SPG types, including progenitors and differentiating SPGs (Author response image 1d). However, unlike our early and late postnatal stage libraries, the deconvolution analysis of adult libraries indicated the presence of other cell types (labeled "Other"), not corresponding to the major somatic cell types identified by Hermann et al., 2018. The estimated average proportion of these cells was less than 0.05 in two adult libraries and 0.10 in the others. This variance in cellular composition underlines the deconvolution method's effectiveness in dissecting complex cellular compositions in bulk RNA-seq samples.

      Author response image 2.

      Deconvolution analysis of bulk RNA-seq samples using adult single-cell RNA-seq (Hermann et al., 2018). a. Seurat clusters from adult single-cell RNA-seq. b. Feature maps of gene expression for markers of SPG and somatic cells. c. Gene expression signature matrix from adult single-cell RNA-seq datasets. d. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. e. Dotplot of the average estimated proportion of SSCs in all bulk RNA-seq libraries reported in this study.

      To further validate our observations, we re-analyzed two additional testicular single-cell RNA-seq datasets derived from an early postnatal stage (PND7) (Tan et al., 2020) and adult (Green et al., 2018) (Author response image 3a,b and Author response image 4a,b). We identified distinct cell-type clusters, marked by specific cellular markers (Author response image 3a,b and Author response image 4a,b), and proceeded with the deconvolution analysis using CIBERSORTx. Evaluation of our postnatal libraries (PND8 and 15) against the PND7 signature matrix from Tan et al., 2020 confirmed a derivation from germ cells (Author response image 3d,e), in particular from SSCs (Author response image 3g), with average estimated proportions of SSCs being 0.93 and 0.86 for PND8 and PND15 samples, respectively, and the remainder estimated to originate from differentiating SPGs (Author response image 3g,h). In the case of the adult samples, evaluation against the adult signature matrix from Green et al., 2018 confirmed a predominant derivation from SSCs, with an average estimated SSC proportion of 0.79, consistent with the 0.82 estimated using Hermann et al., 2018.
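
      As a small worked check of the consistency claim, the average SSC proportions quoted in this response can be tabulated per stage and reference dataset, and the spread between references computed. The numbers are the averages stated in the text; the dictionary layout and the helper name `spread` are only illustrative choices.

```python
# Average estimated SSC proportions as quoted in the response text,
# grouped by sample stage and by the single-cell reference used.
estimates = {
    "PND8": {"Hermann et al., 2018 (PND6)": 0.99, "Tan et al., 2020 (PND7)": 0.93},
    "PND15": {"Hermann et al., 2018 (PND6)": 0.85, "Tan et al., 2020 (PND7)": 0.86},
    "Adult": {"Hermann et al., 2018 (adult)": 0.82, "Green et al., 2018 (adult)": 0.79},
}


def spread(by_reference):
    """Largest difference between estimates obtained with different references."""
    values = list(by_reference.values())
    return round(max(values) - min(values), 2)


spreads = {stage: spread(refs) for stage, refs in estimates.items()}
for stage, value in spreads.items():
    print(f"{stage}: difference between references = {value:.2f}")
```

      The averages differ by at most 0.06 between references at any stage. This agreement concerns the averages only and says nothing about library-to-library variation, which is not reported in the text.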

      Author response image 3.

      Deconvolution analysis of bulk RNA-seq samples with additional single-cell datasets. a. Seurat clusters from PND7 single-cell RNA-seq (Tan et al., 2020). b. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. c. Dotplot of the average estimated proportion of germ cells in all bulk RNA-seq libraries reported in this study. d. Re-clustering of germ cell cluster shown in a. e. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. f. Dotplot of the average estimated proportion of SSCs in all bulk RNA-seq libraries reported in this study. g. Seurat clusters from adult single-cell RNA-seq (Green et al., 2018). h. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. i. Dotplot of the average estimated proportion of germ cells in all bulk RNA-seq libraries reported in this study.

      To further validate our deconvolution strategy, we interrogated the cellular composition of bulk RNA-seq libraries derived from cellular populations enriched in Sertoli cells, generated by our group using a similar enrichment/sorting strategy (Thumfart et al., 2022). As expected, our results show that all our libraries are mainly composed of Sertoli cells suggesting that the deconvolution strategy employed is accurate in detecting cell-type composition (Author response image 4).
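      As an illustration of what a deconvolution step computes, the following is a minimal sketch that estimates cell-type proportions from a bulk expression profile and a signature matrix. It is not the procedure used above: CIBERSORTx relies on a support-vector-regression approach, whereas this sketch swaps in plain non-negative least squares solved with multiplicative updates. The function name `estimate_proportions` and all numbers are hypothetical.

```python
def estimate_proportions(signature, bulk, iterations=2000):
    """Estimate cell-type proportions by non-negative least squares.

    signature: list of rows (one per gene); each row holds the mean
               expression of that gene in every cell type (values >= 0).
    bulk:      list of bulk expression values, one per gene (values >= 0).
    Multiplicative updates keep every weight non-negative at each step.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Precompute S^T b and S^T S once.
    stb = [sum(signature[g][j] * bulk[g] for g in range(n_genes))
           for j in range(n_types)]
    sts = [[sum(signature[g][j] * signature[g][k] for g in range(n_genes))
            for k in range(n_types)]
           for j in range(n_types)]
    weights = [1.0] * n_types
    for _ in range(iterations):
        denom = [sum(sts[j][k] * weights[k] for k in range(n_types))
                 for j in range(n_types)]
        # Simultaneous update: w_j <- w_j * (S^T b)_j / (S^T S w)_j
        weights = [w * stb[j] / denom[j] if denom[j] > 0 else 0.0
                   for j, w in enumerate(weights)]
    total = sum(weights)
    if total == 0:
        return [0.0] * n_types
    # Rescale so the estimated proportions sum to 1.
    return [w / total for w in weights]


# Hypothetical signature matrix: 4 genes x 3 cell types (SSC, diff. SPG, Sertoli).
SIGNATURE = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 1.0, 9.0],
    [5.0, 5.0, 5.0],
]
```

      A bulk profile built as a known mixture of the signature columns should be recovered closely, which is the same logic as checking the strategy on Sertoli-enriched libraries whose composition is expected in advance.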

      Author response image 4.

      Deconvolution analysis of Sertoli bulk RNA-seq samples. Barplots of estimated cellular proportions for bulk RNA-seq libraries reported in Thumfart et al., 2022. Expression matrices were derived from the analysis of single-cell RNA-seq datasets used to assess the cellular composition of the SPG bulk libraries.

      Author response image 5.

      Id4 and Kit are transcribed in SSCs. Seurat clusters from PND6 single-cell RNA-seq (left) and feature maps of gene expression for Id4 (center) and Kit (right). Zoom-in on SSCs (red).

      Finally, regarding the following observation by the reviewer: "On the other hand, they report detection in the sorted population of markers such as c-KIT which is a well-known marker of differentiating spermatogonia, and that is in the same population in which ID4, a well-known marker of spermatogonial stem cells, was detected." It was recently shown using single-cell RNA-seq that “nearly all differentiating spermatogonia at P3 (delineated as c-KIT+) are ID4-eGFP” (Law et al., 2019). While this finding does not exclude that we have a mixture of SPG cells, it supports the possibility that SPG cells express markers of both undifferentiated and differentiated cells, particularly in the early stages of postnatal development. Indeed, we observe that some cells labeled as SSCs show signals for both Id4 and Kit in single-cell RNA-seq data from Hermann et al., 2018 (Author response image 5).

      Therefore, the results from the deconvolution analysis and our immunofluorescence data showing 85-95% PLZF+ cells in our cellular preparations underscore that our bulk RNA-seq libraries are mainly composed of SPGs. The deconvolution analysis also suggests a cellular composition made up predominantly of SSCs and, to a lesser degree, of differentiating SPGs. Our adult RNA-seq libraries show a small proportion of somatic cells (<0.10).

      In the revised manuscript, we compiled the deconvolution analyses and present them in a condensed version in Supplementary Fig 2. 

      In general, the authors present observational data of the sort that is generated by RNA-seq and ATAC-seq analyses, and they speculate on the potential significance of several of these observations. However, they provide no definitive data to support any of their speculations. This further illustrates the fact that this study contributes little if any new information beyond that already available from the numerous previously published RNA-seq and ATAC-seq studies of spermatogenesis. In short, the study described in this manuscript does not advance the field.

      We acknowledge that RNA-seq and ATAC-seq datasets like ours are observational and that their interpretation can be speculative. Nevertheless, our datasets represent an additional useful resource for the community because they are comprehensive and high resolution, and can be exploited, for instance, for studies in environmental epigenetics and epigenetic inheritance examining the immediate and long-term effects of postnatal exposure and their dynamics. The depth of our RNA sequencing allowed us to detect transcripts with a high dynamic range, which has been limited with classical RNA sequencing analyses of spermatogonial cells and with single-cell analyses (which have comparatively low coverage). Further, our experimental pipeline is more affordable than single-cell sequencing approaches and, in the case of adults, provides data per animal, informing on the intrinsic variability in transcriptional and chromatin regulation across males. These points will be discussed in the revised manuscript.

      Relevant information for both points was included in the Discussion of the revised manuscript.  

      The phenomenon of epigenetic priming is discussed, but then it seems that there is some expression of surprise that the data demonstrate what this reviewer would argue are examples of that phenomenon. The authors discuss the "modest correspondence between transcription and chromatin accessibility in SCs." Chromatin accessibility is an example of an epigenetic parameter associated with the primed state. The primed state is not fully equivalent to the actively expressing state. It appears that certain histone modifications along with transcription factors are critical to the transition between the primed and actively expressing states (in either direction). The cell types that were investigated in this study are closely related spermatogenic, and predominantly spermatogonial cell types. It is very likely that the differentially expressed loci will be primed in both the early (PND 8 or 15) and adult stages, even though those genes are differentially expressed at those stages. Thus, it is not surprising that there is not a strict concordance between +/- chromatin accessibility and +/- active or elevated expression.

      Relevant information was included in the Discussion of the revised manuscript.

      Reviewer #2:

      The objective of this study from Lazar-Contes et al. is to examine chromatin accessibility changes in "spermatogonial cells" (SCs) across testis development. Exactly what SCs are, however, remains a mystery. The authors mention in the abstract that SCs are undifferentiated male germ cells and have self-renewal and differentiation activity, which would be true for Spermatogonial STEM Cells (SSCs), a very small subset of total spermatogonia, but then the methods they use to retrieve such cells using antibodies that enrich for undifferentiated spermatogonia encompass both undifferentiated and differentiating spermatogonia. Data in Fig. 1B prove that most (85-95%) are PLZF+, but PLZF is known to be expressed both by undifferentiated and differentiating (KIT+) spermatogonia (Niedenberger et al., 2015; PMID: 25737569). Thus, the bulk RNA-seq and ATAC-seq data arising from these cells constitute the aggregate results comprising the phenotype of a highly heterogeneous mixture of spermatogonia (plus contaminating somatic cells), NOT SSCs. Indeed, Fig. 1C demonstrates this by showing the detection of Kit mRNA (a well-known marker of differentiating spermatogonia - which the authors claim on line 89 is a marker of SCs!), along with the detection of markers of various somatic cell populations (albeit at lower levels).

      The reviewer is correct that our spermatogonial cell populations are mixed and include undifferentiated and differentiated cells, hence the name spermatogonial cells (SCs), and probably also contain some somatic cells. We acknowledge that this is a limitation of our isolation approach. To circumvent this limitation, we will conduct in silico deconvolution analysis using publicly available single-cell RNA sequencing datasets to obtain information about markers corresponding to undifferentiated and differentiated spermatogonial cells, and somatic cells. These additional analyses will provide information about the cellular composition of the samples and clarify the representation of undifferentiated and differentiated spermatogonial cells and other cells.

      This admixture problem influences the results - the authors show ATAC-seq accessibility traces for several genes in Fig. 2E (exhibiting differences between P15 and Adult), including Ihh, which is not expressed by spermatogenic cells, and Col6a1, which is expressed by peritubular myoid cells. Thus, the methods in this paper are fundamentally flawed, which precludes drawing any firm conclusions from the data about changes in chromatin accessibility among spermatogonia (SCs?) across postnatal testis development.

      The reviewer raises concern about the lack of correspondence between chromatin accessibility and expression observed for some genes, arguing that this precludes drawing firm conclusions. However, a dissociation between chromatin accessibility and gene expression is normal and expected. Chromatin accessibility is only a readout of protein deposition and occupancy (e.g. by transcription factors, chromatin regulators, or nucleosomes) at specific genomic loci; it does not give functional information on whether there is ongoing transcriptional activity. A gene that is repressed or poised for expression can still show a clear signal of chromatin accessibility at regulatory elements. The dissociation between chromatin accessibility and transcription has been reported in many different cells and conditions (PMID: 36069349, PMID: 33098772), including in spermatogonial cells (PMID: 28985528) and in gonads in different species (PMID: 36323261). Therefore, the dissociation between accessibility and transcription is not a reason to conclude that our data are flawed.

      In addition, there already are numerous scRNA-seq datasets from mouse spermatogenic cells at the same developmental stages in question.

      This is true, but full transcriptomic profiling like ours on cell populations provides different transcriptional information that is deeper and more comprehensive. Our datasets identified >17,000 genes, while scRNA-seq typically identifies a few thousand genes. Our analyses also identified full-length transcripts, variants, isoforms, and low-abundance transcripts. These datasets are therefore a valuable addition to existing scRNA-seq datasets.

      Moreover, several groups have used bulk ATAC-seq to profile enriched populations of spermatogonia, including from synchronized spermatogenesis which reflects a high degree of purity (see Maezawa et al., 2018 PMID: 29126117 and Schlief et al., 2023 PMID: 36983846 and in cultured spermatogonia - Suen et al., 2022 PMID: 36509798) - so this topic has already begun to be examined. None of these papers was cited, so it appears the authors were unaware of this work.

      We apologize for not mentioning these studies in our manuscript; we will do so in the revised version.

      The authors' methodological choice is even more surprising given the wealth of single-cell evidence in the literature since 2018 demonstrating the exceptional heterogeneity among spermatogonia at these developmental stages (the authors DID cite some of these papers, so they are aware). Indeed, it is currently possible to perform concurrent scATAC-seq and scRNA-seq (10x Genomics Multiome), which would have made these data quite useful and robust. As it stands, given the lack of novelty and critical methodological flaws, readers should be cautioned that there is little new information to be learned about spermatogenesis from this study, and in fact, the data in Figures 2-5 may lead readers astray because they do not reflect the biology of any one type of male germ cell. Indeed, not only do these data not add to our understanding of spermatogonial development, but they are damaging to the field if their source and identity are properly understood. Here are some specific examples of the problems with these data:

      Fig. 2D - Gata4 and Lhcgr are not expressed by germ cells in the testis.

      Fig. 3A - WT1 is expressed by Sertoli cells, so the change in accessibility of regions containing a WT1 motif suggests differential contamination with Sertoli cells. Since Wt1 mRNA was differentially high in P15 (Fig. 3B) - this seems to be the most likely explanation for the results. How was this excluded?

      Fig. 3D - Since Dmrt1 is expressed by Sertoli cells, the "downregulation" likely represents a reduction in Sertoli cell contamination in the adult, like the point above. Did the authors consider this?

      Regarding concerns about contamination by somatic cells (transcription). In addition to the results of our deconvolution analysis (see response to Reviewer #1), we addressed the specific concern of the paradoxical expression of genes considered markers of somatic cells in the testis. For instance, we plotted the expression values of Ihh, Lhcgr, Gata4, Col16a1, Wt1, and Dmrt1 along with the expression values of Ddx4 and Zbtb16. We observe that the expression level of Ddx4 and Zbtb16, genes expressed predominantly in SPGs, is orders of magnitude higher than that observed for the rest of the genes, with the notable exception of Dmrt1, which is also highly expressed (Author response image 6). Indeed, our analysis of publicly available single-cell RNA-seq datasets shows that Dmrt1 is robustly expressed in germ cells (Author response image 7) and, as also noted by the reviewer, in Sertoli cells in postnatal stages. Notably, we observe a significant stepwise decrease in the expression of Dmrt1 across the postnatal maturation of SPG cells. This is highly unlikely to be a result of major contamination by Sertoli cells of just our postnatal libraries. We base this statement on three observations. First, the deconvolution analysis of all our RNA-seq libraries using four different expression signature matrices from high-quality single-cell RNA-seq from testis showed that our libraries are largely derived from SPGs. Second, the evaluation of our adult libraries with the PND6 signature matrix from Green et al., 2018 suggested that the proportion of Sertoli cells in our adult libraries, if any, would be higher than in our postnatal libraries (Author response image 3d, blue bars). This makes it unlikely that the observed decrease in expression of Dmrt1 in adult samples is due to prominent somatic contamination of the postnatal libraries. Third, the stepwise decrease in Dmrt1 expression seems to correlate with progression during postnatal development (Author response image 7), as feature maps of Dmrt1 expression derived from public single-cell RNA-seq experiments show a reduction in expression in adult SPGs in comparison with early postnatal stages (Author response image 7, last two panels). Thus, the observed effects are likely the result of gene regulatory processes that operate during the developmental maturation of SPGs.

      Author response image 6.

      Expression of germ and somatic cell markers in our RNA-seq datasets. Boxplots of log2(CPM) (top) and CPM (bottom) values for selected genes from our RNA-seq datasets. Each point in the boxplots represents the expression value of a biological replicate.

      Author response image 7.

      Expression of germ and somatic cell markers in publicly available single-cell RNA-seq datasets. Seurat clusters from all analyzed single-cell RNA-seq datasets (first column from left) and feature maps of gene expression for Zbtb16, Dmrt1 and Wt1.

      Consistent with the reviewer’s observation, Ihh is not expressed in germ cells, and indeed we detect no signal at this locus or at Lhcgr. Furthermore, while we indeed observe a significant increase in the expression of Wt1 in PND15 samples, its expression level is considerably lower than that of SPG markers. This is even more evident when plotting expression data on a linear scale rather than as a log2 transformation of the expression values. Whether such transcriptional profiles reflect developmentally regulated transcription, stochastic effects on gene expression, or potential somatic contamination is difficult to determine. However, based on our deconvolution data, we believe it is unlikely that major contamination could account for our observations.
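      The point about linear versus log2 scales can be illustrated with a minimal sketch of the counts-per-million (CPM) calculation. The read counts, the pseudocount of 1, and the helper names `cpm` and `log2_cpm` are hypothetical and are not taken from our pipeline; they only show how a large fold difference on the linear scale is compressed after log2 transformation.

```python
import math


def cpm(counts):
    """Convert raw read counts (gene -> count) to counts per million."""
    total = sum(counts.values())
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount); the pseudocount avoids log2(0)."""
    return {gene: math.log2(value + pseudocount)
            for gene, value in cpm(counts).items()}


# Hypothetical library of one million reads (values chosen for illustration).
EXAMPLE_COUNTS = {"Zbtb16": 9000, "Ddx4": 8000, "Wt1": 90, "other": 982910}

linear = cpm(EXAMPLE_COUNTS)
logged = log2_cpm(EXAMPLE_COUNTS)
# On the linear scale the marker exceeds Wt1 by two orders of magnitude;
# on the log2 scale the same gap appears as a difference of a few units.
fold_linear = linear["Zbtb16"] / linear["Wt1"]
gap_log2 = logged["Zbtb16"] - logged["Wt1"]
```

      With these made-up counts, the 100-fold linear difference corresponds to a log2 gap of fewer than seven units, which is why a linear axis makes low expression of a gene such as Wt1 visually more obvious.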

      Notably, while Wt1 is robustly expressed in nearly all Sertoli cells across postnatal development (Author response image 7), it is also detected in other cell types including SPGs, although in fewer cells and at lower expression levels, consistent with our observations (Author response image 6 and 8). Therefore, the assignment of a gene as a marker of a particular cell type does not imply that the gene is expressed uniquely in that cell type; rather, it is expressed in more cells of that type and likely at higher levels.

      Author response image 8.

      Expression of Wt1 in publicly available single-cell RNA-seq datasets. Feature maps of gene expression for Wt1. Dashed boxes show a zoom-in on the germ cell cluster, in which some cells express Wt1.

      Regarding concerns about contamination by somatic cells (chromatin accessibility). In Figure 2 of our manuscript, we show the chromatin accessibility landscape of different genes, including genes not expressed in testicular cells (Ihh) and genes believed to be expressed exclusively in somatic cells (Lhcgr, Gata4, Col16a1, Wt1). For some of these genes, we reported changes in chromatin accessibility at specific sites between PND15 and adults (e.g. Wt1 and Col16a1). The observation of "traces of chromatin accessibility" at these loci and the reported changes in accessibility raised concerns of potential contamination which "fundamentally flaw" our results, as stated by the reviewer. While we acknowledge that all enrichment methods have a margin of potential contamination, we fundamentally disagree with the reviewer's observations.

      The term chromatin accessibility can be misleading. In principle, the term accessibility might suggest the literal lack of protein deposition at a given place in the genome. Rather, chromatin accessibility as evaluated by ATAC-seq (as in this case) must be interpreted as a measure of protein occupancy genome-wide (PMID: 30675018). Depending on the type of fragments analyzed, we can obtain information regarding the occupancy of transcription factors (TFs), nucleosomes, and other chromatin-associated proteins that are present at genomic locations at a given time within a population of cells. The detection of chromatin accessibility at a given locus does not necessarily indicate transcription of the gene in a given cell type. A gene can be repressed or poised for expression and still show a clear signal of chromatin accessibility at its regulatory elements or along the gene body. For instance, in agreement with the reviewer's observation, neither Ihh nor Lhcgr is expressed in our datasets (Author response image 6 and Author response image 9); however, they show a distinctive pattern of chromatin accessibility in our datasets and in publicly available ATAC-seq data derived from undifferentiated (Id4-bright) and differentiating (Id4-dim) SPGs (Cheng et al., 2020) (Author response image 9). A similar argument can be applied to other loci such as Wt1 and Col6a1, for which we also observe extremely low levels of transcription. Therefore, the lack of transcription does not exclude that these loci display clear patterns of chromatin accessibility (Author response image 9). Notably, while traces of chromatin accessibility can also be observed in ATAC-seq datasets from embryonic Sertoli cells (Garcia-Moreno et al., 2019) and other somatic stem cells (hematopoietic stem cells; HSCs) (Xiang et al., 2020) (Author response image 9), the pattern of chromatin accessibility markedly differs from that observed in SPG cells. Therefore, the observed changes in chromatin accessibility are unlikely to result from contaminating somatic cells.

      To strengthen our observation, we identified regions of chromatin accessibility in SPGs, Sertoli cells, and HSCs using both our datasets and publicly available ATAC-seq datasets. Overlap analysis revealed at least four groups of ATAC-seq peaks: 1) peaks shared among all analyzed cell types, 2) peaks shared only among SPG cells, 3) peaks specific to Sertoli cells, and 4) peaks specific to HSCs (Author response image 10). Peaks shared among all tested cell types are predominantly located at promoters of genes involved in translation and DNA replication (GO analysis, adj p-value<0.05). In contrast, cell-type-specific peaks are localized at intergenic and intragenic regions, suggesting localization at enhancer elements (Author response image 10). Indeed, GO analysis of cell-type-specific peaks revealed enrichment for genes involved in male meiosis for SPGs, vesicle-mediated transport for Sertoli cells, and immune system process for HSCs, consistent with cell-type-specific functions. If contamination by somatic cells, such as Sertoli cells, were prominent as stated by the reviewer, we would expect to observe prominent ATAC-seq signal from our datasets at peaks specific to Sertoli cells. Notably, we do not observe ATAC-seq signal at Sertoli-specific peaks in our ATAC-seq samples. However, we observe robust signals at shared peaks and at peaks specific to SPG cells. This observation strongly argues against the possibility of major contamination by somatic cells.
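      The grouping of peaks into shared and cell-type-specific sets can be sketched as follows. This is only an illustration of the overlap logic, assuming half-open BED-style intervals and a rule that any overlap of at least one base counts; the coordinates and the helper names `overlaps` and `classify_peaks` are hypothetical and do not reproduce the tools or thresholds used for Author response image 10.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_peaks(peak_sets):
    """Label each peak with the set of cell types that have an overlapping peak.

    peak_sets: dict mapping cell type -> list of (chrom, start, end) tuples.
    Returns a dict mapping each peak to a frozenset of cell-type names.
    """
    labels = {}
    for peaks in peak_sets.values():
        for peak in peaks:
            labels[peak] = frozenset(
                cell_type
                for cell_type, candidates in peak_sets.items()
                if any(overlaps(peak, other) for other in candidates)
            )
    return labels


# Hypothetical peak calls for three cell types.
EXAMPLE_PEAKS = {
    "SPG": [("chr1", 100, 600), ("chr2", 1000, 1500)],
    "Sertoli": [("chr1", 150, 650), ("chr3", 200, 700)],
    "HSC": [("chr1", 90, 590), ("chr4", 10, 510)],
}

peak_labels = classify_peaks(EXAMPLE_PEAKS)
```

      Peaks labeled with all three cell types correspond to the shared group, while peaks labeled with a single cell type correspond to the specific groups; the contamination argument then asks whether a sample shows signal at peaks labeled only "Sertoli".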

      Author response image 9.

      Chromatin accessibility profiles at specific loci differ between SPG cells and other cell types. Genome-browser tracks for Ihh, Wt1, Col16a1 and Zbtb16. For each gene, an extended locus view is presented with RNA-seq data (this study) and normalized ATAC-seq tracks from our study and public sources (SPG Id4: GSE131657; Sertoli: GSM3346484; HSC: ENCFF204JEE). Public ATAC-seq datasets were generated using enrichment methods similar to the one employed in our study.

      Author response image 10.

      Shared and cell-type-specific ATAC-seq peaks among SPGs, Sertoli cells and HSCs. Top, normalized ATAC-seq signal heatmaps of shared and unique ATAC-seq peaks. PND15 and Adult samples are derived from our study. ATAC-seq signal is plotted +/- 500 bp from peak center. Bottom, pie charts of the genomic distribution of ATAC-seq peaks.

      Reviewer #3:

      In this study, Lazar-Contes and colleagues aimed to determine whether chromatin accessibility changes in the spermatogonial population during different phases of postnatal mammalian testis development. Because actions of the spermatogonial population set the foundation for continual and robust spermatogenesis and the gene networks regulating their biology are undefined, the goal of the study has merit. To advance knowledge, the authors used mice as a model and isolated spermatogonia from three different postnatal developmental age points using a cell sorting methodology that was based on cell surface markers reported in previous studies and then performed bulk RNA-sequencing and ATAC-sequencing. Overall, the technical aspects of the sequencing analyses and computational/bioinformatics seem sound but there are several concerns with the cell population isolated from testes and lack of acknowledgment for previous studies that have also performed ATAC-sequencing on spermatogonia of mouse and human testes. The limitations, described below, call into question the validity of the interpretations and reduce the potential merit of the findings. I suggest changing the acronym for spermatogonial cells from SC to SPG for two reasons. First, SPG is the commonly used acronym in the field of mammalian spermatogenesis. Second, SC is commonly used for Sertoli Cells.

      We thank the reviewer for the suggestion and will rename SCs to SPG cells in the revised manuscript.

      The authors should provide a rationale for why they used postnatal day 8 and 15 mice.

      We will provide a rationale for the use of postnatal 8 and 15 stages in the revised manuscript. Briefly, these stages are interesting to study because early to mid postnatal life is a critical window of development for germ cells during which environmental exposure can have strong and persistent effects. The possibility that changes in germ cells can happen during this period and persist until adulthood is an important area of research linked to disciplines like epigenetic toxicology and epigenetic inheritance.

      The FACS sorting approach used was based on cell surface proteins that are not germline-specific so there were undoubtedly somatic cells in the samples used for both RNA and ATAC sequencing. Thus, it is essential to demonstrate the level of both germ cell and undifferentiated spermatogonial enrichment in the isolated and profiled cell populations. To achieve this, the authors used PLZF as a biomarker of undifferentiated spermatogonia. Although PLZF is indeed expressed by undifferentiated spermatogonia, there have been several studies demonstrating that expression extends into differentiating spermatogonia. In addition, PLZF is not germ-cell specific and single-cell RNA-seq analyses of testicular tissue have revealed that there are somatic cell populations that express Plzf, at least at the mRNA level. For these reasons, I suggest that the authors assess the isolated cell populations using a germ-cell specific biomarker such as DDX4 in combination with PLZF to get a more accurate assessment of the undifferentiated spermatogonial composition. This assessment is essential for the interpretation of the RNA-seq and ATAC-seq data that was generated.

      In agreement with the reviewer’s observation, Zbtb16 (PLZF) is expressed in germ cells but also in somatic cells, in particular in the dataset derived from Green et al., 2018 (Author response image 11). However, when evaluating the expression patterns of Ddx4, we noticed that similar to Zbtb16, it is expressed both in the germ line and in the somatic compartment (Author response image 11). Notably, we observe expression of Ddx4 in SSC but also in progenitors and differentiating SPGs (Author response image 11g). These observations suggest that at least at the transcript level, both genes are transcribed in germ cells and to a lesser degree in somatic cells. 

      Author response image 11.

      Single-cell expression of Ddx4 and Zbtb16. Seurat clusters from all analyzed single-cell RNA-seq datasets (a,c,e,g,i) and feature maps of gene expression for Ddx4 and Zbtb16 (b,d,f,h,j).

      Finally, our deconvolution analysis using gene expression signature matrices for different cellular populations suggests that our RNA-seq and ATAC-seq libraries are largely derived from SPG cells, in particular from SSCs.

      Furthermore, while this analysis suggested the presence of somatic cells, their proportion is minimal in comparison with germ cells (Author response images 1-4). This is also supported by ATAC-seq analysis of somatic cells from testis (Author response images 9 and 10). 

      A previous study by the Namekawa lab (PMID: 29126117) performed ATAC-seq on a similar cell population (THY1+ FACS sorted) that was isolated from pre-pubertal mouse testes. It was surprising to not see this study referenced in the current manuscript. In addition, it seems prudent to cross-reference the two ATAC-seq datasets for commonalities and differences. In addition, there are several published studies on scATAC-seq of human spermatogonia that might be of interest to cross-reference with the ATAC-seq data presented in the current study to provide an understanding of translational merit for the findings.

      We compared our ATAC-seq datasets with the ones from (Maezawa et al., 2017) and those from (Cheng et al., 2020). All these datasets were generated from FACS-sorted cells enriched for undifferentiated and differentiating SPGs. Sequencing files from Cheng et al., 2020 were processed as described in our Methods section, while our pipeline was adjusted to process files from Maezawa et al., 2018, as they were single-end sequencing files. We generated a reference set of peaks from SPGs and calculated signal scores for all peaks across all samples. We then calculated the Pearson correlation for all pairwise comparisons and generated a heatmap of correlations (Author response image 12). Two clusters emerge that separate the SPG samples from the pachytene spermatocytes and round spermatids reported by Maezawa et al., 2018. As expected, SPG samples clustered together based on study of origin. Consistently, our postnatal samples formed one cluster next to but separated from the adult one. Similarly, the Id4-bright samples clustered together and next to the Id4-dim samples, and the same applied to the Thy1 and cKit samples. Notably, our samples and the ones from Cheng et al., 2020 have a higher correlation with each other than with the ones from Maezawa et al., 2018. Given the fundamental difference in library sequencing (single-end instead of the paired-end sequencing widely used for ATAC-seq experiments), we reasoned that a comparison with the Maezawa et al., 2018 datasets is not optimal. Therefore, these data, together with those presented above (see response to Reviewers 1 and 2), strongly support a predominantly SPG derivation of all our sequencing libraries.
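      The pairwise comparison described above reduces to computing a Pearson correlation between the per-peak signal vectors of every pair of samples. The following is a minimal sketch of that statistic only; the sample names and signal values are hypothetical, and the helper names `pearson` and `correlation_matrix` are not part of any published tool.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(signals):
    """All pairwise correlations; signals maps sample name -> per-peak scores."""
    names = list(signals)
    return {(a, b): pearson(signals[a], signals[b])
            for a in names for b in names}


# Hypothetical signal scores over the same four reference peaks.
EXAMPLE_SIGNALS = {
    "sample_A": [1.0, 2.0, 3.0, 4.0],
    "sample_B": [2.0, 4.0, 6.0, 8.0],
    "sample_C": [4.0, 3.0, 2.0, 1.0],
}

corr = correlation_matrix(EXAMPLE_SIGNALS)
```

      Samples whose peak signals rise and fall together obtain correlations near 1 and would fall into the same cluster of a heatmap, whereas samples with opposing profiles obtain values near -1.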

      Author response image 12.

      Pearson correlation at the peak level among different ATAC-seq datasets. a) Our ATAC-seq libraries and ATAC-seq libraries from b) Cheng et al., 2020 and c) Maezawa et al., 2018. Thy1-1 and cKit libraries correspond to undifferentiated and differentiating SPGs, respectively. PS, pachytene spermatocytes; RS, round spermatids. Correlation analysis was done using deepTools.

      References

      Cheng K, Chen I-C, Cheng C-HE, Mutoji K, Hale BJ, Hermann BP, Geyer CB, Oatley JM, McCarrey JR. 2020. Unique Epigenetic Programming Distinguishes Regenerative Spermatogonial Stem Cells in the Developing Mouse Testis. iScience 23:101596. doi:10.1016/j.isci.2020.101596

      Cobos FA, Panah MJN, Epps J, Long X, Man T-K, Chiu H-S, Chomsky E, Kiner E, Krueger MJ, Bernardo D di, Voloch L, Molenaar J, Hooff SR van, Westermann F, Jansky S, Redell ML, Mestdagh P, Sumazin P. 2023. Effective methods for bulk RNA-seq deconvolution using scnRNA-seq transcriptomes. Genome Biol 24:177. doi:10.1186/s13059-023-03016-6

      Drumond AL, Meistrich ML, Chiarini-Garcia H. 2011. Spermatogonial morphology and kinetics during testis development in mice: a high-resolution light microscopy approach. Reproduction 142:145–155. doi:10.1530/rep-10-0431

      Ernst C, Eling N, Martinez-Jimenez CP, Marioni JC, Odom DT. 2019. Staged developmental mapping and X chromosome transcriptional dynamics during mouse spermatogenesis. Nat Commun 10:1251. doi:10.1038/s41467-019-09182-1

      Garcia-Moreno SA, Futtner CR, Salamone IM, Gonen N, Lovell-Badge R, Maatouk DM. 2019. Gonadal supporting cells acquire sex-specific chromatin landscapes during mammalian sex determination. Dev Biol 446:168–179. doi:10.1016/j.ydbio.2018.12.023

      Green CD, Ma Q, Manske GL, Shami AN, Zheng X, Marini S, Moritz L, Sultan C, Gurczynski SJ, Moore BB, Tallquist MD, Li JZ, Hammoud SS. 2018. A Comprehensive Roadmap of Murine Spermatogenesis Defined by Single-Cell RNA-Seq. Dev Cell 46:651-667.e10. doi:10.1016/j.devcel.2018.07.025

      Hermann BP, Cheng K, Singh A, Cruz LR-DL, Mutoji KN, Chen I-C, Gildersleeve H, Lehle JD, Mayo M, Westernströer B, Law NC, Oatley MJ, Velte EK, Niedenberger BA, Fritze D, Silber S, Geyer CB, Oatley JM, McCarrey JR. 2018. The Mammalian Spermatogenesis Single-Cell Transcriptome, from Spermatogonial Stem Cells to Spermatids. Cell Rep 25:1650-1667.e8. doi:10.1016/j.celrep.2018.10.026

      Kubota H, Brinster RL. 2018. Spermatogonial stem cells. Biol Reprod 99:52–74. doi:10.1093/biolre/ioy077

      Law NC, Oatley MJ, Oatley JM. 2019. Developmental kinetics and transcriptome dynamics of stem cell specification in the spermatogenic lineage. Nat Commun 10:2787. doi:10.1038/s41467-019-10596-0

      Maezawa S, Yukawa M, Alavattam KG, Barski A, Namekawa SH. 2017. Dynamic reorganization of open chromatin underlies diverse transcriptomes during spermatogenesis. Nucleic Acids Res 46:gkx1052-. doi:10.1093/nar/gkx1052

      McCarrey JR. 2013. Toward a More Precise and Informative Nomenclature Describing Fetal and Neonatal Male Germ Cells in Rodents1. Biol Reprod 89:Article 47, 1-9. doi:10.1095/biolreprod.113.110502

      Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, Khodadoust MS, Esfahani MS, Luca BA, Steiner D, Diehn M, Alizadeh AA. 2019. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 37:773–782. doi:10.1038/s41587-019-0114-2

      Rabbani M, Zheng X, Manske GL, Vargo A, Shami AN, Li JZ, Hammoud SS. 2022. Decoding the Spermatogenesis Program: New Insights from Transcriptomic Analyses. Annu Rev Genet 56:339–368. doi:10.1146/annurev-genet-080320-040045

      Rooij DG de. 2017. The nature and dynamics of spermatogonial stem cells. Development 144:3022–3030. doi:10.1242/dev.146571

      Tan K, Song H-W, Wilkinson MF. 2020. Single-cell RNAseq analysis of testicular germ and somatic cell development during the perinatal period. Development 147:dev183251. doi:10.1242/dev.183251

      Thumfart KM, Lazzeri S, Manuella F, Mansuy IM. 2022. Long-term effects of early postnatal stress on Sertoli cells. Front Genet 13:1024805. doi:10.3389/fgene.2022.1024805

      Xiang G, Keller CA, Heuston EF, Giardine BM, An L, Wixom AQ, Miller A, Cockburn A, Sauria MEG, Weaver K, Lichtenberg J, Göttgens B, Li Q, Bodine D, Mahony S, Taylor J, Blobel GA, Weiss MJ, Cheng Y, Yue F, Hughes J, Higgs DR, Zhang Y, Hardison RC. 2020. An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis. Genome Res 30:gr.255760.119. doi:10.1101/gr.255760.119

    1. eLife assessment

      This study presents a valuable finding on the role of cholesterol-binding site on GLP-1 receptors and functionally characterizes the impact of this mutation on receptor behavior in the membrane and downstream signaling. The computational and experimental approaches used in the study to arrive at the conclusions are solid. The clinical ramifications are unclear at this point, but the study is a helpful addition to the scientific community working on receptor biology and drug development.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate impairments induced by a high cholesterol diet on GLP-1R dependent glucoregulation in vivo as well as an improvement after reduction in cholesterol synthesis with simvastatin in pancreatic islets. They also map sites of cholesterol high occupancy and residence time on active versus inactive GLP-1Rs using coarse-grained molecular dynamics (cgMD) simulations and screened for key residues selected from these sites and performed detailed analyses of the effects of mutating one of these residues, Val229, to alanine on GLP-1R interactions with cholesterol, plasma membrane behaviour, clustering, trafficking and signalling in pancreatic beta cells and primary islets, and describe an improved insulin secretion profile for the V229A mutant receptor.

      These are extensive and very impressive studies indeed. I am impressed with the tireless effort exerted to understand the details of molecular mechanisms involved in the effects of cholesterol for GLP-1 activation of its receptor. In general the study is convincing, the manuscript well written and the data well presented. Some of the changes are small and insignificant which makes one wonder how important the observations are. For instance in figure 2 E (which is difficult to interpret anyway because the data are presented in percent, conveniently hiding the absolute results) does not show a significant result of the cyclodextrin except for insignificant increases in basal secretion. That is not identical to impairment of GLP-1 receptor signaling!

      To me the most important experiment of them all is the simvastatin experiment, but the results rest on very few numbers and there is a large variation. Apparently, in a previous study using more extensive reduction in cholesterol the opposite response was detected casting doubt on the significance of the current observation. I agree with the authors that the use of cyclodextrin may have been associated with other changes in plasma membrane structure than cholesterol depletion at the GLP-1 receptor. The entire discussion regarding the importance of cholesterol would benefit tremendously from studies of GLP-1 induced insulin secretion in people with different cholesterol levels before and after treatment with cholesterol-lowering agents. I suspect that such a study would not reveal major differences.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors provided a proof of concept that they can identify and mutate a cholesterol-binding site of a high-interest class B receptor, the GLP-1R, and functionally characterize the impact of this mutation on receptor behavior in the membrane and downstream signaling with the intent that similar methods can be useful to optimize small molecules that as ligands or allosteric modulators of GLP-1R can improve the therapeutic tools targeting this signaling system.

      Strengths:

      The majority of results on receptor behavior are elucidated in INS-1 cells expressing the wt or mutant GLP-1R, with one experiment translating the findings to primary mouse beta-cells. I think this paper lays a very strong foundation to characterize this mutation and does a good job discussing how complex cholesterol-receptor interactions can be (ie lower cholesterol binding to V229A GLP-1R, yet increased segregation to lipid rafts). Table 1 and Figure 9 are very beneficial to summarize the findings. The lower interaction with cholesterol and lower membrane diffusion in V229A GLP-1R resembles the reduced diffusion of wt GLP-1R with simv-induced cholesterol reductions, although by presumably decreasing the cholesterol available to interact with wt GLP-1R. This could be interesting to see if lowering cholesterol alters other behaviors of wt GLP-1R that look similar to V229A GLP-1R. I further wonder if the authors expect that increased cholesterol content of islets (with loading of MβCD saturated with cholesterol or high-cholesterol diets) would elevate baseline GLP-1R membrane diffusion, and if a more broad relationship can be drawn between GLP-1R membrane movement and downstream signaling.

      Weaknesses:

      I think there are no obvious weaknesses in this manuscript and overall, I believe the authors achieved their aims and have demonstrated the importance of cholesterol interactions on GLP-1R functioning in beta-cells. I think this paper will be of interest to many physiologists who may not be familiar with many of the techniques used in this paper and the authors largely do a good job explaining the goals of using each method in the results section. The intent of some methods, for example the Laurdan probe studies, are better expanded in the discussion. I found it unclear what exactly was being measured to assess 'receptor activity' in Fig 7E and F.

      Certainly many follow-up experiments are possible from these initial findings and of primary interest is how this mutation affects insulin homeostasis in vivo under different physiological conditions. One of the biggest pathologies in insulin homeostasis in obesity/t2d is an elevation of baseline insulin release (as modeled in Fig 1E) that renders the fold-change in glucose stimulated insulin levels lower and physiologically less effective. No difference in primary mouse islet baseline insulin secretion was seen here but I wonder if this mutation would ameliorate diet-induced baseline hyperinsulinemia.

      I would have liked to see the actual islet cholesterol content after 5wks high-cholesterol diet measured to correlate increased cholesterol load with diminished glucose-stimulated insulin. While not necessary for this paper, a comparison of islet cholesterol content after this cholesterol diet vs the more typical 60% HFD used in obesity research would be beneficial for GLP-1 physiology research broadly to take these findings into consideration with model choice.

      Another area to further investigate is does this mutation alter ex4 interaction/affinity/time of binding to GLP-1 or are all of the described findings due to changes in behavior and function of the receptor?

      Lastly, I wonder if V229A would have the same impact in a different cell type, especially in neurons? How similar are the cholesterol profiles of beta-cells and neurons? How this mutation (and future developed small molecules) may affect satiation, gut motility, and especially nausea, are of high translational interest. The comparison is drawn in the discussion between this mutation and ex4-phe1 to have biased agonism towards Gs over beta-arrestin signaling. Ex4-phe1 lowered pica behavior (a proxy for nausea) in the authors' previously co-authored paper on ex4-phe1 (PMID 29686402) and I think drawing a parallel for this mutation or modification of cholesterol binding to potentially mitigate nausea is worth highlighting.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors demonstrate impairments induced by a high cholesterol diet on GLP-1R dependent glucoregulation in vivo as well as an improvement after reduction in cholesterol synthesis with simvastatin in pancreatic islets. They also map sites of cholesterol high occupancy and residence time on active versus inactive GLP-1Rs using coarse-grained molecular dynamics (cgMD) simulations and screened for key residues selected from these sites and performed detailed analyses of the effects of mutating one of these residues, Val229, to alanine on GLP-1R interactions with cholesterol, plasma membrane behaviour, clustering, trafficking and signalling in pancreatic beta cells and primary islets, and describe an improved insulin secretion profile for the V229A mutant receptor.

      These are extensive and very impressive studies indeed. I am impressed with the tireless effort exerted to understand the details of molecular mechanisms involved in the effects of cholesterol for GLP-1 activation of its receptor. In general the study is convincing, the manuscript well written and the data well presented.

      Some of the changes are small and insignificant which makes one wonder how important the observations are. For instance in figure 2 E (which is difficult to interpret anyway because the data are presented in percent, conveniently hiding the absolute results) does not show a significant result of the cyclodextrin except for insignificant increases in basal secretion. That is not identical to impairment of GLP-1 receptor signaling!

      We assume that the reviewer refers to Fig. 1E, where we show the percentage of insulin secretion in response to 11 mM glucose +/- exendin-4 stimulation in mouse islets pretreated with vehicle or MβCD loaded with 20 mM cholesterol. While we concur with the reviewer that the effect in this case is triggered by increased basal insulin secretion at 11 mM glucose, exendin-4 can no longer compensate for this increase by proportionally amplifying insulin responses in cholesterol-loaded islets, leading to a significantly decreased exendin-4-induced insulin secretion fold increase under these circumstances, as shown in Fig. 1F. We interpret these results as a defect in the GLP-1R capacity to amplify insulin secretion beyond the basal level to the same extent as in vehicle conditions. An alternative explanation is that there is a maximum level of insulin secretion in our cells, and 11 mM glucose + exendin-4 stimulation gets close to that value. With the increasing effect of cholesterol-loaded MβCD on basal secretion at 11 mM glucose, exendin-4 stimulation appears to work less well. A simple experiment to rule out this possibility would be to test insulin secretion following KCl stimulation under these conditions to determine if maximal stimulation has been reached or not. We will perform this control experiment in the revised manuscript to clarify this point. We will also include absolute insulin results as well as percentages of secretion to improve the completeness of the report.

      To me the most important experiment of them all is the simvastatin experiment, but the results rest on very few numbers and there is a large variation. Apparently, in a previous study using more extensive reduction in cholesterol the opposite response was detected casting doubt on the significance of the current observation. I agree with the authors that the use of cyclodextrin may have been associated with other changes in plasma membrane structure than cholesterol depletion at the GLP-1 receptor.

      We agree with the reviewer that the insulin secretion results in vehicle versus LPDS/simvastatin treated mouse islets (Fig. 1H, I) are relatively variable and we therefore plan to perform further biological repeats of this experiment for the paper revision to consolidate our current findings. 

      The entire discussion regarding the importance of cholesterol would benefit tremendously from studies of GLP-1 induced insulin secretion in people with different cholesterol levels before and after treatment with cholesterol-lowering agents. I suspect that such a study would not reveal major differences.

      We agree with the reviewer that such a study would be highly relevant. While this falls outside the scope of the present paper, we encourage other researchers with access to clinical data on GLP-1RA responses in individuals taking cholesterol-lowering agents to share their results with the scientific community. We will highlight this point in the paper discussion to emphasise the importance of more research in this area.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors provided a proof of concept that they can identify and mutate a cholesterol-binding site of a high-interest class B receptor, the GLP-1R, and functionally characterize the impact of this mutation on receptor behavior in the membrane and downstream signaling with the intent that similar methods can be useful to optimize small molecules that as ligands or allosteric modulators of GLP-1R can improve the therapeutic tools targeting this signaling system.

      Strengths:

      The majority of results on receptor behavior are elucidated in INS-1 cells expressing the wt or mutant GLP-1R, with one experiment translating the findings to primary mouse beta-cells. I think this paper lays a very strong foundation to characterize this mutation and does a good job discussing how complex cholesterol-receptor interactions can be (ie lower cholesterol binding to V229A GLP-1R, yet increased segregation to lipid rafts). Table 1 and Figure 9 are very beneficial to summarize the findings. The lower interaction with cholesterol and lower membrane diffusion in V229A GLP-1R resembles the reduced diffusion of wt GLP-1R with simv-induced cholesterol reductions, although by presumably decreasing the cholesterol available to interact with wt GLP-1R. This could be interesting to see if lowering cholesterol alters other behaviors of wt GLP-1R that look similar to V229A GLP-1R. I further wonder if the authors expect that increased cholesterol content of islets (with loading of MβCD saturated with cholesterol or high-cholesterol diets) would elevate baseline GLP-1R membrane diffusion, and if a more broad relationship can be drawn between GLP-1R membrane movement and downstream signaling.

      Membrane diffusion experiments are difficult to perform in intact islets as our method requires cell monolayers for RICS analysis. We do however agree that it would be interesting to perform further RICS analysis in INS-1 832/3 SNAP/FLAG-hGLP-1R cells pretreated with vehicle or MβCD loaded with 20 mM cholesterol, and we will therefore add this experiment to the paper revisions.

      Weaknesses:

      I think there are no obvious weaknesses in this manuscript and overall, I believe the authors achieved their aims and have demonstrated the importance of cholesterol interactions on GLP-1R functioning in beta-cells. I think this paper will be of interest to many physiologists who may not be familiar with many of the techniques used in this paper and the authors largely do a good job explaining the goals of using each method in the results section.

      The intent of some methods, for example the Laurdan probe studies, are better expanded in the discussion.

      To clarify the intent of the Laurdan experiments early in the manuscript, we will add the following text to the methods section in the paper revisions: “Laurdan, 6-dodecanoyl-2-dimethylaminonaphthalene (product D250) was purchased from ThermoFisher. Laurdan (40 μM) was excited using a 405 nm solid state laser and SNAP/FLAG-hGLP-1R labelled with SNAP-Surface Alexa Fluor 647 with a pulsed (80 MHz) super-continuum white light laser at 647 nm. Laurdan emission was recorded in the ranges of 420–460 nm (IB) and 470–510 nm (IR), and the general polarisation (GP) formula, GP = (IB − IR)/(IB + IR), used to retrieve the relative lateral packing order of lipids at the plasma membrane. Values of GP vary from 1 to −1, where higher numbers reflect lower fluidity or higher lateral lipid order, whereas lower numbers indicate increasing fluidity.”
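The general polarisation calculation quoted above, read as the ratio of the difference to the sum of the two emission channels (the only reading consistent with the stated range of 1 to −1), can be sketched as follows. This is a minimal illustration rather than the authors' analysis code. The function name and the choice to return NaN for pixels with no signal are assumptions.

```python
import numpy as np

def generalised_polarisation(i_b, i_r):
    """Laurdan GP = (IB - IR) / (IB + IR), computed element-wise.

    i_b : intensities in the 420-460 nm channel (IB)
    i_r : intensities in the 470-510 nm channel (IR)
    Pixels where IB + IR is zero are returned as NaN (an assumption made
    here to avoid dividing by zero; the source does not specify this).
    """
    i_b = np.asarray(i_b, dtype=float)
    i_r = np.asarray(i_r, dtype=float)
    total = i_b + i_r
    gp = np.full(total.shape, np.nan)
    np.divide(i_b - i_r, total, out=gp, where=total > 0)
    return gp
```

For example, a pixel with IB = 3 and IR = 1 gives GP = 0.5 (more ordered), while IB = 1 and IR = 3 gives GP = −0.5 (more fluid).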

      I found it unclear what exactly was being measured to assess 'receptor activity' in Fig 7E and F. 

      Figs. 7E and F refer to bystander complementation assays measuring the recruitment of nanobody 37 (Nb37)-SmBiT, which binds to active Gαs, to either the plasma membrane (labelled with KRAS CAAX motif-LgBiT), or to endosomes (labelled with Endofin FYVE domain-LgBiT) in response to GLP-1R stimulation with exendin-4. This assay therefore measures GLP-1R activation specifically at each of these two subcellular locations. We will add a schematic of this assay to the methods section in the paper revisions to clarify the aim of these experiments.

      Certainly many follow-up experiments are possible from these initial findings and of primary interest is how this mutation affects insulin homeostasis in vivo under different physiological conditions. One of the biggest pathologies in insulin homeostasis in obesity/t2d is an elevation of baseline insulin release (as modeled in Fig 1E) that renders the fold-change in glucose stimulated insulin levels lower and physiologically less effective. No difference in primary mouse islet baseline insulin secretion was seen here but I wonder if this mutation would ameliorate diet-induced baseline hyperinsulinemia.

      We concur with the reviewer that it would be interesting to determine the effects of the GLP-1R V229A mutation on insulin secretion responses under diet-induced metabolic stress conditions. While performing in vivo experiments on glucoregulation in mice harbouring the V229A mutation falls outside the scope of the present study, in the paper revisions we will include ex vivo insulin secretion experiments in islets from GLP-1R KO mice transduced with adenoviruses expressing SNAP/FLAG-hGLP-1R WT or V229A and subsequently treated with vehicle versus MβCD loaded with 20 mM cholesterol to replicate the conditions of Fig. 1E.

      I would have liked to see the actual islet cholesterol content after 5wks high-cholesterol diet measured to correlate increased cholesterol load with diminished glucose-stimulated insulin. While not necessary for this paper, a comparison of islet cholesterol content after this cholesterol diet vs the more typical 60% HFD used in obesity research would be beneficial for GLP-1 physiology research broadly to take these findings into consideration with model choice.

      We will include these data and compare islet cholesterol levels after the high cholesterol diet with those of HFD-fed mouse islets in the paper revisions.

      Another area to further investigate is does this mutation alter ex4 interaction/affinity/time of binding to GLP-1 or are all of the described findings due to changes in behavior and function of the receptor?

      To answer this question, we will perform exendin-4 binding affinity experiments in INS-1 832/3 SNAP/FLAG-hGLP-1R WT versus V229A cells for the paper revisions.

      Lastly, I wonder if V229A would have the same impact in a different cell type, especially in neurons? How similar are the cholesterol profiles of beta-cells and neurons? How this mutation (and future developed small molecules) may affect satiation, gut motility, and especially nausea, are of high translational interest. The comparison is drawn in the discussion between this mutation and ex4-phe1 to have biased agonism towards Gs over beta-arrestin signaling. Ex4-phe1 lowered pica behavior (a proxy for nausea) in the authors' previously co-authored paper on ex4-phe1 (PMID 29686402) and I think drawing a parallel for this mutation or modification of cholesterol binding to potentially mitigate nausea is worth highlighting.

      While experiments in neurons are outside the scope of the present study, we will add this worthy point to the discussion and hypothesise on possible effects of the V229A mutation on central GLP-1R effects in the revised manuscript.

    1. eLife assessment

      Kewenig et al. present a timely and valuable study that extends prior research investigating the neural basis of abstract and concrete concepts by examining how these concepts are processed in a naturalistic stimulus: during movie watching. The authors provide convincing evidence that the varying strength of the relationship between a word and a particular visual scene is associated with a change in the similarity between the brain regions active for concrete and abstract words. This work makes a contribution that will be of general interest within any field that faces the inherent challenge of quantifying context in a multimodal stimulus.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate a very interesting but often overlooked aspect of abstract vs. concrete processing in language. Specifically, they study if the differences in processing of abstract vs. concrete concepts in the brain is static or dependent on the (visual) context in which the words occur. This study takes a two-step approach to investigate how context might affect the perception of concepts. First, the authors analyze if concrete concepts, expectedly, activate more sensory systems while abstract concepts activate higher-order processing regions. Second, they measure the contextual situatedness vs. displacement of each word with respect to the visual scenes in which it is spoken and then evaluate if this contextual measure correlates with more activation in the sensory vs. higher-order regions respectively.

      Strengths:

      This study raises a pertinent and understudied question in language neuroscience. It also combines both computational and meta-analytic approaches.

    3. Reviewer #2 (Public review):

      Summary:

      This study tests a plausible and intriguing hypothesis that one cause of the differences in the neural underpinnings of concrete and abstract words is differences in their grounding in the current sensory context. The authors reasoned that, in this case, an abstract word presented with a relevant visual scene would be processed in a more similar way to a concrete word. Typically, abstract and concrete words are tested in isolation. In contrast, this study takes advantage of naturalistic movie stimuli to assess the neural effects of concreteness in both abstract and concrete words (the speech within the film), when the visual context is more or less tied to the word meaning (measured as the similarity between the word co-occurrence-based vector for the spoken word and the average of this vector across all present objects). This novel approach allows a test of the dynamic nature of abstract and concrete word processing, and as such provides a useful perspective accounting for differences in processing these word types.

      The measure of contextual situatedness (how related a spoken word is to the average of the visually presented objects in a scene) is an interesting approach allowing parametric variation within naturalistic stimuli, which is a potential strength of the study. Additionally, the authors use an interesting peak and valley method and provide a rationale for this approach. This provided additional temporal information on the processing of spoken concrete and abstract words.

      The authors predicted that sensory areas would be more active for concrete words, affective areas for abstract and language areas would be involved in both. The use of reverse inference to interpret areas such as the inferior frontal gyrus post hoc, as sensory, affective or language related deserves some caution. It is also important to remember that the different areas identified for each comparison do not necessarily have the same roles. As the number of clusters may therefore be a misleading way to assess the relationship of these areas with the sensory terms, the relationship between each area and the different sensory terms is provided in the supplemental to allow more nuanced interpretation. The study could benefit from being better situated in the prior literature on context and concrete vs abstract regional differences. Overall, the authors successfully demonstrate the context-dependent nature of abstract and concrete word processing.

    4. Reviewer #3 (Public review):

      Summary:

      The primary aim of this manuscript was to investigate how context, defined from visual object information in multimodal movies, impacts the neural representation of concrete and abstract conceptual knowledge. The authors first conduct a series of analyses to identify context independent regional response to concrete and abstract concepts in order to compare these results with the networks observed in prior research using non-naturalistic paradigms. The authors then conduct analyses to investigate whether regional response to abstract and concrete concepts changes when the concepts are either contextually situated or displaced. A concept is considered displaced if the visual information immediately preceding the word is weakly associated with the word whereas a concept is situated if the association is high. The results suggest that, when ignoring context, abstract and concrete concepts engage different brain regions with overlap in core language areas. When context is accounted for, however, similar brain regions are activated for processing concrete and situated abstract concepts and for processing abstract and displaced concrete concepts. The authors suggest that contextual information dynamically changes the brain regions that support the representation of abstract and concrete conceptual knowledge.

      Strengths:

      There is significant interest in understanding both the acquisition and neural representation of abstract and concrete concepts, and most of the work in this area has used highly constrained, decontextualized experimental stimuli and paradigms to do so. This manuscript addresses this limitation by using multimodal narratives which allows for an investigation of how context-sensitive the regional response to abstract and concrete concepts is. The authors characterize the regional response in a comprehensive way.

      Weaknesses:

      The edits made to the manuscript in response to the reviewer comments have clarified and strengthened the methodological concerns flagged by all reviewers, giving me greater confidence that the authors are capturing what they aimed to and are making appropriate inferences given the results.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and the editorial team for a thoughtful and constructive assessment. We appreciate all comments, and we try our best to respond appropriately to every reviewer’s queries below. It appears to us that one main worry was regarding appropriate modelling of the complex and rich structure of confounding variables in our movie task. 

      One recent approach fits large feature vectors that include confounding variables alongside the variable(s) of interest to the activity of each voxel in the brain to disentangle the contributions of each variable to the total recorded brain response. While these encoding models have yielded some interesting results, they have two major drawbacks which make using them unfeasible for our purposes (as we explain in more detail below): first, by fitting large vectors to individual voxels, they tend to over-estimate effect size; second, they are very ineffective at unveiling group-level effects due to high variability between subjects. Another approach able to deal with at least the second of these worries is “inter-subject-correlation”. In this technique brain responses are recorded from multiple subjects while they are presented with natural stimuli. For each brain area, response time courses from different subjects are correlated to determine whether the responses are similar across subjects. Our “peak and valley” analysis is a special case of this analysis technique, as we explain in the manuscript and below.
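The inter-subject correlation idea described above (correlating response time courses across subjects for a given brain area) can be sketched as below. This is an illustrative sketch only. The leave-one-out variant, the function name, and the array layout are assumptions, and it is not the authors' peak and valley procedure.

```python
import numpy as np

def leave_one_out_isc(data):
    """Inter-subject correlation for a single brain area.

    data : array of shape (n_subjects, n_timepoints), one response time
           course per subject.
    Returns one Pearson r per subject: the correlation between that
    subject's time course and the mean time course of all other subjects.
    """
    data = np.asarray(data, dtype=float)
    n_subjects = data.shape[0]
    total = data.sum(axis=0)
    isc = np.empty(n_subjects)
    for s in range(n_subjects):
        # Average of everyone except subject s
        others_mean = (total - data[s]) / (n_subjects - 1)
        isc[s] = np.corrcoef(data[s], others_mean)[0, 1]
    return isc
```

High values indicate that an area responds similarly across subjects to the shared stimulus. Values near zero indicate idiosyncratic responses.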

      For estimating individual-level brain-activation, we opted for an approach that adapts a classical method of analysing brain data (convolution) to naturalistic settings. Amplitude modulated deconvolution extends classical brain analysis tools in several ways to handle naturalistic data:

      (1) The method does not assume a fixed hemodynamic response function (HRF). Instead, it estimates the HRF over a specified time window from the data, allowing it to vary in amplitude based on the stimulus. This flexibility is crucial for naturalistic stimuli, where the timing and nature of brain responses can vary widely. 

      (2) The method only models the modulation of the amplitude of the HRF above its average with respect to the intensity or characteristics of the stimulus. 

      (3) By allowing variation in the response amplitude, non-linear relationships between the stimulus and brain-response can be captured. 
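The three properties listed above can be illustrated with a small finite-impulse-response (FIR) regression: one set of lagged regressors estimates the average response shape from the data, and a second set estimates how that response scales with a mean-centred stimulus property. This is a simplified sketch under stated assumptions (onsets on the TR grid, a lag-indexed FIR basis, ordinary least squares). The function names are hypothetical, and it is not the software or exact model used by the authors.

```python
import numpy as np

def am_fir_design(onsets, modulator, n_scans, n_lags):
    """Build FIR regressors for one amplitude-modulated event type.

    onsets    : event onsets in TR units (integers; an assumption here)
    modulator : one value per event (e.g. a word property)
    Returns X of shape (n_scans, 2 * n_lags). The first n_lags columns
    model the average response at each lag; the last n_lags columns model
    modulation of that response by the mean-centred modulator.
    """
    onsets = np.asarray(onsets, dtype=int)
    centred = np.asarray(modulator, dtype=float)
    centred = centred - centred.mean()
    X = np.zeros((n_scans, 2 * n_lags))
    for onset, m in zip(onsets, centred):
        for lag in range(n_lags):
            t = onset + lag
            if t < n_scans:
                X[t, lag] += 1.0          # average response at this lag
                X[t, n_lags + lag] += m   # modulation above the average
    return X

def fit_fir(y, X):
    """Ordinary least squares with an intercept.

    Returns (mean_response, modulation_response), each of length n_lags.
    """
    n_lags = X.shape[1] // 2
    design = np.column_stack([X, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[:n_lags], beta[n_lags:2 * n_lags]
```

Because the modulator is mean-centred, the second set of coefficients captures only variation above or below the average response, which corresponds to point (2) in the list. The computational cost noted by the authors is visible in the design: every additional modulated regressor adds another `n_lags` columns.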

      It is true that amplitude modulated deconvolution does not come without its flaws; for example, including more than a few nuisance regressors becomes computationally very costly. Getting to grips with naturalistic data (especially with fMRI recordings) continues to be an active area of research and presents a new and exciting challenge. We hope that we can convince reviewers and editors with this response and the additional analyses and controls performed, that the evidence presented for the visual context dependent recruitment of brain areas for abstract and concrete conceptual processing is not incomplete.

      Overview of Additional Analyses and Controls Performed by the Authors:

      (1) Individual-Level Peaks and Valleys Analysis (Supplementary Material, Figures S3, S4, and S5)

      (2) Test of non-linear correlations of BOLD responses related to features used in the Peak and Valley Analysis (Supplementary Material, Figures S6, S7)

      (3) Comparison of Psycholinguistic Variables Surprisal and Semantic Diversity between groups of words analysed (no significant differences found)  

      (4) Comparison of Visual Variables Optical Flow, Colour Saturation, and Spatial Frequency for 2s Context Window between groups of words analysed (no significant differences found)

      These controls are in addition to the five low-level nuisance regressors included in our model, which are luminance, loudness, duration, word frequency, and speaking rate (calculated as the number of phonemes divided by duration) associated with each analysed word. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Peaks and Valleys Analysis: 

      (1) Doesn't this method assume that the features used to describe each word, like valence or arousal, will be linearly different for the peaks and valleys? What about non-linear interactions between the features and how they might modulate the response? 

      Within-subject variability in BOLD response delays is typically about 1 second at most (Neumann et al., 2003). As individual words are presented briefly (a few hundred ms at most) and the BOLD response to these stimuli falls within that window (1s/TR), any nonlinear interactions between word features and a participant’s BOLD response within that window are unlikely to significantly affect the detection of peaks and valleys.

      To quantitatively address the concern that non-linear modulations could manifest outside of that window, we include a new analysis in Figure S6, which compares the average BOLD responses of each participant in each cluster and each combination of features. Only a small fraction of all possible comparisons differed significantly from each other (~5,000 of ~130,000 comparisons between BOLD responses to features, which amounts to 3.85%), suggesting that there are no relevant non-linear interactions between features. For a full list of the most non-linearly interacting features, see Figure S7. 

      (2) Doesn't it also assume that the response to a word is infinitesimal and not spread across time? How does the chosen time window of analysis interact with the HRF? From the main figures and Figures S2-S3 there seem to be differences based on the timelag. 

      The Peak and Valley (P&V) method does not assume that the response to a word is infinitesimal or confined to an instantaneous moment. The units of analysis (words) fall within one TR, as they are at most hundreds of ms long; for this reason, we look at one TR only. The response of each voxel at that TR will be influenced by the word of interest, as well as all other words that have been uttered within the 1s TR, and the multimodal features of the video stimulus that fall within that timeframe. So, in our P&V, we are not looking for an instantaneous response but rather for changes in the BOLD signal that correspond to the presence of linguistic features within the stimuli. 

      The chosen time window of analysis interacts with the hemodynamic response function (HRF) in the following way: the HRF unfolds over several seconds, typically peaking around 5-6 seconds after stimulus onset and returning to baseline within 20-30 seconds (Handwerker et al., 2004).

      Our P&V is designed to match these dynamics of fMRI data with the timing of word stimuli. We apply different lags (4s, 5s, and 6s) to account for the delayed nature of the HRF, ensuring that we capture the brain's response to the stimuli as it unfolds over time, rather than assuming an immediate or infinitesimal effect. We find that the P&V yields our expected results for a 5s and a 6s lag, but not a 4s lag. This is in line with literature suggesting that the HRF for a given stimulus peaks around 5-6s after stimulus onset (Handwerker et al., 2004). As we are looking at very short stimuli (a few hundred ms), it makes sense that the distribution of features would change significantly with different lags. The fact that we find converging results for both a 5s and a 6s lag suggests that the delay is somewhere between 5s and 6s. However, there is no way of testing this hypothesis with the temporal resolution of our brain data (1 TR). 
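      To make the role of the lag concrete, the following is a minimal sketch under stated assumptions, not the code used for the analysis. It assumes one sample per TR with a 1s TR, defines peaks and valleys as strict local extrema of a cluster time series, and uses the hypothetical names `label_extrema` and `assign_words`. Each word onset is shifted by the chosen lag and then labelled by whether it lands on a peak, a valley, or neither.

```python
import numpy as np

def label_extrema(ts):
    """Label each TR as a peak (+1), a valley (-1), or neither (0)."""
    ts = np.asarray(ts, dtype=float)
    labels = np.zeros(len(ts), dtype=int)
    inner = ts[1:-1]
    is_peak = (inner > ts[:-2]) & (inner > ts[2:])
    is_valley = (inner < ts[:-2]) & (inner < ts[2:])
    labels[1:-1] = np.where(is_peak, 1, np.where(is_valley, -1, 0))
    return labels

def assign_words(onsets_s, labels, lag_s, tr_s=1.0):
    """Shift word onsets by the lag and look up the label of the TR they fall in."""
    shifted = np.asarray(onsets_s, dtype=float) + lag_s
    idx = np.floor(shifted / tr_s).astype(int)
    out = np.zeros(len(idx), dtype=int)
    valid = (idx >= 0) & (idx < len(labels))  # words shifted past the scan get 0
    out[valid] = labels[idx[valid]]
    return out
```

      Counting how often highly rated words carry the label +1 versus -1 would then give the peak-versus-valley comparison per feature; because each word occupies a single TR, shifting the lag by one second can move a word from a peak to a neighbouring non-extremum.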

      (3) Were the group-averaged responses used for this analysis? 

      Yes, the response for each cluster was averaged across participants. We now also report a participant-level overview of the Peak and Valley analysis (lagged at 5s) in the supplementary material, with results similar to those of the main analysis (see Figures S3, S4, and S5).

      (4) Why don't the other terms identified in Figure 5 show any correspondence to the expected categories? What does this mean? Can the authors also situate their results with respect to prior findings as well as visualize how stable these results are at the individual voxel or participant level? It would also be useful to visualize example time courses that demonstrate the peaks and valleys. 

      The terms identified in Figure 5 are sensorimotor and affective features from the combined Lancaster and Brysbaert norms. As for the main P&V analysis, we only recorded a cluster as processing a given feature (or term) when there were significantly more instances of words highly rated on that dimension occurring at peaks rather than valleys in the HRF. For some features/terms, there were never significantly more highly rated words at peaks than at valleys, which is why some terms in Figure 5 do not show any significant clusters. We have now also clarified this in the figure caption. 

      We situate the method in previous literature in lines 289 – 296. In essence, it is a variant of the well-known method called “reverse correlation”, first detailed in Hasson et al., 2004 (reference from the manuscript) and later adapted to a peak and valley analysis in Skipper et al., 2009 (reference from the manuscript). 

      We now present a more fine-grained characterisation of each cluster on an individual participant level in the supplementary material. We doubt that it would be useful to present an actual example time-course as it would only represent a fraction of over one hundred thousand analysed time-series. We do already present an exemplary time-course to demonstrate the method in Figure 1. 

      Estimating contextual situatedness: 

      (1) Doesn't this limit the analyses to "visual" contexts only? And more so, frequently recognized visual objects? 

      Yes, it was the point of this analysis to focus on visual context only, and it may be true that conducting the analysis in this way results in limiting it to objects that are frequently recognized by visual convolutional neural networks. However, the state-of-the-art strength of visual CNNs in recognising many different types of objects has been attested in several ways (He et al., 2015). Therefore, it is unlikely that the use of CNNs would bias the analysis towards any specific “frequently recognised” objects. 

      (2) The measure of situatedness is the cosine similarity of GloVe vectors that depend on word co-occurrence while the vectors themselves represent objects isolated by the visual recognition models. Expectedly, "science" and the label "book" or "animal" and the label "dog" will be close. But can the authors provide examples of context displacement? I wonder if this just picks up on instances where the identified object in the scene is unrelated to the word. How do the authors ensure that it is a displacement of context as opposed to the two words just being unrelated? This also has a consequence on deciding the temporal cutoff for consideration (2 seconds). 

      The cosine similarity is between the GloVe vectors of the word (that is situated or displaced) and the words referring to the objects identified by the visual recognition model. Therefore, the correlation is between more than just two vectors and both correlated representations depend on co-occurrence. The cosine similarity value reported is not from a comparison between GloVe vectors and vectors that are (visual) representations of objects from the visual recognition model. 

      A word is displaced if all the identified object-words in the defined context window (2s before word-onset) are unrelated to the word (see lines 105-110 (pg. 5); lines 371-380 (pg. 15-16); and Figure 2 caption). Thus, a word is considered to be displaced only if all identified objects in the scene (not just two, as claimed by the reviewer) are unrelated to the word. Given a context of 60 frames and an average of 5 identified objects per frame (i.e. an average candidate set of 300 objects that could be related) per word, the bar for “displacement” is set high. We provide some further considerations justifying the context window below in our responses to reviewers 2 and 3. 
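      The displacement criterion can be sketched as follows. This is an illustration only: the vectors are toy values rather than real GloVe embeddings, the relatedness `threshold` is an assumed parameter whose value is not taken from the manuscript, and `is_displaced` is a hypothetical name. Both the spoken word and the object labels are represented in the same (GloVe-like) embedding space.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_displaced(word_vec, object_vecs, threshold):
    """A word counts as displaced only if every object label in the
    context window is unrelated to it (similarity below the threshold).
    An empty context window also counts as displaced in this sketch."""
    sims = [cosine_similarity(word_vec, obj) for obj in object_vecs]
    return all(s < threshold for s in sims)

# Size of the average candidate set described above:
# 60 frames in the 2s window times 5 identified objects per frame.
candidates_per_word = 60 * 5
```

      A single related object label among the candidates is enough to make the word situated rather than displaced, which is why a large candidate set makes displacement a strict criterion.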

      (3) While the introduction motivated the problem of context situatedness purely linguistically, the actual methods look at the relationship between recognized objects in the visual scene and the words. Can word surprisal or another language-based metric be used in place of the visual labeling? Also, it is not clear how the process identified in (2) above would come up with a high situatedness score for abstract concepts like "truth". 

      We disagree with the reviewer that the introduction motivated the problem of context situatedness purely linguistically, as we explicitly consider visual context in the abstract as well as the introduction. Examples in text include lines 71-74 and lines 105-115. This is also reflected in the cited studies that use visual context, including Kalenine et al., 2014; Hoffmann et al., 2013; Yee & Thompson-Schill, 2016; Hsu et al., 2011. However, we appreciate the importance of being very clear about this point, so we added various mentions of this fact at the beginning of the introduction to avoid confusion.

      We know that prior linguistic context (e.g. measured by surprisal) does affect processing. The point of the analysis was to use a non-language-based metric of visual context to understand how this affects conceptual representation in naturalistic settings. Therefore, it is not clear to us why replacing this with a language-based metric such as surprisal would be an adequate substitution. However, the reviewer is correct that we did not control for the influence of prior context. We obtained surprisal values for each of our words but could not find any significant differences between conditions and therefore did not include this factor in the analyses conducted. For considerations of differences in surprisal between each of the analysed sets of words, see the supplementary material. 

      The method would yield a high score of contextual situatedness for abstract concepts if there were objects in the scene whose GloVe embeddings have a high cosine similarity to the GloVe embedding of that abstract word (e.g., “truth” and “book”). We believe this comment is rooted in a misconception of our method: the reviewer seems to assume that we directly compared GloVe vectors for the spoken word with vectors from a visual recognition model (in which case there would indeed be a concern about how an abstract concept like “truth” could have a high situatedness). Apart from the more general concerns about the comparability of vectors derived from GloVe and from a visual recognition model, this concern is unwarranted in our case, as we compare GloVe embeddings with GloVe embeddings. 

      (4) It is a bit hard to see the overlapping regions in Figures 6A-C. Would it be possible to show pairs instead of triples? Like "abstract across context" vs. "abstract displaced"? Without that, and given (2) above, the results are not yet clear. Moreover, what happens in the "overlapping" regions of Figure 3? 

      To make this clearer, we introduced the contrasts (abstract situated vs displaced and concrete situated vs displaced) that were previously in the supplementary materials in the main text (now Figure 6, this was also requested by reviewer 2). We now show the overlap between the abstract situated (from the contrast in Figure 6) with concrete across context and the overlap between concrete displaced (from the contrast in Figure 6) with abstract across context separately in Figure 7. 

      The overlapping regions of Figure 3 indicate that both concrete and abstract concepts are processed in these regions (though at different time-points). We explain why this is a result of our deconvolution analysis on page 23:  

      “Finally, there was overlap in activity between modulation of both concreteness and abstractness (Figure 3, yellow). The overlap activity is due to the fact that we performed general linear tests for the abstract/concrete contrast at each of the 20 timepoints in our group analysis. Consequently, overlap means that activation in these regions is modulated by both concrete and abstract word processing but at different time-scales. In particular, we find that activity modulation associated with abstractness is generally processed over a longer time-frame. In the frontal, parietal, and temporal lobes, this was primarily in the left IFG, AG, and STG, respectively. In the occipital lobe, processing overlapped bilaterally around the calcarine sulcus.”

      Miscellaneous comments: 

      (1) In Figure 3, it is surprising that the "concrete-only" regions dominate the angular gyrus and we see an overrepresentation of this category over "abstract-only". Can the authors place their findings in the context of other studies? 

      The Angular Gyrus (AG) is hypothesised to be a general semantic hub; therefore it is not surprising that it should be active for general conceptual processing (and there is some overlap activation in posterior regions). We now situate our results in a wider range of previous findings in the results section under “Conceptual Processing Across Context”. 

      “Consistent with previous studies, we predicted that across naturalistic contexts, concrete and abstract concepts are processed in a separable set of brain regions. To test this, we contrasted concrete and abstract modulators at each time point of the IRF (Figure 3). This showed that concrete produced more modulation than abstract processing in parts of the frontal lobes, including the right posterior inferior frontal gyrus (IFG) and the precentral sulcus (Figure 3, red). Known for its role in language processing and semantic retrieval, the IFG has been hypothesised to be involved in the processing of action-related words and sentences, supporting both semantic decision tasks and the retrieval of lexical semantic information (Bookheimer, 2002; Hagoort, 2005). The precentral sulcus is similarly linked to the processing of action verbs and motor-related words (Pulvermüller, 2005). In the temporal lobes, greater modulation occurred in the bilateral transverse temporal gyrus and sulcus, planum polare and temporale. These areas, including primary and secondary auditory cortices, are crucial for phonological and auditory processing, with implications for the processing of sound-related words and environmental sounds (Binder et al., 2000). The superior temporal gyrus (STG) and sulcus (STS) also showed greater modulation for concrete words and these are said to be central to auditory processing and the integration of phonological, syntactic, and semantic information, with a particular role in processing meaningful speech and narratives (Hickok & Poeppel, 2007). In the parietal and occipital lobes, more concrete modulated activity was found bilaterally in the precuneus, which has been associated with visuospatial imagery, episodic memory retrieval, and self-processing operations and has been said to contribute to the visualisation aspects of concrete concepts (Cavanna & Trimble, 2006). 
More activation was also found in large swaths of the occipital cortices (running into the inferior temporal lobe), and the ventral visual stream. These regions are integral to visual processing, with the ventral stream (including areas like the fusiform gyrus) particularly involved in object recognition and categorization, linking directly to the visual representation of concrete concepts (Martin, 2007). Finally, subcortically, the dorsal and posterior medial cerebellum were more active bilaterally for concrete modulation. Traditionally associated with motor function, some studies also implicate the cerebellum in cognitive and linguistic processing, including the modulation of language and semantic processing through its connections with cerebral cortical areas (Stoodley & Schmahmann, 2009).

      Conversely, activation for abstract was greater than concrete words in the following regions (Figure 3, blue): In the frontal lobes, this included right anterior cingulate gyrus, lateral and medial aspects of the superior frontal gyrus. Being involved in cognitive control, decision-making, and emotional processing, these areas may contribute to abstract conceptualization by integrating affective and cognitive components (Shenhav et al., 2013). More left frontal activity was found in both lateral and medial prefrontal cortices, and in the orbital gyrus, regions which are key to social cognition, valuation, and decision-making, all domains rich in abstract concepts (Amodio & Frith, 2006). In the parietal lobes, bilateral activity was greater in the angular gyri (AG) and inferior parietal lobules, including the postcentral gyrus. Central to the default mode network, these regions are implicated in a wide range of complex cognitive functions, including semantic processing, abstract thinking, and integrating sensory information with autobiographical memory (Seghier, 2013). In the temporal lobes, activity was restricted to the STS bilaterally, which plays a critical role in the perception of intentionality and social interactions, essential for understanding abstract social concepts (Frith & Frith, 2003). Subcortically, activity was greater, bilaterally, in the anterior thalamus, nucleus accumbens, and left amygdala for abstract modulation. These areas are involved in motivation, reward processing, and the integration of emotional information with memory, relevant for abstract concepts related to emotions and social relations (Haber & Knutson, 2010, Phelps & LeDoux, 2005).

      Finally, there was overlap in activity between modulation of both concreteness and abstractness (Figure 3, yellow). The overlap activity is due to the fact that we performed general linear tests for the abstract/concrete contrast at each of the 20 timepoints in our group analysis. Consequently, overlap means that activation in these regions is modulated by both concrete and abstract word processing but at different time-scales. In particular, we find that activity modulation associated with abstractness is generally processed over a longer time-frame (for a comparison of significant timing differences see figure S9). In the frontal, parietal, and temporal lobes, this was primarily in the left IFG, AG, and STG, respectively. Left IFG is prominently involved in semantic processing, particularly in tasks requiring semantic selection and retrieval and has been shown to play a critical role in accessing semantic memory and resolving semantic ambiguities, processes that are inherently time-consuming and reflective of the extended processing time for abstract concepts (Thompson-Schill et al., 1997; Wagner et al., 2001; Hofman et al., 2015). The STG, particularly its posterior portion, is critical for the comprehension of complex linguistic structures, including narrative and discourse processing. The processing of abstract concepts often necessitates the integration of contextual cues and inferential processing, tasks that engage the STG and may extend the temporal dynamics of semantic processing (Ferstl et al., 2008; Vandenberghe et al., 2002). In the occipital lobe, processing overlapped bilaterally around the calcarine sulcus, which is associated with primary visual processing (Kanwisher et al., 1997; Kosslyn et al., 2001).”

      The finding that concrete concepts activate more brain voxels than abstract concepts is generally aligned with existing research, which often reports more extensive brain activation for concrete versus abstract words. This is primarily due to the richer sensory and perceptual associations tied to concrete concepts; see, for example, Binder et al., 2005 (figure 2 in the paper). Similarly, a recent meta-analysis by Bucur & Pagano (2021) consistently found wider activation networks for the “concrete > abstract” contrast than for the “abstract > concrete” contrast. 

      (2) The following line (Pg 21) regarding the necessary differences in time for the two categories was not clear. How does this fall out from the analysis method? 

      - Both categories overlap **(though necessarily at different time points)** in regions typically associated with word processing - 

      This is answered in our response above to point (4) in the reviewer’s comments. We now also provide more information on the temporal differences in the supplementary material (Figure S9). 

      Reviewer #2 (Public Review):

      The critical contrasts needed to test the key hypothesis are not presented or not presented in full within the core text. To test whether abstract processing changes when in a situated context, the situated abstract condition would first need to be compared with the displaced abstract condition as in Supplementary Figure 6. Then to test whether this change makes the result closer to the processing of concrete words, this result should be compared to the concrete result. The correlations shown in Figure 6 in the main text are not focused on the differences in activity between the situated and displaced words or comparing the correlation of these two conditions with the other (concrete/abstract) condition. As such they cannot provide conclusive evidence as to whether the context is changing the processing of concrete/abstract words to be closer to the other condition. Additionally, it should be considered whether any effects reflect the current visual processing only or more general sensory processing. 

      The reviewer identifies the critical contrast as follows:

      “The situated abstract condition would first need to be contrasted with the displaced abstract condition. Then, these results should be compared to the concrete result.” 

      We can confirm that this is indeed what had been done and we believe the reviewer’s confusion stems from a lack of clarity on our behalf. We have now made various clarifications on this point in the manuscript, and we changed the figures to make clear that our results are indeed based on the contrasts identified by this reviewer as the essential ones.

      Figure 6 in the main text now reflects the contrast between situated and displaced abstract and concrete conditions (as requested by the reviewer, this was previously Figure S7 from the supplementary material). To compare the results from this contrast to conceptual processing across context, we use cosine similarity, and we mention these results in the text. We furthermore show the overlap between the conditions of interest (abstract situated x concrete across context; concrete displaced x abstract across context) in a new figure (Figure 7) to bring out the spatial distribution of overlap more clearly.

      We also discussed the extent to which these effects reflect current visual processing only or more general sensory processing in lines 863 – 875 (pg. 33 and 34).   

      “In considering the impact of visual context on the neural encoding of concepts generally, it is furthermore essential to recognize that the mechanisms observed may extend beyond visual processing to encompass more general sensory processing mechanisms. The human brain is adept at integrating information across sensory modalities to form coherent conceptual representations, a process that is critical for navigating the multimodal nature of real-world experiences (Barsalou, 2008; Smith & Kosslyn, 2007). While our findings highlight the role of visual context in modulating the neural representation of abstract and concrete words, similar effects may be observed in contexts that engage other sensory modalities. For instance, auditory contexts that provide relevant sound cues for certain concepts could potentially influence their neural representation in a manner akin to the visual contexts examined in this study. Future research could explore how different sensory contexts, individually or in combination, contribute to the dynamic neural encoding of concepts, further elucidating the multimodal foundation of semantic processing.”

      Overall, the study would benefit from being situated in the literature more, including a) a more general understanding of the areas involved in semantic processing (including areas proposed to be involved across different sensory modalities and for verbal and nonverbal stimuli), and b) other differences between abstract and concrete words and whether they can explain the current findings, including other psycholinguistic variables which could be included in the model and the concept of semantic diversity (Hoffman et al.,). It would also be useful to consider whether difficulty effects (or processing effort) could explain some of the regional differences between abstract and concrete words (e.g., the language areas may simply require more of the same processing not more linguistic processing due to their greater reliance on word co-occurrence). Similarly, the findings are not considered in relation to prior comparisons of abstract and concrete words at the level of specific brain regions. 

      We now present an overview of the areas involved in semantic processing (across different sensory modalities for verbal and nonverbal stimuli) when we first present our results (section: “Conceptual Processing Across Context”).

      We looked at surprisal as a potential confound and found no significant differences between any of the sets of words analysed. Mean surprisal of concrete words is 22.19 bits; mean surprisal of abstract words is 21.86 bits. Mean surprisal is 21.98 bits for the situated concrete words, 22.02 bits for the displaced concrete words, 22.10 bits for the situated abstract words, and 22.25 bits for the displaced abstract words. We also calculated the semantic diversity of all sets of words and found no significant differences between the sets. The mean values for each condition are: abstract_high (2.02); abstract_low (1.95); concrete_high (1.88); concrete_low (2.19); abstract_original (1.96); concrete_original (1.92). Hence, processing effort related to different predictability (surprisal) or to greater semantic diversity cannot explain our findings. 

      We submit that difficulty effects do not explain any aspects of the activation found for conceptual processing, because we included word frequency in our model as a nuisance regressor and found no significant differences associated with surprisal. Previous work shows that surprisal (Hale, 2001) and word frequency (Brysbaert & New, 2009) are good controls for processing difficulty.

      Finally, we added considerations of prior findings comparing abstract and concrete words at the level of specific brain regions to the discussion (section: Conceptual Processing Across Context). 

      The authors use multiple methods to provide a post hoc interpretation of the areas identified as more involved in concrete, abstract, or both (at different times) words. These are designed to reduce the interpretation bias and improve interpretation, yet they may not successfully do so. These methods do give some evidence that sensory areas are more involved in concrete word processing. However, they are still open to interpretation bias as it is not clear whether all the evidence is consistent with the hypotheses or if this is the best interpretation of individual regions' involvement. This is because the hypotheses are provided at the level of 'sensory' and 'language' areas without further clarification and areas and terms found are simply interpreted as fitting these definitions. For instance, the right IFG is interpreted as a motor area, and therefore sensory as predicted, and the term 'autobiographical memory' is argued to be interoceptive. Language is associated with the 'both' cluster, not the abstract cluster, when abstract > concrete is expected to engage language more. The areas identified for both vs. abstract > concrete are distinguished in the Discussion through the description as semantic vs. language areas, but it is not clear how these are different or defined. Auditory areas appear to be included in the sensory prediction at times and not at others. When they are excluded, the rationale for this is not given. Overall, it is not clear whether all these areas and terms are expected and support the hypotheses. It should be possible to specify specific sensory areas where concrete and abstract words are predicted to be different based on a) prior comparisons and/or b) the known locations of sensory areas. Similarly, language or semantic areas could be identified using masks from NeuroSynth or traditional meta-analyses. A language network is presented in Supplementary Figure 7 but not interpreted, and its source is not given. 

      “The language network” was extracted with NeuroSynth and projected onto the “overlap” activation map with AFNI. We now specify this in the figure caption. 

      Alternatively, there could be a greater interpretation of different possible explanations of the regions found with a more comprehensive assessment of the literature. The function of individual regions and the explanation of why many of these areas are interpreted as sensory or language areas are only considered in the Discussion when it could inform whether the hypotheses have been evidenced in the results section. 

      We added extended considerations of this to the results (as requested by the reviewer) in the section “Conceptual Processing Across Context”. 

      “Consistent with previous studies, we predicted that across naturalistic contexts, concrete and abstract concepts are processed in a separable set of brain regions. To test this, we contrasted concrete and abstract modulators at each time point of the IRF (Figure 3). This showed that concrete produced more modulation than abstract processing in parts of the frontal lobes, including the right posterior inferior frontal gyrus (IFG) and the precentral sulcus (Figure 3, red). Known for its role in language processing and semantic retrieval, the IFG has been hypothesised to be involved in the processing of action-related words and sentences, supporting both semantic decision tasks and the retrieval of lexical semantic information (Bookheimer, 2002; Hagoort, 2005). The precentral sulcus is similarly linked to the processing of action verbs and motor-related words (Pulvermüller, 2005). In the temporal lobes, greater modulation occurred in the bilateral transverse temporal gyrus and sulcus, planum polare and temporale. These areas, including primary and secondary auditory cortices, are crucial for phonological and auditory processing, with implications for the processing of sound-related words and environmental sounds (Binder et al., 2000). The superior temporal gyrus (STG) and sulcus (STS) also showed greater modulation for concrete words and these are said to be central to auditory processing and the integration of phonological, syntactic, and semantic information, with a particular role in processing meaningful speech and narratives (Hickok & Poeppel, 2007). In the parietal and occipital lobes, more concrete modulated activity was found bilaterally in the precuneus, which has been associated with visuospatial imagery, episodic memory retrieval, and self-processing operations and has been said to contribute to the visualisation aspects of concrete concepts (Cavanna & Trimble, 2006). 
More activation was also found in large swaths of the occipital cortices (running into the inferior temporal lobe), and the ventral visual stream. These regions are integral to visual processing, with the ventral stream (including areas like the fusiform gyrus) particularly involved in object recognition and categorization, linking directly to the visual representation of concrete concepts (Martin, 2007). Finally, subcortically, the dorsal and posterior medial cerebellum were more active bilaterally for concrete modulation. Traditionally associated with motor function, some studies also implicate the cerebellum in cognitive and linguistic processing, including the modulation of language and semantic processing through its connections with cerebral cortical areas (Stoodley & Schmahmann, 2009).

      Conversely, activation for abstract was greater than concrete words in the following regions (Figure 3, blue): In the frontal lobes, this included right anterior cingulate gyrus, lateral and medial aspects of the superior frontal gyrus. Being involved in cognitive control, decision-making, and emotional processing, these areas may contribute to abstract conceptualization by integrating affective and cognitive components (Shenhav et al., 2013). More left frontal activity was found in both lateral and medial prefrontal cortices, and in the orbital gyrus, regions which are key to social cognition, valuation, and decision-making, all domains rich in abstract concepts (Amodio & Frith, 2006). In the parietal lobes, bilateral activity was greater in the angular gyri (AG) and inferior parietal lobules, including the postcentral gyrus. Central to the default mode network, these regions are implicated in a wide range of complex cognitive functions, including semantic processing, abstract thinking, and integrating sensory information with autobiographical memory (Seghier, 2013). In the temporal lobes, activity was restricted to the STS bilaterally, which plays a critical role in the perception of intentionality and social interactions, essential for understanding abstract social concepts (Frith & Frith, 2003). Subcortically, activity was greater, bilaterally, in the anterior thalamus, nucleus accumbens, and left amygdala for abstract modulation. These areas are involved in motivation, reward processing, and the integration of emotional information with memory, relevant for abstract concepts related to emotions and social relations (Haber & Knutson, 2010; Phelps & LeDoux, 2005).

      Finally, there was overlap in activity between modulation of both concreteness and abstractness (Figure 3, yellow). The overlap activity is due to the fact that we performed general linear tests for the abstract/concrete contrast at each of the 20 timepoints in our group analysis. Consequently, overlap means that activation in these regions is modulated by both concrete and abstract word processing but at different time-scales. In particular, we find that activity modulation associated with abstractness is generally processed over a longer timeframe (for a comparison of significant timing differences see Figure S9). In the frontal, parietal, and temporal lobes, this was primarily in the left IFG, AG, and STG, respectively. Left IFG is prominently involved in semantic processing, particularly in tasks requiring semantic selection and retrieval and has been shown to play a critical role in accessing semantic memory and resolving semantic ambiguities, processes that are inherently time-consuming and reflective of the extended processing time for abstract concepts (Thompson-Schill et al., 1997; Wagner et al., 2001; Hofman et al., 2015). The STG, particularly its posterior portion, is critical for the comprehension of complex linguistic structures, including narrative and discourse processing. The processing of abstract concepts often necessitates the integration of contextual cues and inferential processing, tasks that engage the STG and may extend the temporal dynamics of semantic processing (Ferstl et al., 2008; Vandenberghe et al., 2002). In the occipital lobe, processing overlapped bilaterally around the calcarine sulcus, which is associated with primary visual processing (Kanwisher et al., 1997; Kosslyn et al., 2001).”

      Additionally, these methods attempt to interpret all the clusters found for each contrast in the same way when they may have different roles (e.g., relate to different senses). This is a particular issue for the peaks and valleys method which assesses whether a significantly larger number of clusters is associated with each sensory term for the abstract, concrete, or both conditions than the other conditions. The number of clusters does not seem to be the right measure to compare. Clusters differ in size so the number of clusters does not represent the area within the brain well. Nor is it clear that many brain regions should respond to each sensory term, and not just one per term (whether that is V1 or the entire occipital lobe, for instance). The number of clusters is therefore somewhat arbitrary. This is further complicated by the assessment across 20 time points and the inclusion of the 'both' categories. It would seem more appropriate to see whether each abstract and concrete cluster could be associated with each different sensory term and then summarise these findings rather than assess the number of abstract or concrete clusters found for each independent sensory term. In general, the rationale for the methods used should be provided (including the peak and valley method instead of other possible options e.g., linear regression). 

      We included an assessment of whether each abstract and concrete cluster could be associated with each different sensory term and then summarised these findings on a participant level in the supplementary material (Figures S3, S4, and S5). 

      Rationales for the Amplitude Modulated Deconvolution are now provided on page 10 (specifically the first paragraph under “Deconvolution Analysis” in the Methods section) and for the P&V on pages 13, 14 and 15 (under “Peaks and Valley” (particularly the first paragraph) in the Methods section). 

      The measure of contextual situatedness (how related a spoken word is to the average of the visually presented objects in a scene) is an interesting approach that allows parametric variation within naturalistic stimuli, which is a potential strength of the study. This measure appears to vary little between objects that are present (e.g., animal or room), and those that are strongly (e.g., monitor) or weakly related (e.g., science). Additional information validating this measure may be useful, as would consideration of the range of values and whether the split between situated (c > 0.6) and displaced words (c < 0.4) is sufficient.  

      The main validation of our measure of contextual situatedness derives from the high accuracy and reliability of CNNs in object detection and recognition tasks, as demonstrated in numerous benchmarks and real-world applications. 

      One reason for low variability in our measure of contextual situatedness is the fact that we compared the GloVe vector of each word of interest with an average GloVe vector of all object-words referring to objects present in 56 frames (~300 objects on average). This means that a lot of variability in similarity measures between individual object-words and the word of interest is averaged out. Notwithstanding the resulting low variability of our measure, we thought that this would be the more conservative approach, as even small differences between individual measures (e.g. 0.4 vs 0.6) would constitute a strong difference on average (across the 300 objects per context window).  Therefore, this split ensures a sufficient distinction between words that are strongly related to their visual context and those that are not – which in turn allows us to properly investigate the impact of contextual relevance on conceptual processing.
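      As a rough illustration of the measure described above, the sketch below compares one word vector with the average of the object-word vectors and applies the 0.6 / 0.4 split. It is a minimal sketch under stated assumptions, not the authors' code: cosine similarity is assumed as the similarity measure, the three-dimensional vectors are made-up stand-ins for GloVe embeddings, and the function names and the "excluded" label for intermediate values are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_situatedness(word_vec, object_vecs, high=0.6, low=0.4):
    """Compare a word with the averaged context vector and apply the split.

    Returns (c, label); "excluded" for values between the two thresholds
    is an assumption made for this illustration.
    """
    c = cosine_similarity(word_vec, mean_vector(object_vecs))
    if c > high:
        return c, "situated"
    if c < low:
        return c, "displaced"
    return c, "excluded"


# Toy stand-ins for the embeddings of objects detected in a context window.
context_objects = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]

for name, vec in [("related", [1.0, 0.1, 0.0]),
                  ("intermediate", [0.5, 1.0, 0.0]),
                  ("unrelated", [0.0, 0.0, 1.0])]:
    c, label = classify_situatedness(vec, context_objects)
    print(f"{name}: c = {c:.3f} -> {label}")
```

      Averaging the object vectors first, rather than averaging many pairwise similarities, is what compresses the range of the resulting values, which is the effect the response describes.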

      Finally, the study assessed the relation of spoken concrete or abstract words to brain activity at different time points. The visual scene was always assessed using the 2 seconds before the word, while the neural effects of the word were assessed every second after the presentation for 20 seconds. This could be a strength of the study, however almost no temporal information was provided. The clusters shown have different timings, but this information is not presented in any way. Giving more temporal information in the results could help to both validate this approach and show when these areas are involved in abstract or concrete word processing. 

      We provide more information on the temporal differences of when clusters are involved in processing concrete and abstract concepts in the supplementary material (Figure S9) and refer to this information where relevant in the Methods and Results sections. 

      Additionally, no rationale was given for this long timeframe which is far greater than the time needed to process the word, and long after the presence of the visual context assessed (and therefore ignores the present visual context). 

      The 20-second timeframe for our deconvolution analysis is justified by several considerations. Firstly, the hemodynamic response function (HRF) is known to vary both across individuals and within different regions of the brain. To accommodate this variability and capture the full breadth of the HRF, including its rise, peak, and return to baseline, a longer timeframe is often necessary. The 20-second window ensures that we do not prematurely truncate the HRF, which could lead to inaccurate estimations of neural activity related to the processing of words. Secondly and related to this point, unlike model-based approaches that assume a canonical HRF shape, our deconvolution analysis does not impose a predefined form on the HRF, instead reconstructing the HRF from the data itself – for this, a longer time-frame is advantageous to get a better estimation of the true HRF. Finally, and related to this point, the use of the 'Csplin' function in our analysis provides a flexible set of basis functions for deconvolution, allowing for a more fine-grained and precise estimation of the HRF across this extended timeframe. The 'Csplin' function offers more interpolation between time points, which is particularly advantageous for capturing the nuances of the HRF as it unfolds over a longer time-frame. 

      Although we use a 20-second timeframe for the deconvolution analysis to capture the full HRF, the analysis is still time-locked to the onset of each visual stimulus. This ensures that the initial stages of the HRF are directly tied to the moment the word is presented, thus incorporating the immediate visual context. We furthermore include variables that represent aspects of the visual context at the time of word presentation in our models (e.g., luminance) and control for motion (optical flow), colour saturation and spatial frequency of the immediate visual context. 

      Reviewer #3 (Public Review):

      The context measure is interesting, but I'm not convinced that it's capturing what the authors intended. In analysing the neural response to a single word, the authors are presuming that they have isolated the window in which that concept is processed and the observed activation corresponds to the neural representation of that word given the prior context. I question to what extent this assumption holds true in a narrative when co-articulation blurs the boundaries between words and when rapid context integration is occurring. 

      We appreciate the reviewer's critical perspective on the contextual measure employed in our study. We agree that the dynamic and continuous nature of narrative comprehension poses challenges for isolating the neural response to individual words. However, the use of an amplitude modulated deconvolution analysis, particularly with the CSPLIN function, is a methodological choice to specifically address these challenges. Deconvolution allows us to estimate the hemodynamic response function (HRF) without assuming its canonical shape, capturing nuances in the BOLD signal that may reflect the integration of rapid contextual shifts (only beyond the average modulation of the BOLD signal). The CSPLIN function further refines this approach by offering a flexible basis set for modelling the HRF and by providing a detailed temporal resolution that can adapt to the variance in individual responses. 

      Our choice of a 20-second window is informed by the need to encompass not just the immediate response to a word but also the extended integration of the contextual information. This is consistent with evidence indicating that the brain integrates information over longer timescales when processing language in context (Hasson et al., 2015). The neural representation of a word is not a static snapshot but a dynamic process that evolves with the unfolding narrative. 

      Further, the authors define context based on the preceding visual information. I'm not sure that this is a strong manipulation of the narrative context, although I agree that it captures some of the local context. It is maybe not surprising that if a word, abstract or concrete, has a strong association with the preceding visual information then activation in the occipital cortex is observed. I also wonder if the effects being captured have less to do with concrete and abstract concepts and more to do with the specific context the displaced condition captures during a multimodal viewing paradigm. If the visual information is less related to the verbal content, the viewer might process those narrative moments differently regardless of whether the subsequent word is concrete or abstract. I think the claims could be tailored to focus less generally on context and more specifically on how visually presented objects, which contribute to the ongoing context of a multimodal narrative, influence the subsequent processing of abstract and concrete concepts.

      The context measure, though admittedly a simplification, is designed to capture the local visual context preceding word presentation. By using high-confidence visual recognition models, we ensure that the visual information is reliably extracted and reflects objects that have a strong likelihood of influencing the processing of subsequent words. We acknowledge that this does not capture the full richness of narrative context; however, it provides a quantifiable and consistent measure of the immediate visual environment, which is an important aspect of context in naturalistic language comprehension.

      With regards to the effects observed in the occipital cortex, we posit that while some activation might be attributable to the visual features of the narrative, our findings also reflect the influence of these features on conceptual processing. This is especially because our analysis only looks at the modulation of the HRF amplitude beyond the average response (so also beyond the average visual response) when contrasting between conditions of high and low visual-contextual association with important (audio-visual) control variables included in the model. 

      Lastly, we concur that both concrete and abstract words are processed within a multimodal narrative, which could influence their neural representation. We believe our approach captures a meaningful aspect of this processing, and we have refined our claims to specify the influence of visually presented objects on the processing of abstract and concrete concepts, rather than making broader assertions about multimodal context. We also highlight several other signals (e.g. auditory) that could influence processing. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The approach taken here requires a lot of manual variable selection and seems a bit roundabout. Why not build an encoding model that can predict the BOLD time course of each voxel in a participant from the feature-of-interest like valence etc. and then analyze if (1) certain features better predict activity in a specific region (2) the predicted responses/regression parameters are more positive (peaks) or more negative (valleys) for certain features in a specific brain region (3) maybe even use contextual features from a large language model and then, per word (like "truth"), analyze where the predicted responses diverge based on the associated context. This seems like a simpler approach than having multiple stages of analysis. 

      It is not clear to us why an encoding model would be more suitable for answering the question at hand (especially given that we tried to clarify concerns about non-linear relationships between variables). On the contrary, fitting a regression model to each individual voxel has several drawbacks. First, encoding models are prone to over-estimate effect sizes (Naselaris et al., 2011). Second, encoding models are not good at explaining group-level effects due to high variability between individual participants (Turner et al., 2018). We would also like to point out that an encoding model using features of a text-based LLM would not address the visual context question - unless the LLM was multimodal. Multimodal LLMs are a very recent research development in Artificial Intelligence, however, and models like LLaMA (adapter), Google’s Gemini, etc. are not truly multimodal in the sense that would be useful for this study, because they are first trained on text and later injected with visual data. This relates to our concern that the reviewer may have misunderstood that we are interested in purely visual context of words (not linguistic context).

      (2) In multiple analyses, a subset of the selected words is sampled to create a balanced set between the abstract and concrete categories. Do the authors show standard deviation across these sets? 

      For the subset of words used in the context-based analyses, we give mean ratings of concreteness, log frequency and length and conduct a t-test to show that these variables are not significantly different between the sets. We also included the psycholinguistic control variables surprisal and semantic diversity, as well as the visual variables motion (optical flow), colour saturation and spatial frequency.  

      Reviewer #2 (Recommendations For The Authors):

      Figures S3-5 are central to the argument and should be in the main text (potentially combined).  

      These have been added to the main text.

      S5 says the top 3 terms are DMN (and not semantic control), but the text suggests the r value is higher for 'semantic control' than 'DMN'? 

      Fixed this in the text, the caption now reads: 

      “This was confirmed by using the neurosynth decoder on the unthresholded brain image - top keywords were “Semantic Control” and “DMN”.”

      Fig. S7 is very hard to see due to the use of grey on grey. Not used for great effect in the final sentence, but should be used to help interpret areas in the results section (if useful). It has not been specified how the 'language network' has been identified/defined here. 

      We altered the contrast in the figure to make boundaries more visible and specified how the language network was identified in the figure caption. 

      In the Results 'This showed that concrete produced more modulation than abstract modulation in the frontal lobes,' should be parts of /some of the frontal lobes as this isn't true overall. 

      Fixed this in the text.  

      There are some grammatical errors and lack of clarity in the context comparison section of the results. 

      Fixed these in the text.

      Reviewer #3 (Recommendations For The Authors):

      •  The analysis code should be shared on the github page prior to peer review.  

      The code is now shared under: https://github.com/ViktorKewenig/Naturalistic_Encoding_Concepts

      •  At several points throughout the methods section, information was referred to that had not yet been described. Reordering the presentation of this information would greatly improve interpretability. A couple of examples of this are provided below. 

      Deconvolution Analysis: the use of amplitude modulation regression was introduced prior to a discussion of using the TENT function to estimate the shape of the HRF. This was then followed by a discussion of the general benefits of amplitude modulation. Only after these paragraphs are the modulators/model structure described. Moving this information to the beginning of the section would make the analysis clearer from the onset. 

      Fixed this in the text

      Peak and Valley Analysis: the hypotheses regarding the sensory-motor features and experiential features are provided prior to describing how these features were extracted from the data (e.g., using the Lancaster norms). 

      Fixed this in the text.

      •  The justification for and description of the IRF approach seems overdone considering the timing differences are not analyzed further or discussed. 

      We now present a further discussion of timing differences in the supplementary material.

      •  The need and suitability of the cluster simulation method as implemented were not clear. The resulting maps were thresholded at 9 different p values and then combined, and an arbitrary cluster threshold of 20 voxels was then applied. Why not use the standard approach of selecting the significance threshold and corresponding cluster size threshold from the ClustSim table? 

      We extracted the original clusters at 9 different p values with the corresponding cluster size from the ClustSim table, then only included clusters that were bigger than 20 voxels.  

      •  Why was the center of mass used instead of the peak voxel? 

      Peak voxel analysis can be sensitive to noise and may not reliably represent the region's activation pattern, especially in naturalistic imaging data where signal fluctuations are more variable and outliers more frequent. The centre of mass provides a more stable and representative measure of the underlying neural activity. Another reason for using the center of mass is that it better represents the anatomical distribution of the data, especially in large clusters with more than 100 voxels where peak voxels are often located at the periphery. 
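      The difference between the two summary locations can be seen in a small example. This is only a sketch: the response does not state whether the centre of mass was weighted by voxel intensity, so intensity weighting is an assumption here, and the cluster values and function names are invented for illustration.

```python
def peak_voxel(cluster):
    """Coordinate of the voxel with the largest statistic value."""
    return max(cluster, key=lambda item: item[1])[0]


def centre_of_mass(cluster):
    """Intensity-weighted centre of mass (assumes positive values)."""
    total = sum(value for _, value in cluster)
    return tuple(
        sum(coord[axis] * value for coord, value in cluster) / total
        for axis in range(3)
    )


# Toy cluster: a row of five voxels with one high value at the edge.
# Each entry is ((x, y, z), statistic value).
toy_cluster = [
    ((0, 0, 0), 1.0),
    ((1, 0, 0), 1.0),
    ((2, 0, 0), 1.0),
    ((3, 0, 0), 1.0),
    ((4, 0, 0), 3.0),
]

print("peak voxel:", peak_voxel(toy_cluster))
print("centre of mass:", centre_of_mass(toy_cluster))
```

      In this toy cluster the peak sits on the boundary at x = 4, while the centre of mass lies closer to the middle of the cluster, which mirrors the argument made in the response about peripheral peaks in large clusters.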

      • Figure 1 seems to reference a different Figure 1 that shows the abstract, concrete, and overlap clusters of activity (currently Figure 3). 

      Fixed this in the text.

      • Table S1 seems to have the "Touch" dimension repeated twice with different statistics reported. 

      Fixed this in the text, the second mention of the dimension “touch” was wrong.  

      • It appears from the supplemental files that the Peaks and Valley analysis produces different results at different lag times. This might be expected but it's not clear why the results presented in the main text were chosen over those in the supplemental materials. 

      The results in the main text were chosen over those in the supplementary material, because the HRF is said to peak at 5s after stimulus onset. We added a specification of this rationale to the “2. Peak and Valley Analysis” subsection in the Methods section.  

      References (in order of appearance) 

      (1) Neumann J, Lohmann G, Zysset S, von Cramon DY. Within-subject variability of BOLD response dynamics. Neuroimage. 2003 Jul;19(3):784-96. doi: 10.1016/s1053-8119(03)00177-0. PMID: 12880807.

      (2) Handwerker DA, Ollinger JM, D'Esposito M. Variation of BOLD hemodynamic responses across subjects and brain regions and their effects on statistical analyses. Neuroimage. 2004 Apr;21(4):1639-51. doi: 10.1016/j.neuroimage.2003.11.029. PMID: 15050587.

      (3) Binder JR, Westbury CF, McKiernan KA, Possing ET, Medler DA. Distinct brain systems for processing concrete and abstract concepts. J Cogn Neurosci. 2005 Jun;17(6):905-17. doi: 10.1162/0898929054021102. PMID: 16021798.

      (4) Bucur, M., Papagno, C. An ALE meta-analytical review of the neural correlates of abstract and concrete words. Sci Rep 11, 15727 (2021). https://doi.org/10.1038/s41598-021-94506-9 

      (5) Hale, J. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies (NAACL '01). Association for Computational Linguistics, USA, 1–8. https://doi.org/10.3115/1073336.1073357

      (6) Brysbaert, M., New, B. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41, 977–990 (2009). https://doi.org/10.3758/BRM.41.4.977 

      (7) Hasson, U., Nir, Y., Levy, I., Fuhrmann, G., & Malach, R. (2004). Intersubject Synchronization of Cortical Activity During Natural Vision. Science, 303(5664), 6.

      (8) Naselaris T, Kay KN, Nishimoto S, Gallant JL. Encoding and decoding in fMRI. Neuroimage. 2011 May 15;56(2):400-10. doi: 10.1016/j.neuroimage.2010.07.073. Epub 2010 Aug 4. PMID: 20691790; PMCID: PMC3037423.

      (9) Turner BO, Paul EJ, Miller MB, Barbey AK. Small sample sizes reduce the replicability of task-based fMRI studies. Commun Biol. 2018 Jun 7;1:62. doi: 10.1038/s42003-018-0073-z. PMID: 30271944; PMCID: PMC6123695.

      (10) He, K., Zhang, Y., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv (Tech Report). https://doi.org/10.48550/arXiv.1512.03385

      (11) Hasson, U., & Egidi, G. (2015). What are naturalistic comprehension paradigms teaching us about language? In R. M. Willems (Ed.), Cognitive neuroscience of natural language use (pp. 228–255). Cambridge University Press. https://doi.org/10.1017/CBO9781107323667.011

    1. eLife assessment

      This important work, leveraging state-of-the-art whole-night sleep EEG-fMRI methods, advances our understanding of the brain states underlying sleep and wakefulness. Despite a small sample size, the authors present convincing evidence for substates within N2 and REM sleep stages, with reliable transition structure, supporting the perspective that there are more than the five canonical sleep/wake states.

    2. Reviewer #1 (Public review):

      Summary:

      The study made fundamental findings in investigations of the dynamic functional states during sleep. Twenty-one HMM states were revealed from the fMRI data, surpassing the number of EEG-defined sleep stages, which can define sub-states of N2 and REM. Importantly, these findings were reproducible over two nights, shedding new light on the dynamics of brain function during sleep.

      Strengths:

      The study provides the most compelling evidence on the sub-states of both REM and N2 sleep. Moreover, they showed these findings on dynamics states and their transitions were reproducible over two nights of sleep. These novel findings offered unique information in the field of sleep neuroimaging.

      Comments on revised version:

      Nice work! All my concerns have been addressed, and I have no further suggestions.

    3. Reviewer #2 (Public review):

      Summary:

      Yang and colleagues used a Hidden Markov Model (HMM) on whole-night fMRI to isolate sleep and wake brain states in a data-driven fashion. They identify more brain states (21) than the five sleep/wake stages described in conventional PSG-based sleep staging, show that the identified brain states are stable across nights, and characterize the brain states in terms of which networks they primarily engage.

      Strengths:

      This work's primary strengths are its dataset of two nights of whole-night concurrent EEG-fMRI (including REM sleep), and its sound methodology.

      Weaknesses:

      Weaknesses are its small sample size, and limited attempts at relating the identified fMRI brain states back to EEG.

      General appraisal:

      The paper's conclusions are generally well-supported, but additional analyses could improve the work further.

      The authors' main focus lies in identifying fMRI-based brain states, and they succeed at demonstrating both the presence and robustness of these states in terms of cross-night stability. Additional characterization of brain states in terms of which networks these brain states primarily engage adds additional insights.

      A missed opportunity remains the absence of more analyses relating the HMM states back to EEG. While the authors show how power in different EEG bands varies with HMM state (Supplementary Figures 10 and 11) it would be much more informative to show the complete EEG spectra for each of the 21 HMM states, organized by PSG-based sleep/wake state. This would enable answering how EEG spectra of, say, different N2-related HMM states compare. Similarly, it is presently unclear whether anything noticeable happens within the EEG timecourse at the moment of an HMM class switch (particularly when the PSG stage remains stable). Such analyses might have shown that fMRI-based brain states map onto familiar EEG substates, or reveal novel EEG changes that have so far gone unnoticed. Furthermore, if band-specific analyses are to be performed, care should be taken to specify bands in accordance with the dominant sleep EEG features (e.g., slow oscillation and sigma/spindle bands are currently missing).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study made fundamental findings in investigations of the dynamic functional states during sleep. Twenty-one HMM states were revealed from the fMRI data, surpassing the number of EEG-defined sleep stages, which can define sub-states of N2 and REM. Importantly, these findings were reproducible over two nights, shedding new light on the dynamics of brain function during sleep.

      Strengths:

      The study provides the most compelling evidence on the sub-states of both REM and N2 sleep. Moreover, they showed these findings on dynamics states and their transitions were reproducible over two nights of sleep. These novel findings offered unique information in the field of sleep neuroimaging.

      Weaknesses:

      The only weakness of this study has been acknowledged by the authors: limited sample size.

      We thank the reviewer for the overall enthusiasm for this study.

      Reviewer #1 (Recommendations For The Authors):

      (1) Were there differences in the extent of head motion during sleep among sleep stages? How was the potential motion parameter differences handled during the statistical analyses?

      If there were large head motions that continued for a long time (e.g., longer than 1 minute), how did the authors deal with that scanning session? For an extremely long scanning session (3 hours), how was motion correction conducted? It would be great if the authors could provide more details.

      We found that the N3 sleep stage had the lowest head motion, followed by REM, N2, N1, and lastly Wake. In other words, participants had lower head motion during sleep than during wakefulness. We added this information to the Supplemental Results, copied below.

      We performed standardized motion correction during preprocessing using AFNI regardless of the duration of the scans. We did not include motion parameters in the HMM model. Time frames with excessive head motion (any of the six head motion parameters exceeding 0.3 mm or 0.3 degrees) were censored. Previous analysis of the same data indicated that motion during extended sleep scans is comparable to the motion observed in shorter resting-state scans (Moehlman et al., 2019).

      In Supplemental Results, “Motion parameters with sleep stages.

      Averaged motion across six motion parameters decreased from wake to light sleep to deep sleep at night 2. For example, mean (standard deviation) motion for each sleep stage is as follows, N1: 0.043 (0.37); N2: 0.039 (0.033); N3: 0.035 (0.031); REM: 0.035 (0.032); Wake: 0.057 (0.052).

      Correspondingly, the percentage of timepoints retained after censoring increased from wake to light sleep to deep sleep at night 2. N1: 91%; N2: 93%; N3: 96%; REM: 89%; Wake: 90%.”

      In the method section, “Previous analysis of the same data indicated that motion during extended sleep scans is comparable to the motion observed in shorter resting-state scans (Moehlman et al., 2019). We also found that motion is lower during deep sleep compared to wake, see Supplemental Results.”
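      The censoring rule described in this response can be sketched as follows. This is an illustrative sketch, not the AFNI pipeline the authors ran: it assumes the six values per time frame are the quantities to be thresholded, that the 0.3 limit applies to absolute values, and the function names and toy numbers are hypothetical.

```python
def censor_mask(motion, threshold=0.3):
    """Return one boolean per time frame: True means the frame is kept.

    `motion` is a list of frames, each holding six motion values
    (three translations in mm, three rotations in degrees). A frame is
    censored when any absolute value exceeds the threshold.
    """
    return [all(abs(value) <= threshold for value in frame) for frame in motion]


def percent_retained(mask):
    """Percentage of time frames that survive censoring."""
    return 100.0 * sum(mask) / len(mask)


# Toy data: four frames, the third has one rotation above the limit.
toy_motion = [
    (0.01, 0.02, 0.00, 0.05, 0.01, 0.00),
    (0.10, 0.05, 0.02, 0.00, 0.03, 0.01),
    (0.02, 0.01, 0.00, 0.45, 0.02, 0.00),
    (0.00, 0.00, 0.01, 0.02, -0.10, 0.00),
]

mask = censor_mask(toy_motion)
print(mask)
print(f"{percent_retained(mask):.0f}% of frames retained")
```

      Computing the retained percentage separately for the frames of each sleep stage would give per-stage figures of the kind reported in the Supplemental Results.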

      (2) It is possible that the data input for the HMM analyses might vary among participants and between the two nights, how did the authors deal with this issue during statistical analyses?

      This is a great question. We standardized the BOLD timecourses for each participant and each night to avoid differences among participants and between the two nights. We revised the description in the method section to make this point clear.

      In the method section, “To prepare the data for analysis, we first standardized the participant-specific sets of 300 ROI timecourses (scaled to a mean of 0, and a standard deviation of 1), which were then concatenated across all participants. This standardization was performed separately for each night.”
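
As a rough sketch of the preparation step quoted above, the following standardizes each ROI timecourse within a participant and then concatenates participants along time. The helper names are hypothetical, the toy data use 2 ROIs instead of 300, and the use of the population standard deviation is an assumption (the response does not specify which estimator was used).

```python
from statistics import fmean, pstdev


def zscore(timecourse):
    """Scale one ROI timecourse to mean 0 and standard deviation 1."""
    mu = fmean(timecourse)
    sigma = pstdev(timecourse)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("constant timecourse cannot be standardized")
    return [(x - mu) / sigma for x in timecourse]


def prepare_night(participants):
    """Standardize each participant's ROIs, then concatenate over time.

    participants: list of participants; each participant is a list of ROI
    timecourses (300 ROIs in the study, 2 in the toy example below).
    Returns one list per ROI holding all participants' samples in order.
    """
    n_rois = len(participants[0])
    concatenated = [[] for _ in range(n_rois)]
    for rois in participants:
        for i, tc in enumerate(rois):
            concatenated[i].extend(zscore(tc))
    return concatenated


# Toy data for a single night: 2 participants x 2 ROIs on different scales.
night = [
    [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]],
    [[100.0, 300.0], [5.0, 7.0]],
]
hmm_input = prepare_night(night)
```

Calling `prepare_night` once per night mirrors the statement that standardization was performed separately for each night, so that scale differences between participants or nights do not enter the model input.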

      (3) Figures 2 and 4, the top part seems to be missing, e.g., "Night 2" in Figure 2, and "N2-related" in Figure 4.

      Thank you for pointing out these errors. We fixed them.

      (4) Figure 3 seems to be more stretched vertically than horizontally.

      We revised the figure to ensure it appears balanced on both sides.

      Reviewer #2 (Public Review):

      Summary:

      Yang and colleagues used a Hidden Markov Model (HMM) on whole-night fMRI to isolate sleep and wake brain states in a data-driven fashion. They identify more brain states (21) than the five sleep/wake stages described in conventional PSG-based sleep staging, show that the identified brain states are stable across nights, and characterize the brain states in terms of which networks they primarily engage.

      Strengths:

      This work's primary strengths are its dataset of two nights of whole-night concurrent EEG-fMRI (including REM sleep), and its sound methodology.

      Weaknesses:

      The study's weaknesses are its small sample size and the limited attempts at relating the identified fMRI brain states back to EEG.

      We thank the reviewer for the positive feedback and helpful suggestions for this study.

      General appraisal:

      The paper's conclusions are generally well-supported, but some additional analyses and discussions could improve the work.

      The authors' main focus lies in identifying fMRI-based brain states, and they succeed at demonstrating both the presence and robustness of these states in terms of cross-night stability. Additional characterization of brain states in terms of which networks these brain states primarily engage adds additional insights.

      A somewhat missed opportunity is the absence of more analyses relating the HMM states back to EEG. It would be very helpful to the sleep field to see how EEG spectra of, say, different N2-related HMM states compare. Similarly, it is presently unclear whether anything noticeable happens within the EEG time course at the moment of an HMM class switch (particularly when the PSG stage remains stable). While the authors did look at slow wave density and various physiological signals in different HMM states, a characterization of the EEG itself in terms of spectral features is missing. Such analyses might have shown that fMRI-based brain states map onto familiar EEG substates, or reveal novel EEG changes that have so far gone unnoticed.

      We thank the reviewer for this great suggestion. We performed EEG spectral analysis on each HMM state. Results were added to the Supplementary Results and Supplementary Figures 10 and 11 (copied below). Specifically, we confirmed that N3-related states had the highest Delta power and that the Deep-N2 module showed different spectral profiles compared to the Light-N2 module.

      In Supplemental Results: “We conducted spectral analysis for each TR and calculated the average power spectrum for each common EEG brainwave—Delta (0.5-4 Hz), Theta (4-8 Hz), Alpha (8-13 Hz), Beta (13-30 Hz), and Gamma (30-100 Hz)—across the 21 HMM states. See Supplementary Figure 10 and 11 for night 2 and night 1 data, respectively. As expected, we found that N3-related states 8 and 10 had highest Delta power in both nights. In addition, the Deep-N2 module had higher power in Theta and Alpha bands compared to the Light-N2 module.”
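
To make the band definitions concrete, here is a dependency-free sketch that averages spectral power within each of the listed bands for one short EEG segment. It uses a plain discrete Fourier transform for clarity. The sampling rate, segment length, and function names are illustrative assumptions, and the authors' actual spectral estimator is not described in the response; an established routine such as Welch's method would be the more usual choice in practice.

```python
import cmath
import math

# Band edges in Hz, as listed in the Supplemental Results.
BANDS = {
    "Delta": (0.5, 4.0),
    "Theta": (4.0, 8.0),
    "Alpha": (8.0, 13.0),
    "Beta": (13.0, 30.0),
    "Gamma": (30.0, 100.0),
}


def power_spectrum(signal, fs):
    """Return (frequencies, power) for the non-negative DFT frequencies."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def band_powers(signal, fs):
    """Average spectral power inside each band (lower edge inclusive)."""
    freqs, power = power_spectrum(signal, fs)
    result = {}
    for name, (lo, hi) in BANDS.items():
        values = [p for f, p in zip(freqs, power) if lo <= f < hi]
        result[name] = sum(values) / len(values) if values else 0.0
    return result


# Toy segment: 2 s at a hypothetical 250 Hz sampling rate, pure 2 Hz sine,
# so nearly all power should fall in the Delta band.
fs = 250
segment = [math.sin(2 * math.pi * 2.0 * t / fs) for t in range(2 * fs)]
powers = band_powers(segment, fs)
```

Averaging such per-segment band powers over all segments assigned to a given HMM state would yield one spectral profile per state, which is the kind of comparison the response describes.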

      It is unclear how the presently identified HMM brain states relate to the previously identified NREM and wake states by Stevner et al. (2019), who used a roughly similar approach. This is important, as similar brain states across studies would suggest reproducibility, whereas large discrepancies could indicate a large dependence on particular methods and/or the sample (also see later point regarding generalizability).

      This is a great question. There are both similarities and differences between the current study and Stevner et al. (2019). We discuss these in the Supplementary Discussion, copied below.

      In the Supplementary Discussion: “Both studies demonstrated that HMM states can be effectively divided into meaningful modules solely based on transition probabilities. Furthermore, both studies indicated that pre-sleep wakefulness differs from post-sleep wakefulness.

      However, despite the similar approaches used, key differences in data acquisition and analysis make it challenging to directly compare HMM states between these two studies. Firstly, Stevner et al. (2019) collected only 1-hour-long sleep data from 18 participants, whereas our current study includes 8-hour-long sleep data from 12 participants for two consecutive nights. As discussed in the main text, full sleep cycling cannot be obtained from 1-hour long sleep due to the lack of REM stage and incomplete sleep cycles. Secondly, in Stevner et al. (2019) (Figure 4e), the four wake-NREM stages had roughly the same duration. In contrast, in our current study (Night 2, Figure 2A), the N2 stage comprises 43% of total sleep, which aligns with the natural N2 composition of nocturnal sleep stages. This discrepancy might explain the different number of N2-related states found in the two studies, with 3 out of 19 in Stevner et al. (2019) versus 13 out of 21 in our current study.”

      More justice could be done to previous EEG-based efforts moving beyond conventional AASM-defined sleep/wake states. Various EEG studies performed data-driven clustering of brain states, typically indicating more than 5 traditional brain states (e.g., Koch et al. 2014, Christensen et al. 2019, Decat et al. 2022). Beyond that, countless subdivisions of classical sleep stages have been proposed (e.g., phasic/tonic REM, N2 with/without spindles, N3 with global/local slow waves, cyclic alternating patterns, and many more). While these aren't incorporated into standard sleep stage classification, the current manuscript could be misinterpreted to suggest that improved/data-driven classifications cannot be achieved from EEG, which is incorrect.

      We agree with the reviewer that previous EEG-based efforts should be mentioned. We have now added this to the manuscript, copied below.

      In the Discussion section, “Third, we chose to not include EEG features in our data-driven model. However, the current method is not limited to fMRI data and can be applied to EEG data. Given that previous data-driven studies based on EEG data have suggested that there might be more than five traditional sleep stages (Christensen et al., 2019; Decat et al., 2022; Koch et al., 2014), as well as subdivisions within these traditional sleep stages (Brandenberger et al., 2005; Decat et al., 2022; Simor et al., 2020), future studies may apply data-driven models on both fMRI and EEG data.”

      More discussion of the limitations of the current sample and generalizability would be helpful. A sample of N=12 is no doubt impressive for two nights of concurrent whole-night EEG-fMRI. Still, any data-driven approach can only capture the brain states that are present in the sample, and 12 individuals are unlikely to express all brain states present in the population of young healthy individuals. Add to that all the potentially different or altered brain states that come with healthy ageing, other demographic variables, and numerous clinical disorders. How do the authors expect their results to change with larger samples and/or varying these factors? Perhaps most importantly, I think it's important to mention that the particular number of identified brain states (here 21, and e.g. 19 in Stevner) is not set in stone and will likely vary as a function of many sample- and methods-related factors.

      We thank the reviewer for the great suggestions. We have now included these points when discussing limitations in the Discussion section. We think that an HMM with a larger sample size might produce more fine-grained results, but this remains to be investigated when a more extensive dataset becomes available.

      In the Discussion section, “Secondly, while our study involved a relatively small number of participants (12), it included a large amount of fMRI data (~16 hours scan) per participant. Although the HMM trained on data from 12 participants was robust, the generalizability of the current results to different populations—such as healthy aging individuals and clinical populations—needs to be demonstrated in future studies, particularly with larger sample sizes and more diverse populations.”

      “Fourth, while we selected 21 HMM brain sleep states based on model evaluation parameters in the current study, the exact number of sleep states is not fixed and likely depends on various sample- and methods-related factors, such as sample size and model setups.”

    1. eLife assessment

      The methods and findings of the current work are important and well-grounded. The strength of the evidence presented is convincing and backed up by rigorous methodology. The work, when elaborated on how to access the app, will have far-reaching implications for current clinical practice.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors aimed to develop and validate an automated, deep learning-based system for scoring the Rey-Osterrieth Complex Figure Test (ROCF), a widely used tool in neuropsychology for assessing memory deficits. Their goal was to overcome the limitations of manual scoring, such as subjectivity and time consumption, by creating a model that provides automatic, accurate, objective, and efficient assessments of memory deterioration in individuals with various neurological and psychiatric conditions.

      Strengths:

      Comprehensive Data Collection: The authors collected over 20,000 hand-drawn ROCF images from a wide demographic and geographic range, ensuring a robust and diverse dataset. This extensive data collection is critical for training a generalizable and effective deep learning model.

      Advanced Deep Learning Approach: Utilizing a multi-head convolutional neural network to automate ROCF scoring represents a sophisticated application of current AI technologies. This approach allows for detailed analysis of individual figure elements, potentially increasing the accuracy and reliability of assessments.

      Validation and Performance Assessment: The model's performance was rigorously evaluated against crowdsourced human intelligence and professional clinician scores, demonstrating its ability to outperform both groups. The inclusion of an independent prospective validation study further strengthens the credibility of the results.

      Robustness Analysis Efficacy: The model underwent a thorough robustness analysis, testing its adaptability to variations in rotation, perspective, brightness, and contrast. Such meticulous examination ensures the model's consistent performance across different clinical imaging scenarios, significantly bolstering its utility for real-world applications.

      Appraisal and discussion:

      By leveraging a comprehensive dataset and employing advanced deep learning techniques, they demonstrated the model's ability to outperform both crowdsourced raters and professional clinicians in scoring the ROCF. This achievement represents a significant step forward in automating neuropsychological assessments, potentially revolutionizing how memory deficits are evaluated in clinical settings. Furthermore, the application of deep learning to clinical neuropsychology opens avenues for future research, including the potential automation of other neuropsychological tests and the integration of AI tools into clinical practice. The success of this project may encourage further exploration into how AI can be leveraged to improve diagnostic accuracy and efficiency in healthcare.

      However, the critique regarding the lack of detailed analysis across different patient demographics, the inadequacy of network explainability, and concerns about the selection of median crowdsourced scores as ground truth raises questions about the completeness of their objectives. These aspects suggest that while the aims were achieved to a considerable extent, there are areas of improvement that could make the results more robust and the conclusions stronger.

      Comments on revised version:

      I appreciate the opportunity to review this revised submission. Having considered the other reviews, I believe this study presents an important advance in using AI methods for clinical applications, which is both innovative and has implications beyond a single subfield.

      The authors have developed a system using fundamental AI that appears sufficient for clinical use in scoring the Rey-Osterrieth Complex Figure (ROCF) test. In human neuropsychology, tests that generate scores like this are a key part of assessing patients. The evidence supporting the validity of the AI scoring system is compelling. This represents a valuable step towards evaluating more complex neurobehavioral functions.

      However, one area where the study could be strengthened is in the explainability of the AI methods used. To ensure the scores are fully transparent and consistent for clinical use, it will be important for future work to test the robustness of the approach, potentially by comparing multiple methods. Examining other latent variables that can explain patients' cognitive functioning would also be informative.

      In summary, I believe this study provides an important proof-of-concept with compelling evidence, while also highlighting key areas for further development as this technology moves towards real-world clinical applications.

    3. Reviewer #2 (Public Review):

      The authors aimed to develop and validate a machine-learning driven neural network capable of automatic scoring of the Rey-Osterrieth Complex Figure. They aimed to further assess the robustness of the model to various parameters such as tilt and perspective shift in real drawings. The authors leveraged the use of a huge sample of lay workers in scoring figures and also a large sample of trained clinicians to score a subsample of figures. Overall, the authors found their model to have exceptional accuracy and perform similarly to crowdsourced workers and clinicians with, in some cases, less degree of error/score dispersion than clinicians.

    4. Reviewer #3 (Public Review):

      This study presents a valuable approach to scoring a neuropsychological test, the ROCFT, by constructing an artificial intelligence model.

      Comments on latest version:

      The authors built the system with fundamental AI that is sufficient for clinical use in humans. In human neuropsychology, a test that generates a score is fundamental and relatively easy. Neuropsychologists administer many tests to patients; the present system covers one of them, and on its own it cannot tell us the total neurofunction of a patient. The evidence for the scoring is of compelling quality, sufficient for clinical use now, and we can progress to evaluating other, more complicated human neuropsychological functions. For example, persons with dementia change their performance easily when they feel other emotions (worry, boredom, etc.) and notice other stimulation (announcements in the hospital, a nurse walking by, etc.). The ROCF score therefore changes readily, which makes the effort of AI scoring compelling. We should capture this human behavior comprehensively with diverse tests. Therefore, AI scoring of compelling quality is a fundamental step toward the next stage: evaluation of the changeable and ambiguous neurobehavior of humans.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Comment #1: Insufficient Network Analysis for Explainability: The paper does not sufficiently delve into network analysis to determine whether the model's predictions are based on accurately identifying and matching the 18 items of the ROCF or if they rely on global, item-irrelevant features. This gap in analysis limits our understanding of the model's decision-making process and its clinical relevance.

      Response #1: Thank you for your comment. We acknowledge that understanding the decision-making process of AI models is crucial for their acceptance and utility in clinical settings. However, we believe that our current approach, which focuses on providing individual scores for each of the 18 items of the Rey-Osterrieth Complex Figure (ROCF), inherently offers a higher level of explainability and practical utility for clinicians than a network analysis could. Our multi-head convolutional neural network is designed with a dedicated output head for each of the 18 items in the ROCF and thus provides a separate score for each item. This architecture helps ensure that the model focuses on individual elements rather than relying on global, item-irrelevant features.

      This item-specific approach directly aligns with the traditional clinical assessment method, thereby making the results more interpretable and actionable for clinicians. The individual scores for each item provide detailed insights into a patient's performance. Clinicians can use these scores to identify specific areas of strength and weakness in a patient's visuospatial memory and drawing abilities.

      Furthermore, we evaluated the model's performance on each of the 18 items separately, providing detailed metrics that show consistent accuracy across all items. This item-level performance analysis offers clear evidence that the model is not relying on irrelevant global features but is indeed making decisions based on the specific characteristics of each item. We believe that our approach provides a level of explainability that is directly useful and relevant to clinical practitioners.

      Comment #2: Generative Model Consideration: The critique suggests exploring generative models to model the joint distribution of images and scores, which could offer deeper insights into the relationship between scores and specific visual-spatial disabilities. The absence of this consideration in the study is seen as a missed opportunity to enhance the model's explainability and clinical utility.

      Response #2: Thank you for your thoughtful comment and the suggestion to explore generative models. We appreciate the potential benefits of using generative models to model the joint distribution of images and scores. However, we chose not to pursue this approach in our study for several reasons. First, our primary goal was to develop a model that provides accurate and interpretable scores for each of the 18 individual items in the ROCF. Second, generative models, while powerful, would add a layer of complexity that might diminish the clarity and immediate clinical applicability of our results. Generative models (particularly deep learning-based ones) can be challenging to interpret in terms of how they make decisions or why they produce specific outputs. This lack of interpretability can be a concern in critical applications involving neurological and psychiatric disorders. Clinicians require tools that provide clear insights without the need for additional layers of analysis. Our current model provides detailed, item-specific scores that clinicians can directly use to assess visuospatial memory and drawing abilities. Initially, we explored using generative models (i.e., GANs) for data augmentation to address the scarcity of low-score images compared to high-score images. Moreover, for the low-score images, the same score can be achieved by numerous combinations of figure elements. However, due to our extensive available dataset, we did not observe any substantial performance improvements in our model. Nevertheless, future studies could explore generative models, such as Variational Autoencoders (VAEs) or Bayesian Networks, and test them on the data from the current prospective study to compare their performance with our results.

      In the revised manuscript, we have included additional sentences discussing the potential use of generative models and their implications for future research.

      “The data augmentation did not include generative models. Initially, we explored using generative models, specifically GANs, for data augmentation to address the scarcity of low-score images compared to high-score images. However, due to the extensive available dataset, we did not observe any substantial performance improvements in our model. Nevertheless, future studies could explore generative models, such as Variational Autoencoders (VAEs) or Bayesian Networks, which can then be tested on the data from the current prospective study and compared with our results.”

      Comment #3: Lack of Detailed Model Performance Analysis Across Subject Conditions: The study does not provide a detailed analysis of the model's performance across different ages, health conditions, etc. This omission raises questions about the model's applicability to diverse patient populations and whether separate models are needed for different subject types.

      Response #3: Thank you for this important comment. Although the initial version of our manuscript already provided detailed “item-specific” and “across total scores” performance metrics, we recognize the importance of including detailed analyses across different patient demographics to enhance the robustness and applicability of our findings. In response to your comment, we have conducted additional analyses that provide a comprehensive evaluation of model performance across various patient demographics, such as age groups, gender, and different neurological and psychiatric conditions. This additional analysis demonstrates the generalizability and reliability of our model across diverse populations. We have included these analyses in the revised manuscript.

      “In addition, we have conducted a comprehensive model performance analysis to evaluate our model's performance across different ROCF conditions (copy and recall), demographics (age, gender), and clinical statuses (healthy individuals and patients) (Figure 4A). These results have been confirmed in the prospective validation study (Supplementary Figure S6). Furthermore, we included an additional analysis focusing on specific diagnoses to assess the model's performance in diverse patient populations (Figure 4B). Our findings demonstrate that the model maintains high accuracy and generalizes well across various demographics and clinical conditions.”

      Comment #4: Data Augmentation: While the data augmentation procedure is noted as clever, it does not fully encompass all affine transformations, potentially limiting the model's robustness.

      Response #4: We appreciate your feedback on our data augmentation strategy. We acknowledge that while our current approach significantly improves robustness against certain semantic transformations, it may not fully cover all possible affine transformations.

      Here, we provide further clarification and justification for our chosen methods and their impact on the model's performance. In our study, we implemented a data augmentation pipeline to enhance the robustness of our model against common and realistic geometric and semantic-preserving transformations. This pipeline included rotations, perspective changes, and Gaussian blur, which we found to be particularly effective in improving the model's resilience to variations in input data. These transformations are particularly relevant for the present application since users in real life are likely to take pictures of drawings that might be slightly rotated or have a slightly tilted perspective. With these intuitions in mind, we randomly transformed drawings during training. Each transformation was a combination of Gaussian blur, a random perspective change, and a rotation with angles chosen randomly between -10° and 10°. These transformations are representative of realistic scenarios where images might be slightly tilted or photographed from different angles. We intentionally did not address all affine transformations, such as shearing or more complex geometric transformations, because these transformations could alter the score of individual items of the ROCF and would be disruptive to the model.
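
The rotation component of such an augmentation step can be sketched as follows. This is only an illustration under stated assumptions: it covers the random rotation in the -10° to 10° range and omits the Gaussian blur and perspective change, it represents a grayscale image as a list of rows, and the function names are hypothetical. A real training pipeline would normally rely on an image-processing library instead of this hand-written nearest-neighbor rotation.

```python
import math
import random


def rotate_image(img, angle_deg, fill=255):
    """Rotate a grayscale image (list of rows) about its center.

    Uses inverse mapping with nearest-neighbor sampling; pixels that map
    outside the source image are set to `fill` (white paper).
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map the output pixel back into the source image.
            dx, dy = x - cx, y - cy
            sx = round(cx + cos_a * dx + sin_a * dy)
            sy = round(cy - sin_a * dx + cos_a * dy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def random_rotation(img, rng, max_angle=10.0):
    """Rotate by an angle drawn uniformly from [-max_angle, max_angle] degrees."""
    angle = rng.uniform(-max_angle, max_angle)
    return rotate_image(img, angle), angle


# Toy 4x4 "drawing"; a seeded generator keeps the example reproducible.
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
augmented, used_angle = random_rotation(toy, random.Random(0))
```

Drawing a fresh angle for every training sample, as `random_rotation` does, exposes the model to slightly tilted versions of the same drawing without altering which figure elements are present.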

      As noted in our manuscript and demonstrated in supplementary Figure S1, the data augmentation pipeline significantly improved the model's robustness against rotations and changes in perspective. Importantly, our tablet-based scoring application can further ensure that the photos taken do not exhibit excessive semantic transformations. By leveraging the gyroscope built into the tablet, the application can help users align the images properly, minimizing issues such as excessive rotation or skew. This built-in functionality helps maintain the quality and consistency of the images, reducing the likelihood of significant semantic transformations that could affect model performance.

      Comment #5: Additionally, the rationale for using median crowdsourced scores as ground truth, despite evidence of potential bias compared to clinician scores, is not adequately justified.

      Response #5: Thank you for this valuable comment. Clarifying the rationale behind using the median crowdsourced score as the ground truth is indeed important. To reach high accuracy in predicting individual sample scores of the ROCFs, it is imperative that the scores of the training set are based on a systematic scheme with as little human bias as possible influencing the score. However, our analysis (see results section) and previous work (Canham et al., 2000) suggested that the scoring conducted by clinicians may not be consistent, because the clinicians may be unwittingly influenced by the interaction with the patient/participant or by clinician-specific factors (e.g., motivation and fatigue). For this reason and the incomplete availability of clinician scores for all figures (i.e. for 19% of the 20’225 figures), we did not use the clinician scores as ground truth scores. Instead, we trained a large pool (5000 workers) of human internet workers (crowdsourcing) to score ROCF drawings guided by our self-developed interactive web application. Each element of the figure was scored by several human workers (13 workers on average per figure). We obtained the ground truth for each drawing by computing the median for each item in the figure and then summing the medians to get the total score for the drawing in question. To further ensure high-quality data annotation, we identified and excluded crowdsourcing participants who had a high level of disagreement (>20% disagreement) with the ratings from trained clinicians, who carefully scored a subset of the data manually in the same interactive web application.

      We chose the median score for several reasons: (1) The median score is less influenced by outliers than the mean. Given the variability of scoring between different clinicians and human workers (see human MSE and clinician MSE), using the median ensures that the ground truth is not skewed by extreme values, leading to more stable and reliable scores. (2) Crowdsourced data do not always follow a normal distribution. In cases where the distribution is skewed or not symmetric, the median can be a more representative measure of the center. (3) The original scoring system involves ordinal scales (0, 0.5, 1, 2). For ordinal scales, the median is often more appropriate than the mean. Finally, by aggregating multiple scores from a large pool of crowdsourced raters, the median provides a consensus that reflects the most common assessment. This approach mitigates the variability introduced by individual rater biases and ensures a more consistent ground truth. In clinical settings, the consensus of multiple expert opinions often serves as the benchmark for assessments. The use of median scores mirrors this practice, providing a ground truth that is representative of collective human judgment.
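
The ground-truth construction described in this response (per-item median across raters, then the sum of the medians) can be written down in a few lines. The sketch below uses made-up ratings for 3 items instead of 18, and the function name is hypothetical.

```python
from statistics import median


def ground_truth_score(ratings_per_item):
    """Return (per-item medians, total score) for one drawing.

    ratings_per_item: one list of worker ratings per ROCF item (18 items
    in the full figure); each rating is 0, 0.5, 1 or 2.
    """
    item_medians = [median(r) for r in ratings_per_item]
    return item_medians, sum(item_medians)


# Toy drawing with 3 items instead of 18, five hypothetical workers each.
ratings = [
    [2, 2, 2, 1, 0],      # one outlying 0 does not move the median
    [0.5, 1, 1, 1, 2],
    [0, 0, 0.5, 0, 2],    # one outlying 2 does not move the median
]
medians, total = ground_truth_score(ratings)
```

One design detail the response does not address: with an even number of raters, `statistics.median` averages the two middle values and can return a value that is not on the 0, 0.5, 1, 2 scale. If on-scale values are required, `statistics.median_low` or `statistics.median_high` would be possible alternatives; which convention the authors used is not stated.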

      Canham, R. O., S. L. Smith, and A. M. Tyrrell. 2000. “Automated Scoring of a Neuropsychological Test: The Rey Osterrieth Complex Figure.” Proceedings of the 26th Euromicro Conference. EUROMICRO 2000. Informatics: Inventing the Future. https://doi.org/10.1109/eurmic.2000.874519.

      Reviewer #2:

      Comment #1: There is no detail on how the final scoring app can be accessed and whether it is medical device-regulated.

      Response #1: We appreciate the opportunity to provide more information about the current status and plans for our scoring application. At this stage, the final scoring app is not publicly accessible as it is currently undergoing rigorous beta testing with a select group of clinicians in real-world settings. The feedback from these clinicians is instrumental in refining the app’s features, interface, and overall functionality to improve its usability and user experience. This ensures that the app meets the high standards required for clinical tools. Following the successful completion of the beta testing phase, we aim to seek FDA approval for the scoring app. Achieving this regulatory milestone will guarantee that the app meets the stringent requirements for medical devices, providing an additional layer of confidence in its safety and efficacy for clinical use. Once FDA approval is obtained, we plan to make the app publicly accessible to clinicians and healthcare institutions worldwide. Detailed instructions on how to access and use the app will be provided at that time on our website (https://www.psychology.uzh.ch/en/areas/nec/plafor/research/rfp.html).

      Comment #2: No discussion on the difference in sample sizes between the pre-registration of the prospective study and the results (e.g., aimed for 500 neurological patients but reported data from 288). Demographics for the assessment of the representation of healthy and non-healthy participants were not present.

      Response #2: Thank you for your comment. We believe there might have been a misunderstanding regarding our preregistration details. In the preregistration, we planned to prospectively acquire ROCF drawings from 1000 healthy subjects. Each subject was to produce two ROCF drawings (copy and memory condition), so 2000 samples were to be collected. In addition, we aimed to collect 500 drawings from patients (i.e. 250 patients), not 500 patients as the reviewer suggested (https://osf.io/82796). Thus, in total, the goal was to obtain 2500 ROCF figures. The final prospective data set, which contained 2498 ROCF images from 961 healthy adults and 288 patients, very closely matches the sample size we aimed for in the pre-registration. We do not see a necessity to discuss this negligible discrepancy in the main manuscript. The prospective data set remains substantial and sufficient to test our model on the independent prospective data set. Importantly, we want to highlight that the test set in the retrospective data set (4045 figures) was also never seen by the model. Both the retrospective and prospective data sets demonstrate substantial global diversity, as the data were collected in 90 different countries. Please note that Supplementary Figures S2 & S3 provide detailed demographics of the participants in the prospectively collected data, present their performance in the copy and (immediate) recall condition across the lifespan, and show the worldwide distribution of the origin of the data.

      Comment #3: Supplementary Figure S1 & S4 is poor quality, please increase resolution.

      Response #3: We apologize for the poor quality of Supplementary Figures S1 and S4 in the initial submission. In the revised version of our submission, we have increased the resolution of both Supplementary Figure S1 and Supplementary Figure S4 to ensure that all details are clearly visible and the figures are of high quality.

      Comment #4: Regarding medical device regulation; if the app is to be used in clinical practice (as it generates a score and classification of performance), I believe such regulation is necessary - but there are ways around it. This should be detailed.

      Response #4: We agree that regulation is essential for any application intended for use in clinical practice, particularly one that generates scores and classifications of performance. As discussed in response #1, the final scoring application is currently undergoing intensive beta testing in real-world settings with a limited group of clinicians and is therefore not publicly accessible at this time. We are fully committed to obtaining the necessary regulatory approvals before the app is made publicly accessible for clinical use. Once the beta testing phase is complete and the app has been refined based on clinician feedback, we will prepare and submit a comprehensive regulatory dossier. This submission will include all necessary data on the app's development, testing, validation, and clinical utility. We are adhering to relevant regulatory standards and guidelines, such as ISO 13485 for medical devices and the FDA's guidance on software as a medical device (SaMD).

      Comment #7: Need to clarify that work was already done and pre-printed in 2022 for the main part of this study, and that this paper contributes to an additional prospective study.

      Response #7: We would like to clarify that the pre-print the reviewer is referring to is indeed the current paper submitted to eLife. The submitted paper includes both the work that was pre-printed in 2022 and the additional prospective study, as you correctly identified.

      Reviewer #3:

      Comment #1: The considerable effort and cost to make the model only for an existing neuropsychological test.

      Response #1: We acknowledge that significant effort and resources were dedicated to developing our model for the Rey-Osterrieth Complex Figure (ROCF) test. Below, we provide a detailed rationale for this investment and the broader implications of our work. The ROCF test is one of the most widely used neuropsychological assessments worldwide, providing critical insights into visuospatial memory and executive function. While the initial effort and cost are substantial, the long-term benefits of an automated, reliable, objective, fast and widely applicable neuropsychological assessment tool justify the investment. The scoring application will significantly reduce the time for scoring the test and thus provide more efficient use of clinical resources, and the potential for broader applications makes this a worthwhile endeavor. The methods and infrastructure developed for this model can be adapted and scaled to other neuropsychological tests and assessments (e.g. Taylor Figure).

      Comment #2: I was truly impressed by the authors' establishment of a system that organizes the methods and fields of diverse specialties in such a remarkable way. I know the primary purpose of ROCFT. However, beyond the score, neuropsychologically, these are observed by specialists while ROCFT and that is attractive of the test: the turn of each stroke (e.g., from right to left, from the main structure to the margin or small structure), the process to total completeness as a figure, e.g., confidential speed and concentration, the boldness of strokes, unnatural fragmentation of strokes, the not deviated place in a paper, turning of the figure itself (before the scanning level), the total size, the level compared with the age, education, and experiences of the patient. Those are reflected by the disease, visuospatial intelligence, executive function, and ability to concentrate. Scores are crucial, but by observing the drawing process, we can obtain diverse facts or parts of symptoms that imply the complications of human behavior.

      Response #2: Thank you for your insightful comments and observations regarding our system for organizing diverse specialties within the ROCFT methodology. We agree that beyond the numerical scores, the detailed observation of the drawing process provides invaluable neuropsychological insights. How strokes are executed, from their direction and placement to the overall completion process, offers a nuanced understanding of factors like spatial orientation, concentration, and executive function. In fact, we are working on a ROCF pen tracking application, which enables the patient to draw the ROCF with a digital pen on a tablet. The tablet can 1) assess the sequence order of drawing the items and the number of strokes, 2) record the exact coordinate of each drawn pixel at each time point of the assessment, 3) measure the duration for each pen stroke as well as total drawing time, and 4) assess the pen stroke pressure. Through this, we aim to extract additional information on processing speed, concentration, and other cognitive domains. However, this development is outside the scope of the current manuscript.

    1. eLife assessment

      This is an important paper that reports in vivo physiological abnormalities in the hippocampus of a rat model of traumatic brain injury (TBI). In this study, authors focused on changes in theta-gamma phase coupling and action potential entrainment to theta, phenomena hypothesized to be critical for cognition. While the authors provide solid evidence of deficits in both features post-TBI, the study would have been stronger with a more hypothesis-driven approach and consideration of alterations of the animal's behavioral state or sensorimotor deficits beyond memory processes.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigated how traumatic brain injury affects oscillatory and single-unit hippocampal activity in awake-behaving rats.

      Strengths:

      The use of high-density laminar electrodes enabled precise localization of recording sites. To ensure an unbiased, rigorous approach, single-unit analysis was performed by a reviewer who was blind to experimental conditions. A proof of concept study was undertaken to characterize the pathology that resulted from the specific TBI model used in the main study. There was an effort to link abnormalities in hippocampal activity to memory disruption by running a cohort of rats on the Morris Water Maze task.

      Weaknesses:

      The paper is written as if the experiment was exploratory and not hypothesis-driven despite the fact that there is a wealth of experimental evidence about this TBI model that could have informed very specific predictions to test a hypothesis that is only hinted at in the discussion. The number of rats used for the spatial working memory experiment is not reported. Some of the statistics are not completely reported. It is also unclear what the rationale was for recording single units in a novel and familiar environment. Furthermore, this analysis comparing single-unit activity between familiar and novel environments is quite rudimentary. There are much more rigorous analyses to answer the question of how hippocampal single-unit firing patterns differ across changes in environments. There are details lacking about the number of units recorded per session and per rat, all of which are usually reported in studies that record single units. Spatial working memory assessment is delegated to a single panel of a supplementary figure. More importantly, there is no effort to dissociate between spatial working memory deficits and other motor, motivational, or sensory deficits that could have been driving the lower "memory score" in the experimental group.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate changes in theta-gamma phase-amplitude coupling and action potential entrainment to theta following traumatic brain injury (TBI). Both phenomena are widely hypothesized to be important for cognition, and the authors report deficits in both after TBI. The manuscript is well-written, the figures are well-constructed, and the authors' use of high-level analysis methods for TBI EEG data collected from awake, behaving animals is welcome.

      Major Comments:

      - The animal n's are small (4 sham and 5 injured). In Figure 3, for instance, one wonders if panels D and E might have shown significant differences if more animals had been recorded.

      - The text focuses on deficits in the theta and gamma bands, but the reduction in power appears to be broadband (see Figure 1F, especially Pyramidal cell layer panel). Therefore, the overall decrease in broadband (in the injured population) must be normalized between sham and injured animals before a selective comparison between sham and injured animals can be conducted. That is the only way that selective narrow bands i.e., theta and low gamma can be compared between the two cohorts. A brief discussion of the significance of a broadband decrease would be appreciated.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors studied the effects of traumatic brain injury created by LFPI procedure on the CA1 at the network level. The major findings in this study seem to be that the TBI reduces theta and gamma powers in CA1, reduces phase-amplitude coupling in between theta and gamma bands as well as disrupts the gamma entrainment of interneurons. I think the authors have made some important discoveries that could help advance the understanding of TBI effects at the physiological level, however, more investigations into deciphering the relationship of the behavioral and brain states to the observed effects would help clarify the interpretations for the readers.

      Strengths:

      The authors in this study were able to combine behavioral verification of the TBI model with the laminar electrophysiological recordings of the CA1 region to bring forward network-level anomalies such as the temporal coordination of network-level oscillations as well as in the firing of the interneurons. Indeed, it seems that the findings may serve future studies to functionally better understand and/or refine the therapies for the TBI.

      Weaknesses:

      Discoveries made in the paper and their broad interpretations can be helped with further characterization and comparison among the brain and behavioral states both during immobility and movement. The impact of brain injury in several parts of the brain can alter brain-wide LFP and/or behavior. The altered behavior and/or LFP patterns might then lead to reduced spiking and unreliable LFP oscillations in the hippocampus. Hence, claims made in the abstract such as "These results reveal deficits in information encoding and retrieval schemes essential to cognition that likely underlie TBI-associated learning and memory impairments, and elucidate potential targets for future neuromodulation therapies" do not have enough evidence to test whether the disruptions were information encoding and retrieval related or due to sensory-motor and/or behavioral deficits that could also occur during TBI.

      Movement velocity is already known to be correlated to the entrainment of spikes with the theta rhythm and also in some cases with the gamma oscillations. So, it is important to disentangle the differences in behavioral variables and the observed effects. As an example, the author's claims of disrupted temporal coding (as shown in the graphical abstract) might have suffered from these confounds. The observed results of reduced entrainment might, on one hand, be due to the decreased LFP power (induced by injury in different brain areas) resulting in altered behavior and/or the unreliable oscillations of the LFP bands such as theta and gamma, rather than memory encoding and retrieval related disruption of spikes synchrony to the rhythms, while on the other hand, they may simply be due to reduced excitability in the neurons particularly in the behavioral and brain state in which the effects were observed, rather than disrupted temporal code. Hence, further investigations into dissociating these factors could help readers mechanistically understand the interesting results observed by the authors.

    5. Author response:

      We would like to thank the editors and reviewers for their constructive feedback, and we look forward to addressing their comments in the revised manuscript. We also appreciate the acknowledgment that the use of laminar electrodes in awake-behaving animals is an important advancement for the TBI community, and that our results provide a potential physiological link between coalescing TBI pathologies and cognitive deficits. We believe that integrating the reviewer comments will help to make our analyses even more rigorous and will improve the overall manuscript. Please find comments related to specific concerns raised in the public review below:

      The paper is written as if the experiment was exploratory and not hypothesis-driven despite the fact that there is a wealth of experimental evidence about this TBI model that could have informed very specific predictions to test a hypothesis that is only hinted at in the discussion… It is also unclear what the rationale was for recording single units in a novel and familiar environment. Furthermore, this analysis comparing single-unit activity between familiar and novel environments is quite rudimentary. There are much more rigorous analyses to answer the question of how hippocampal single-unit firing patterns differ across changes in environments.

      Previous mechanistic and physiological studies suggested interneuronal dysfunction following TBI that we hypothesized would disrupt oscillatory dynamics underlying temporal coding (single unit entrainment to theta/gamma, phase precession, and phase-amplitude coupling). These are known to support hippocampal-dependent learning and memory tasks such as the Morris Water Maze. While we did not record during a goal-directed behavioral task, the goal of recording in a familiar and novel environment was to assess remapping across these environments. Unfortunately, occupancy in the two environments was not high enough to rigorously characterize place cell specificity and phase precession or to investigate remapping, although putative place cells were identified. Despite this shortcoming, we were still able to confirm that the spike timing of interneurons relative to hippocampal oscillations was disrupted, which we believe underlies the massive reduction in theta-gamma phase-amplitude coupling reported. This opens the door to more strongly hypothesis-driven, mechanistic studies (i.e. closed loop stimulation) to alter the spike timing of interneurons relative to theta phase and potentially rescue these effects on phase-amplitude coupling and behavior.

      The number of rats used for the spatial working memory experiment is not reported. Some of the statistics are not completely reported… There are details lacking about the number of units recorded per session and per rat, all of which are usually reported in studies that record single units.

      The number of rats used for the spatial working memory task was reported in the text and Figure legend where the statistics were reported, but we will ensure that the statistics are more completely reported by including relevant statistical results and parameters outside of the test used and p-value. Additionally, we will report the number of units recorded per animal.

      Spatial working memory assessment is delegated to a single panel of a supplementary figure. More importantly, there is no effort to dissociate between spatial working memory deficits and other motor, motivational, or sensory deficits that could have been driving the lower "memory score" in the experimental group

      The spatial working memory deficit that we report in the Morris Water Maze is not a novel finding and has been demonstrated numerous times in this TBI model. Our goal in including this was to increase the rigor of the study by verifying this deficit in our hands at the injury level used for these physiology experiments. The dissociation between spatial working memory deficits and other motor, motivational, or sensory deficits from TBI in the Morris Water Maze (e.g. swim speed and escape latency with visible platforms) has been well characterized in this TBI model at many injury levels including more severe injuries than those used in this study. We will address this in the Discussion as it is an important point.

      The text focuses on deficits in the theta and gamma bands, but the reduction in power appears to be broadband (see Figure 1F, especially Pyramidal cell layer panel). Therefore, the overall decrease in broadband (in the injured population) must be normalized between sham and injured animals before a selective comparison between sham and injured animals can be conducted. That is the only way that selective narrow bands i.e., theta and low gamma can be compared between the two cohorts. A brief discussion of the significance of a broadband decrease would be appreciated.

      We agree that there is a broadband downward shift in power following TBI especially in the pyramidal cell layer. We will include a normalization of the power spectra in order to specifically compare the theta and gamma bands between sham and injured rats and include discussion about the broadband decrease.
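
      One common way to separate a band-specific change from a broadband shift is to express band power as a fraction of total power. The sketch below illustrates that idea only; it is not the normalization the authors will necessarily use, and the band limits, the broadband range, and the toy spectra are all assumptions made for illustration.

```python
def band_power(freqs, power, low, high):
    """Sum spectral power over frequencies with low <= f < high."""
    return sum(p for f, p in zip(freqs, power) if low <= f < high)


def relative_band_power(freqs, power, band, broadband=(1.0, 100.0)):
    """Band power as a fraction of total power in the broadband range."""
    total = band_power(freqs, power, *broadband)
    if total == 0:
        raise ValueError("broadband power is zero")
    return band_power(freqs, power, *band) / total


# Toy spectra (hypothetical values): the "injured" spectrum is the sham
# spectrum halved at every frequency, i.e. a purely broadband loss.
freqs = [float(f) for f in range(1, 100)]
sham = [1.0 / f for f in freqs]
injured = [0.5 * p for p in sham]

theta = (5.0, 10.0)  # assumed theta band in Hz

sham_theta_abs = band_power(freqs, sham, *theta)
injured_theta_abs = band_power(freqs, injured, *theta)
sham_theta_rel = relative_band_power(freqs, sham, theta)
injured_theta_rel = relative_band_power(freqs, injured, theta)
```

      In this toy case absolute theta power is lower in the injured spectrum while relative theta power is unchanged, which is the distinction the reviewer asks the analysis to make.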

      Discoveries made in the paper and their broad interpretations can be helped with further characterization and comparison among the brain and behavioral states both during immobility and movement. The impact of brain injury in several parts of the brain can alter brain-wide LFP and/or behavior. The altered behavior and/or LFP patterns might then lead to reduced spiking and unreliable LFP oscillations in the hippocampus. Hence, claims made in the abstract such as "These results reveal deficits in information encoding and retrieval schemes essential to cognition that likely underlie TBI-associated learning and memory impairments, and elucidate potential targets for future neuromodulation therapies" do not have enough evidence to test whether the disruptions were information encoding and retrieval related or due to sensory-motor and/or behavioral deficits that could also occur during TBI.

      Movement velocity is already known to be correlated to the entrainment of spikes with the theta rhythm and also in some cases with the gamma oscillations. So, it is important to disentangle the differences in behavioral variables and the observed effects. As an example, the author's claims of disrupted temporal coding (as shown in the graphical abstract) might have suffered from these confounds. The observed results of reduced entrainment might, on one hand, be due to the decreased LFP power (induced by injury in different brain areas) resulting in altered behavior and/or the unreliable oscillations of the LFP bands such as theta and gamma, rather than memory encoding and retrieval related disruption of spikes synchrony to the rhythms, while on the other hand, they may simply be due to reduced excitability in the neurons particularly in the behavioral and brain state in which the effects were observed, rather than disrupted temporal code. Hence, further investigations into dissociating these factors could help readers mechanistically understand the interesting results observed by the authors.

      We agree that changes in hippocampal physiology that we report could arise due to disrupted inputs from TBI, and this study is inherently limited due to recording exclusively from CA1. We chose to record from the hippocampus due to its importance for learning and memory, and its vulnerability in TBI. Future studies will investigate how hippocampal afferents are affected by injury, and we hope that the layer-specific changes we report will help to inform which inputs may be preferentially disrupted. Importantly, these inputs along with local processing within the hippocampus change drastically depending on the behavior of the animal. We will more rigorously assess movement and the behavioral state of the rats when comparing physiological properties, especially the firing rates reported in Figure 3.

    1. eLife assessment

      In this valuable study, the authors use deep learning models to provide solid evidence that epithelial wounding triggers bursts of cell division at a characteristic distance away from the wound. The documentation provided by the authors should allow other scientists to readily apply these methods, which are particularly appropriate where unsupervised machine-learning algorithms have difficulties.

    2. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The authors present a number of deep learning models to analyse the dynamics of epithelia. In this way they want to overcome the time-consuming manual analysis of such data and also remove a potential operator bias. Specifically, they set up models for identifying cell division events and cell division orientation. They apply these tools to the epithelium of the developing Drosophila pupal wing. They confirm a linear decrease of the division density with time and identify a burst of cell division after healing of a wound that they had induced earlier. These division events happen a characteristic time after and a characteristic distance away from the wound. These characteristic quantities depend on the size of the wound.

      Strengths:

      The methods developed in this work achieve the goals set by the authors and are a very helpful addition to the toolbox of developmental biologists. They could potentially be used on various developing epithelia. The evidence for the impact of wounds on cell division is compelling.

      The methods presented in this work should prove to be very helpful for quantifying cell proliferation in epithelial tissues.

      We thank the reviewer for the positive comments!

      Reviewer #2 (Public Review):

      In this manuscript, the authors propose a computational method based on deep convolutional neural networks (CNNs) to automatically detect cell divisions in two-dimensional fluorescence microscopy timelapse images. Three deep learning models are proposed to detect the timing of division, predict the division axis, and enhance cell boundary images to segment cells before and after division. Using this computational pipeline, the authors analyze the dynamics of cell divisions in the epithelium of the Drosophila pupal wing and find that a wound first induces a reduction in the frequency of division followed by a synchronised burst of cell divisions about 100 minutes after its induction.

      Comments on revised version:

      Regarding Reviewer 1's comment on the architecture details, I have now understood that the precise architecture (number/type of layers, activation functions, pooling operations, skip connections, upsampling choice...) might have remained relatively hidden to the authors themselves, as the U-net is built automatically by the fast.ai library from a given classical choice of encoder architecture (ResNet34 and ResNet101 here) to generate the decoder part and skip connections.

      Regarding the Major point 1, I raised the question of the generalisation potential of the method. I do not think, for instance, that the optimal number of frames to use, nor the optimal choice of their time-shift with respect to the division time (t-n, t+m) (not systematically studied here) may be generic hyperparameters that can be directly transferred to another setting. This implies that the method proposed will necessarily require re-labeling, re-training and re-optimizing the hyperparameters which directly influence the network architecture for each new dataset imaged differently. This limits the generalisation of the method to other datasets, and this may be seen as in contrast to other tools developed in the field for other tasks such as cellpose for segmentation, which has proven a true potential for generalisation on various data modalities. I was hoping that the authors would try themselves testing the robustness of their method by re-imaging the same tissue with slightly different acquisition rate for instance, to give more weight to their work.

      We thank the referee for the comments. Regarding this particular biological system, due to photobleaching over long imaging periods (and the availability of imaging systems during the project), we would have difficulty imaging at much higher rates than the 2-minute frame interval we currently use. These limitations are true for many such systems, and it is rarely possible to rapidly image for long periods of time in real experiments. Given this upper limit in frame rate, we could, in principle, sample this data at a lower frame rate by removing time points from the videos, but this typically leads to worse results. With some pilot data, we tried using fewer time intervals for our analysis, but this always gave worse results. We found we need to feed the maximum amount of information available into the model to get the best results (i.e. the fastest frame rate possible, given the data available). Our goal is to teach the neural net to identify dynamic space-time localised events from time-lapse videos, in which the duration of an event is a key parameter. Our division events take 10 minutes or less to complete; therefore, we used 5 timepoints in the videos for the deep learning model. If we considered another system with dynamic events of duration T, then we would use T/t timepoints, where t is the minimum time interval (for our data, t = 2 min). For example, if we could image every minute, we would use 10 timepoints. As discussed below, we do envision that other users with different imaging setups and requirements may need to retrain the model for their own data and, to help with this, we have now provided more detailed instructions on how to do this (see later).
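
      The T/t rule described in this response can be written down directly. This is a minimal sketch, not code from the authors' plugin: the function names are hypothetical, and rounding up when T is not a multiple of t is our assumption.

```python
import math


def frames_per_event(event_duration_min, frame_interval_min):
    """Number of timepoints T/t needed to span one event, rounded up."""
    if event_duration_min <= 0 or frame_interval_min <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(event_duration_min / frame_interval_min)


def subsample(frames, step):
    """Emulate a lower acquisition rate by keeping every `step`-th frame."""
    return frames[::step]


current_setup = frames_per_event(10, 2)   # 10-minute events imaged every 2 minutes
faster_setup = frames_per_event(10, 1)    # same events imaged every minute
half_rate_video = subsample(list(range(10)), 2)
```

      With the values quoted in the response, the first call gives 5 timepoints and the second gives 10, matching the numbers the authors state.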

      In this regard, and because the authors claimed to provide clear instructions on how to reuse their method or adapt it to a different context, I delved deeper into the code and, to my surprise, felt that we are far from the coding practice of what a well-documented and accessible tool should be.

      To start with, one has to be relatively accustomed to Napari to understand how the plugin must be installed, as the only thing given is a pip install command (that could be typed in any terminal without installing the plugin for Napari, but has to be typed inside the Napari terminal, which is mentioned nowhere). Surprisingly, the plugin was not uploaded to the Napari hub, nor to PyPI by the authors, so it is not searchable/findable directly; one has to go to the GitHub repository and install it manually. In that regard, no description was provided in the copy-pasted templated files associated with the napari hub, so exporting it to the hub would actually leave it undocumented.

      We thank the referee for suggesting the example of (DeXtrusion, Villars et al. 2023). We have endeavoured to produce similarly-detailed documentation for our tools. We now have clear instructions for installation requiring only minimal coding knowledge, and we have provided a user manual for the napari plug-in. This includes information on each of the options for using the model and the outputs they will produce. The plugin has been tested by several colleagues using both Windows and Mac operating systems.

      Author response image 1.

      Regarding now the Python notebooks, one can fairly say that the "clear instructions" that were supposed to enlighten the code are really minimal. Only one notebook, "trainingUNetCellDivision10.ipynb", actually has some comments; the others have (almost) none, nor a title to help the unskilled programmer delving into the script to guess what it should do. I doubt that a biologist who does not have a strong computational background will manage to adapt the method to their own dataset (which seems to me unavoidable for the reasons mentioned above).

      Within the README file, we have now included information on how to retrain the models with helpful links to deep learning tutorials (which, indeed, some of us have learnt from) for those new to deep learning. All Jupyter notebooks now include more comments explaining the models.

      Finally regarding the data, none is shared publicly along with this manuscript/code, such that if one doesn't have a similar type of dataset - that must be first annotated in a similar manner - one cannot even test the networks/plugin for its own information. A common and necessary practice in the field - and possibly a longer lasting contribution of this work - could have been to provide the complete and annotated dataset that was used to train and test the artificial neural network. The basic reason is that a more performant, or more generalisable deep-learning model may be developed very soon after this one and for its performance to be fairly compared, it requires to be compared on the same dataset. Benchmarking and comparison of methods performance is at the core of computer vision and deep-learning.

      We thank the referee for these comments. We have now uploaded all the data used to train the models and to test them, as well as all the data used in the analyses for the paper. This includes many videos that were not used for training but were analysed to generate the paper’s results. The link to these data sets is provided in our GitHub page (https://github.com/turleyjm/cell-division-dl-plugin/tree/main). In the folder for the data sets and in the GitHub repository, we have included the Jupyter notebooks used to train the models and these can be used for retraining. We have made our data publicly available as a Zenodo dataset, https://zenodo.org/records/10846684 (added to last paragraph of discussion). We have also included scripts that can be used to compare the model output with ground truth, including outputs highlighting false positives and false negatives. Together with these scripts, models can be compared and contrasted, both in general and in individual videos. Overall, we very much appreciate the reviewer’s advice, which has made the plugin much more user-friendly and, hopefully, easier for other groups to train their own models. Our contact details are provided, and we would be happy to advise any groups that would like to use our tools.
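
      To make concrete what comparing model output with ground truth involves, the sketch below matches detected division events to labelled ones and lists false positives and false negatives. It is not the authors' script: the event format (frame, x, y), the greedy matching, and the distance and time tolerances are all assumptions chosen for illustration.

```python
def match_divisions(predicted, ground_truth, max_dist=10.0, max_dt=1):
    """Greedily match predicted (t, x, y) events to ground-truth events.

    A prediction is a true positive if an unmatched ground-truth event lies
    within max_dt frames and max_dist pixels (both hypothetical tolerances).
    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched_truth = list(ground_truth)
    true_pos, false_pos = [], []
    for t, x, y in predicted:
        hit = None
        for cand in unmatched_truth:
            ct, cx, cy = cand
            close_in_time = abs(t - ct) <= max_dt
            close_in_space = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 <= max_dist
            if close_in_time and close_in_space:
                hit = cand
                break
        if hit is None:
            false_pos.append((t, x, y))
        else:
            unmatched_truth.remove(hit)
            true_pos.append((t, x, y))
    # Ground-truth events left unmatched are the false negatives.
    return true_pos, false_pos, unmatched_truth


def f1_score(n_tp, n_fp, n_fn):
    """Harmonic mean of precision and recall, 0.0 if there are no events."""
    denom = 2 * n_tp + n_fp + n_fn
    return 2 * n_tp / denom if denom else 0.0


# Toy example with made-up coordinates.
truth = [(5, 100.0, 100.0), (8, 40.0, 60.0)]
detections = [(5, 103.0, 104.0), (20, 10.0, 10.0)]
tp, fp, fn = match_divisions(detections, truth)
score = f1_score(len(tp), len(fp), len(fn))
```

      Running two models through the same matching on the same videos gives directly comparable counts, which is the kind of benchmarking the reviewer asks for.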


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors present a number of deep-learning models to analyse the dynamics of epithelia. In this way, they want to overcome the time-consuming manual analysis of such data and also remove a potential operator bias. Specifically, they set up models for identifying cell division events and cell division orientation. They apply these tools to the epithelium of the developing Drosophila pupal wing. They confirm a linear decrease of the division density with time and identify a burst of cell division after the healing of a wound that they had induced earlier. These division events happen a characteristic time after and a characteristic distance away from the wound. These characteristic quantities depend on the size of the wound.

      Strength:

      The methods developed in this work achieve the goals set by the authors and are a very helpful addition to the toolbox of developmental biologists. They could potentially be used on various developing epithelia. The evidence for the impact of wounds on cell division is solid.

      Weakness:

      Some aspects of the deep-learning models remained unclear, and the authors might want to think about adding details. First of all, for readers not being familiar with deep-learning models, I would like to see more information about ResNet and U-Net, which are at the base of the new deep-learning models developed here. What is the structure of these networks?

      We agree with the Reviewer and have included additional information on page 8 of the manuscript, outlining some background information about the architecture of ResNet and U-Net models.

      How many parameters do you use?

      We apologise for this omission and have now included the number of parameters and layers in each model in the methods section on page 25.

      What is the difference between validating and testing the model? Do the corresponding data sets differ fundamentally?

      The difference between ‘validating’ and ‘testing’ the model is that the validation data are used during training to determine whether the model is overfitting. If the model is performing well on the training data but not on the validation data, this is a key signal that the model is overfitting, and changes will need to be made to the network or training method to prevent this. The testing data are used only after all the training has been completed, to test the performance of the model on fresh data it has not been trained on. We have removed reference to the validation data in the main text to make it simpler and have added this explanation to the methods. There is no fundamental (or experimental) difference between each of the labelled data sets; rather, they are collected from different biological samples. We have now included this information in the Methods text on page 24.
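
      As an illustration of the distinction above, the following is a minimal sketch of a split in which whole biological samples are assigned to training, validation or test sets, so that no video contributes to more than one set. The function name, the video identifiers and the 15% fractions are assumptions made for this example and are not taken from the authors' code.

```python
import random


def split_by_sample(sample_ids, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Assign whole samples (e.g. videos) to train/validation/test sets.

    Hypothetical helper: splitting by sample rather than by frame keeps
    frames from one biological sample out of more than one set.
    """
    ids = sorted(set(sample_ids))
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_test = max(1, round(len(ids) * test_fraction))
    n_val = max(1, round(len(ids) * val_fraction))
    return {
        "test": ids[:n_test],                # held out until training is finished
        "val": ids[n_test:n_test + n_val],   # monitored during training for overfitting
        "train": ids[n_test + n_val:],       # used to fit the model weights
    }


# Illustrative identifiers only; real data would use the actual video names.
splits = split_by_sample([f"video_{i:02d}" for i in range(20)])
```

      In such a scheme, a validation loss that rises while the training loss keeps falling would be the overfitting signal described above, and the test set would be scored once, after training has finished.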

      How did you assess the quality of the training data classification?

      These data were generated and hand-labelled by an expert with many years of experience in identifying cell divisions in imaging data, to give the ground truth for the deep learning model.

      Reviewer #1 (Recommendations For The Authors):

      You repeatedly use 'new', 'novel' as well as 'surprising' and 'unexpected'. The latter are rather subjective and it is not clear based on what prior knowledge you make these statements. Unless indicated otherwise, it is understood that the results and methods are new, so you can delete these terms.

      We have deleted these words, as suggested, for almost all cases.

      p.4 "as expected" add a reference or explain why it is expected.

      A reference has now been included in this section, as suggested.

      p.4 "cell divisions decrease linearly with time" Only later (p.10) it turns out that you think about the density of cell divisions.

      This has been changed to "cell division density decreases linearly with time".

      p.5 "imagine is largely in one plane" while below "we generated a 3D z-stack" and above "our in vivo 3D image data" (p.4). Although these statements are not strictly contradictory, I still find them confusing. Eventually, you analyse a 2D image, so I would suggest that you refer to your in vivo data as being 2D.

      We apologise for the confusion here; the imaging data was initially generated using 3D z-stacks but this 3D data is later converted to a 2D focused image, on which the deep learning analysis is performed. We are now more careful with the language in the text.

      p.7 "We have overcome (...) the standard U-Net model" This paragraph remains rather cryptic to me. Maybe you can explain in two sentences what a U-Net is or state its main characteristics. Is it important to state which class you have used at this point? Similarly, what is the exact role of the ResNet model? What are its characteristics?

      We have included more details on both the ResNet and U-Net models and how our model incorporates properties from them on Page 8.

      p.8 Table 1 Where do I find it? Similarly, I could not find Table 2.

      These were originally located in the supplemental information document, but have been moved to the main manuscript.

      p.9 "developing tissue in normal homeostatic conditions" Aren't homeostatic and developing contradictory? In one case you maintain a state, in the other, it changes.

      We agree with the Reviewer and have removed the word ‘homeostatic’.

      p.9 "Develop additional models" I think 'models' refers to deep learning models, not to physical models of epithelial tissue development. Maybe you can clarify this?

      Yes, this is correct; we have phrased this better in the text.

      p.12 "median error" median difference to the manually acquired data?

      Yes, and we have made this clearer in the text, too.

      p.12 "we expected to observe a bias of division orientation along this axis" Can you justify the expectation? Elongated cells are not necessarily aligned with the direction of a uniaxially applied stress.

      Although this is not always the case, we have now included additional references to previous work from other groups which demonstrated that wing epithelial cells do become elongated along the P/D axis in response to tension.

      p.14 "a rather random orientation" Please, quantify.

      The division orientations are quantified in Fig. 4F,G; we have now changed our description from ‘random’ to ‘unbiased’.

      p.17 "The theories that must be developed will be statistical mechanical (stochastic) in nature" I do not understand. Statistical mechanics refers to systems at thermodynamic equilibrium, stochastic to processes that depend on, well, stochastic input.

      We have clarified that we are referring to non-equilibrium statistical mechanics (the study of macroscopic systems far from equilibrium, a rich field of research with many open problems and applications in biology).

      Reviewer #2 (Public Review):

      In this manuscript, the authors propose a computational method based on deep convolutional neural networks (CNNs) to automatically detect cell divisions in two-dimensional fluorescence microscopy timelapse images. Three deep learning models are proposed to detect the timing of division, predict the division axis, and enhance cell boundary images to segment cells before and after division. Using this computational pipeline, the authors analyze the dynamics of cell divisions in the epithelium of the Drosophila pupal wing and find that a wound first induces a reduction in the frequency of division followed by a synchronised burst of cell divisions about 100 minutes after its induction.

      In general, novelty over previous work does not seem particularly important. From a methodological point of view, the models are based on generic architectures of convolutional neural networks, with minimal changes, and on ideas already explored in general. The authors seem to have missed much (most?) of the literature on the specific topic of detecting mitotic events in 2D timelapse images, which has been published in more specialized journals or proceedings (TPAMI, CVPR etc.; see references below). Even though the image modality or biological structure may be different (non-fluorescent images sometimes), I don't believe it makes a big difference. How the authors' approach compares to this previously published work is not discussed, which prevents me from objectively assessing the true contribution of this article from a methodological perspective.

      On the contrary, some competing works have proposed methods based on newer - and generally more efficient - architectures specifically designed to model temporal sequences (Phan 2018, Kitrungrotsakul 2019, 2021, Mao 2019, Shi 2020). These natural candidates (recurrent networks, long short-term memory (LSTM), gated recurrent units (GRU), or even more recently transformers), coupled to CNNs, are not even mentioned in the manuscript, although they have proved their generic superiority for inference tasks involving time series (Major point 2). Even though the original idea/trick of exploiting the different channels of RGB images to address the temporal aspect might seem smart in the first place - as it reduces the task of changing/testing a new architecture to a minimum - I guess that CNNs trained this way may not generalize very well to videos where the temporal resolution is changed slightly (Major point 1). This could be quite problematic as each new dataset acquired with a different temporal resolution or temperature may require manual relabeling and retraining of the network. In this perspective, recent alternatives (Phan 2018, Gilad 2019) have proposed unsupervised approaches, which could largely reduce the need for manual labeling of datasets.

      We thank the reviewer for their constructive comments. Our goal is to develop a cell detection method that has a very high accuracy, which is critical for practical and effective application to biological problems. The algorithms need to be robust enough to cope with the difficult experimental systems we are interested in studying, which involve densely packed epithelial cells within in vivo tissues that are continuously developing, as well as repairing. In response to the above comments of the reviewer, we apologise for not including these important papers from the division detection and deep learning literature, which are now discussed in the Introduction (on page 4).

      A key novelty of our approach is the use of multiple fluorescent channels to increase information for the model. As the referee points out, our method benefits from using and adapting existing highly effective architectures. Hence, we have been able to incorporate deeper models than some others have previously used. An additional novelty is using this same model architecture (retrained) to detect cell division orientation. For future practical use by us and other biologists, the models can easily be adapted and retrained to suit experimental conditions, including different fluorescent channels or numbers of time points. Unsupervised approaches are very appealing due to the potential time saved compared to manual hand labelling of data. However, the accuracy of unsupervised models is currently much lower than that of supervised models (as shown in Phan 2018) and, most importantly, well below the levels needed for practical use in analysing inherently variable (and challenging) in vivo experimental data.

      Regarding the other convolutional neural networks described in the manuscript:

      (1) The one proposed to predict the orientation of mitosis performs a regression task, predicting a probability for the division angle. The architecture, which must be different from a simple Unet, is not detailed anywhere, so the way it was designed is difficult to assess. It is unclear if it also performs mitosis detection, or if it is instead used to infer orientation once the timing and location of the division have been inferred by the previous network.

      The neural network used for U-NetOrientation has the same architecture as U-NetCellDivision10 but has been retrained to complete a different task: finding division orientation. Our workflow is as follows: firstly, U-NetCellDivision10 is used to find cell divisions; secondly, U-NetOrientation is applied locally to determine the division orientation. These points have now been clarified in the main text on Page 14.

      (2) The one proposed to improve the quality of cell boundary images before segmentation is nothing new, it has now become a classic step in segmentation, see for example Wolny et al. eLife 2020.

      We have cited similar segmentation models in our paper and thank the referee for this additional one. We have made an improvement to the segmentation models, using GFP-tagged E-cadherin, a protein localised in a thin layer at the apical boundary of cells. So, while this is primarily a 2D segmentation problem, some additional information is available in the z-axis as the protein is visible in 2-3 separate z-slices. Hence, we supplied this 3-focal plane input to take advantage of the 3D nature of this signal. This approach has been made more explicit in the text (Pages 14, 15) and Figure (Fig. 2D).

      As a side note, I found it a bit frustrating to realise that all the analysis was done in 2D while the original images are 3D z-stacks, so a lot of the 3D information had to be compressed and has not been used. A novelty, in my opinion, could have resided in the generalisation to 3D of the deep-learning approaches previously proposed in that context, which are exclusively 2D, in particular, to predict the orientation of the division.

      Our experimental system is a relatively flat 2D tissue with the orientation of the cell divisions consistently in the xy-plane. Hence, a 2D analysis is most appropriate for this system. With the successful application of the 2D methods already achieving high accuracy, we envision that extension to 3D would only offer a slight increase in effectiveness as these measurements have little room for improvement. Therefore, we did not extend the method to 3D here. However, of course, this is the next natural step in our research as 3D models would be essential for studying 3D tissues; such 3D models will be computationally more expensive to analyse and more challenging to hand label.

      Concerning the biological application of the proposed methods, I found the results interesting, showing the potential of such a method to automatise mitosis quantification for a particular biological question of interest, here wound healing. However, the deep learning methods/applications that are put forward as the central point of the manuscript are not particularly original.

      We thank the referee for their constructive comments. Our aim was not only to show the accuracy of our models but also to show how they might be useful to biologists for automated analysis of large datasets, which is a—if not the—bottleneck for many imaging experiments. The ability to process large datasets will improve robustness of results, as well as allow additional hypotheses to be tested. Our study also demonstrated that these models can cope with real in vivo experiments where additional complications such as progressive development, tissue wounding and inflammation must be accounted for.

      Major point 1: generalisation potential of the proposed method.

      The neural network model proposed for mitosis detection relies on a 2D convolutional neural network (CNN), more specifically on the Unet architecture, which has become widespread for the analysis of biology and medical images. The strategy proposed here exploits the fact that the input of such an architecture is natively composed of several channels (originally 3 to handle the 3 RGB channels, which is actually a holdover from computer vision, since most medical/biological images are gray images with a single channel), to directly feed the network with 3 successive images of a timelapse at a time. This idea is, in itself, interesting because no modification of the original architecture had to be carried out. The latest 10-channel model (U-NetCellDivision10), which includes more channels for better performance, required minimal modification to the original U-Net architecture but also simultaneous imaging of cadherin in addition to histone markers, which may not be a generic solution.

      We believe we have provided a general approach for practical use by biologists that can be applied to a range of experimental data, whether that is based on varying numbers of fluorescent channels and/or timepoints. We envisioned that experimental biologists are likely to have several different parameters permissible for measurement based on their specific experimental conditions e.g., different fluorescently labelled proteins (e.g. tubulin) and/or time frames. To accommodate this, we have made it easy and clear in the code on GitHub how these changes can be made. While the model may need some alterations and retraining, the method itself is a generic solution as the same principles apply to very widely used fluorescent imaging techniques.

      Since CNN-based methods accept only fixed-size vectors (fixed image size and fixed channel number) as input (and output), the length or time resolution of the extracted sequences should not vary from one experience to another. As such, the method proposed here may lack generalization capabilities, as it would have to be retrained for each experiment with a slightly different temporal resolution. The paper should have compared results with slightly different temporal resolutions to assess its inference robustness toward fluctuations in division speed.

      If multiple temporal resolutions are required for a set of experiments, we envision that the model could be trained over a range of these different temporal resolutions. Of course, the temporal resolution that requires the largest vector would be chosen to set the model's fixed number of input channels. Given the depth of the models used and the potential to easily increase this by replacing resnet34 with resnet50 or resnet101, the model would likely be able to cope with this, although we have not specifically tested this. (page 27)
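
      To make the fixed-input-size point concrete, the following is a minimal sketch of how successive time points and fluorescent channels can be flattened into the channel dimension of a 2D network input. The array layout (time, channel, height, width), the function name, and the choice of five time points of two channels (which gives ten input channels) are assumptions for illustration and are not taken from the authors' code.

```python
import numpy as np


def stack_frames_as_channels(video, start, n_frames):
    """Flatten a window of time points into the channel axis.

    video: array of shape (T, C, H, W), i.e. time, fluorescent channel,
    height, width (an assumed layout for this sketch).
    Returns an array of shape (n_frames * C, H, W), ordered frame by frame.
    """
    t, c, h, w = video.shape
    if start < 0 or start + n_frames > t:
        raise ValueError("window falls outside the video")
    window = video[start:start + n_frames]   # (n_frames, C, H, W)
    return window.reshape(n_frames * c, h, w)


# Hypothetical example: 12 time points, 2 fluorescent channels, 64 x 64 pixels.
video = np.zeros((12, 2, 64, 64), dtype=np.float32)
x = stack_frames_as_channels(video, start=3, n_frames=5)  # shape (10, 64, 64)
```

      Because the number of input channels is fixed once the network is built, changing the number of time points or fluorescent channels changes the size of this stacked input and therefore requires adjusting the first layer and retraining, which is the limitation raised by the reviewer.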

      Another approach (not discussed) consists in directly convolving several temporal frames using a 3D CNN (2D+time) instead of a 2D one, in order to detect a temporal event. Such an idea shares some similarities with the proposed approach, although in this previous work (Ji et al. TPAMI 2012 and, for split detection, Nie et al. CVPR 2016) convolution is performed spatio-temporally, which may present advantages. How does the authors' method compare to such an (also very simple) approach?

      We thank the Reviewer for this insightful comment. The text now discusses this (on Pages 8 and 17). Key differences between the models include our incorporation of multiple light channels and the use of much deeper models. We suggest that our method allows for an easy and natural extension to use deeper models for even more demanding tasks e.g. distinguishing between healthy and defective divisions. We also tested our method with ‘difficult conditions’ such as when a wound is present; despite the challenges imposed by the wound (including the discussed reduction in fluorescent intensities near the wound edge), we achieved higher accuracy compared to Nie et al. (accuracy of 78.5% compared to our F1 score of 0.964) using a low-density in vitro system.

      Major point 2: innovatory nature of the proposed method.

      The authors' idea of exploiting existing channels in the input vector to feed successive frames is interesting, but the natural choice in deep learning for manipulating time series is to use recurrent networks or their newer and more stable variants (LSTM, GRU, attention networks, or transformers). Several papers exploiting such approaches have been proposed for the mitotic division detection task, but they are not mentioned or discussed in this manuscript: Phan et al. 2018, Mao et al. 2019, Kitrungrotsakul et al. 2019, Shi et al. 2020.

      An obvious advantage of an LSTM architecture combined with CNN is that it is able to address variable length inputs, therefore time sequences of different lengths, whereas a CNN alone can only be fed with an input of fixed size.

      LSTM architectures may produce similar accuracy to the models we employ in our study; however, due to the high degree of accuracy we already achieve with our methods, it is hard to see how they would improve the understanding of the biology of wound healing that we have uncovered. Hence, they may provide an alternative way to achieve similar results from analyses of our data. It would also be interesting to see how LSTM architectures would cope with the noisy and difficult wounded data that we have analysed. We agree with the referee that these alternative models could allow an easier inclusion of temporal differences in division time (see discussion on Page 20). Nevertheless, we imagine that after selecting a sufficiently large number of input time points and fluorescent channels, biologists could likely train our model to cope with a range of division lengths.

      Another advantage of some of these approaches is that they rely on unsupervised learning, which can avoid the tedious relabeling of data (Phan et al. 2018, Gilad et al. 2019).

      While these are very interesting ideas, we believe these unsupervised methods would struggle under the challenging conditions within our and others' experimental imaging data. The epithelial tissue examined in the present study possesses a particularly high density of cells with overlapping nuclei compared to the other experimental systems these unsupervised methods have been tested on. Another potential problem with these unsupervised methods is the difficulty in distinguishing dynamic debris and immune cells from mitotic cells. Once again, despite our experimental data being more complex and difficult, our methods perform better than other methods designed for simpler systems, as in Phan et al. 2018 and Gilad et al. 2019; for example, analysis performed on lower-density in vitro and unwounded tissues gave best F1 scores for a single video of 0.768 and 0.829 for unsupervised and supervised methods, respectively (Phan et al. 2018). We envision that having an F1 score above 0.9 (and preferably above 0.95) would be crucial for practical use by biologists; hence we believe supervision is currently still required. We expect that retraining our models for use in other experimental contexts will require smaller hand-labelled datasets, as they will be able to take advantage of transfer learning (see discussion on Page 4).
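
      For readers less familiar with the metric quoted above, the following is a minimal sketch of how an F1 score is computed from counts of true positives, false positives and false negatives. The function name and the counts in the example are hypothetical and are not results from the manuscript.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall for detected events."""
    if true_pos == 0:
        return 0.0  # no correct detections: define F1 as zero
    precision = true_pos / (true_pos + false_pos)  # fraction of detections that are real
    recall = true_pos / (true_pos + false_neg)     # fraction of real events detected
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
score = f1_score(true_pos=90, false_pos=5, false_neg=5)
```

      Because F1 penalises both missed divisions and spurious detections, a threshold such as 0.9 constrains both error types at once, whereas accuracy alone can be dominated by the many frames in which no division occurs.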

      References :

      We have included these additional references in the revised version of our Manuscript.

      Ji, S., Xu, W., Yang, M., & Yu, K. (2012). 3D convolutional neural networks for human action recognition. IEEE transactions on pattern analysis and machine intelligence, 35(1), 221-231. >6000 citations

      Nie, W. Z., Li, W. H., Liu, A. A., Hao, T., & Su, Y. T. (2016). 3D convolutional networks-based mitotic event detection in time-lapse phase contrast microscopy image sequences of stem cell populations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 55-62).

      Phan, H. T. H., Kumar, A., Feng, D., Fulham, M., & Kim, J. (2018). Unsupervised two-path neural network for cell event detection and classification using spatiotemporal patterns. IEEE Transactions on Medical Imaging, 38(6), 1477-1487.

      Gilad, T., Reyes, J., Chen, J. Y., Lahav, G., & Riklin Raviv, T. (2019). Fully unsupervised symmetry-based mitosis detection in time-lapse cell microscopy. Bioinformatics, 35(15), 2644-2653.

      Mao, Y., Han, L., & Yin, Z. (2019). Cell mitosis event analysis in phase contrast microscopy images using deep learning. Medical image analysis, 57, 32-43.

      Kitrungrotsakul, T., Han, X. H., Iwamoto, Y., Takemoto, S., Yokota, H., Ipponjima, S., ... & Chen, Y. W. (2019). A cascade of 2.5 D CNN and bidirectional CLSTM network for mitotic cell detection in 4D microscopy image. IEEE/ACM transactions on computational biology and bioinformatics, 18(2), 396-404.

      Shi, J., Xin, Y., Xu, B., Lu, M., & Cong, J. (2020, November). A Deep Framework for Cell Mitosis Detection in Microscopy Images. In 2020 16th International Conference on Computational Intelligence and Security (CIS) (pp. 100-103). IEEE.

      Wolny, A., Cerrone, L., Vijayan, A., Tofanelli, R., Barro, A. V., Louveaux, M., ... & Kreshuk, A. (2020). Accurate and versatile 3D segmentation of plant tissues at cellular resolution. eLife, 9, e57613.

    1. eLife assessment

      This valuable contribution follows past descriptions of ciliation defects, potentially linked to cholinergic neuronal dysfunction, associated with mutated G2019S Lrrk2 expression. The strength of evidence is considered solid and broadly supportive of the claims concerning well-characterized cilia changes in cholinergic neurons over time in the model; however, additional work may be required to define the specificity of the pRab12 antibody in the IHC technique, dependence on LRRK2, and clarification of the cilia phenotype in sporadic PD brains that exists (for the moment) only in a non-peer-reviewed pre-print, despite the prominence of these (preliminary) results highlighted in the abstract and text of the current manuscript. It is hoped that the authors will begin to address the feedback provided by the expert reviewers to help provide a more mechanistic basis for the audience interested in cholinergic defects associated with Parkinson's disease.

    2. Reviewer #1 (Public review):

      Summary:

      This study represents valuable insight into the potential contribution of ciliation deficits and cholinergic neuron survival in an etiologically appropriate Parkinson's disease mouse model. The evidence presented is convincing, employing a validated methodology to assess measures across multiple brain regions and time points, with adequate observation numbers. Similarities between some of the data here and human patients further validate the model, and the study provides numerous avenues to aid future advances.

      Strengths:

      Overall, this study presents a thorough analysis of ciliary defects and cell loss in cholinergic neurons throughout the brain in the LRRK2 G2019S knockin mouse model of Parkinson's disease. The authors aimed to characterize ciliary defects in areas not only implicated in PD but also in cholinergic neuron function. Additionally, they repeated measures across age and sex, presenting a body of work that is more readily translatable to human disease states. The strengths of the paper included the breadth of brain regions tested and additional mechanistic contributions of LRRK2 that may correlate to their observed phenotypes. The study conveys to the reader the ciliary phenotype observed in all the cholinergic neurons assessed throughout the brains of knock-in LRRK2 mutant mice. Importantly, the pattern of changes is, in some instances, strikingly similar to PD, which strengthens the case for construct and face validation of the G2019S knock-in mouse model. Future investigations of the physiological and behavioural correlates/consequences of these changes will inform ongoing and, as yet untried, therapeutic intervention attempts.

      Weaknesses:

      At times, the claims are only partially substantiated by how the data are presented (e.g., inappropriate statistics within an age (t-tests, not ANOVA) and a lack of comparison between ages, despite referring to the progress of a phenotype). More appropriate statistical analyses and revisions to the data presentation are required to substantiate basic and more 'progressive' conclusions. Further, distributing the central claim over 10 figures dilutes the impact; many of these could be compressed into a couple of single figures (e.g., cell counts in all regions and ciliation). Also, a summary graphic showing the brain regions affected by ciliation alterations and cell loss at young, middle, and old age in the GS mice would be hugely beneficial. This peer would like to see more discussion of how the observed changes would impact circuit-level function and more speculation of the underlying mechanisms leading to the deficits. Minor changes to the abstract and introduction (to include more detail in the rationale and supporting evidence) are recommended, as summaries of existing literature are vague and could flow better between one statement and the next.

    3. Reviewer #2 (Public review):

      Summary:

      LRRK2 has previously been shown to affect cilia formation and stability both in vitro and in vivo, in striatal cholinergic interneurons, in both transgenic mice and in human post-mortem brain samples from subjects carrying one of the LRRK2 pathogenic mutations: G2019S. In the current study, Brahmia and colleagues have conducted a comprehensive assessment of G2019S knock-in mice to address some gaps in the field, specifically: extending analysis to additional cholinergic neurons across 3 time points and determining the functional consequences of the ciliation deficits. They find that primary cilia are lost in all cholinergic neurons, with basal forebrain cholinergic neurons displaying an early onset (in 4-5-month-old mice) compared with other regions. They also show early dystrophic changes in cholinergic axons derived from basal forebrain and brainstem cholinergic neurons and age-dependent cholinergic cell loss in select forebrain and brainstem nuclei.

      Strengths:

      This is a comprehensive and careful analysis of ciliary deficits and their downstream consequences, which we assume are deficits in innervation and cell loss.

      Weaknesses:

      This study is observational and does not address the underlying mechanisms. The data on pRab12, although downstream of LRRK2, does not clearly address this and, instead, raises more questions than answers: e.g., is there really differentiation from Rab10 and its phosphorylation or is it primarily due to the limitations of pRab10 antibodies with regards to the lack of suitability of this antibody in mouse brain sections (could immunoblots on brain punches have been performed to overcome this?). Are Rab10, Rab12, and LRRK2 expressed at different levels in the vulnerable cell types? Plenty of recent high-quality single-cell/single nuclear RNA-seq data could have been used to address such a fundamental question. LRRK2 small molecule inhibitors are available and progressing in the clinic. They could/should have been used to demonstrate the LRRK2 dependence, reversibility, and timing of therapeutic intervention. The authors suggest that the mouse data mirror (and potentially explain) the cholinergic loss in PD patient brains, but this is not measured in the current work (the authors do acknowledge this limitation and suggest that this is an important further study). There are some recent human data (Khan et al 2024 PMID: 38293195, BioRxiv, which the authors cite) showing loss of primary cilia and cholinergic neurons in sporadic PD (no evidence of aberrant LRRK2 activity) and, interestingly, this is not further exacerbated in G2019S carriers, which may suggest a more complex underlying mechanism.

    4. Reviewer #3 (Public review):

      Summary:

      The authors described cilia deficits, phospho-Rab12 accumulation, dystrophic axons in cholinergic neurons, and loss of the cholinergic neurons in the mouse brains of G2019S-LRRK2 knock-in mice, a preclinical animal model for Parkinson's disease. They showed that the above changes associated with cholinergic neurons are age-dependent and region-specific. The observation is interesting considering the neuron-type-specific effect of the LRRK2-G2019S in mice.

      Strengths:

      The observations are important and show neuron type-specific effects of the PD mutation of LRRK2 relevant to PD pathologies.

      Weaknesses:

      The authors may over-interpret the data, and the study may lack mechanistic investigation.

    1. eLife assessment

      In this manuscript, Griesius et al. analyze the dendritic integration properties of NDNF and OLM interneurons, and the current dataset suggests that even though both cell types display supralinear NMDA receptor-dependent synaptic integration, this may be associated with dendritic calcium transients only in NDNF interneurons. These findings are important because they could shed light on the functional diversity of different classes of interneurons in the mouse neocortex and hippocampus, which in turn can have major implications for understanding information flow in complex neural circuits. They are considered as being currently incomplete, however, due to: (i) the large variability and small sample size of multiple datasets, which prevents a finer evaluation of cellular and molecular mechanisms accounting for the difference in the integrative properties of different interneuron types; (ii) lack of control experiments to rule out that the effect of the NMDA antagonist AP5 on synaptic integration is confounded by potential phototoxic damage; (iii) lack of precise control of the uncaging location.

    2. Reviewer #1 (Public review):

      The manuscript by Griesius et al. addresses the dendritic integration of synaptic input in cortical GABAergic interneurons (INs). Dendritic properties, passive and active, of principal cells have been extensively characterized, but much less is known about the dendrites of INs. The limited information is particularly relevant in view of the high morphological and physiological diversity of IN types. The few studies that investigated IN dendrites focused on parvalbumin-expressing INs. In fact, in a previous study, the authors examined dendritic properties of PV INs, and found supralinear dendritic integration in basal, but not in apical dendrites (Cornford et al., 2019 eLife).

      In the present study, complementary to the prior work, the authors investigate whether dendrite-targeting IN types, NDNF-expressing neurogliaform cells, and somatostatin(SOM)-expressing O-LM neurons, display similar active integrative properties by combining clustered glutamate-uncaging and pharmacological manipulations with electrophysiological recording and calcium imaging from genetically identified IN types in mouse acute hippocampal slices.

      The main findings are that NDNF IN dendrites show strong supralinear summation of spatially- and temporally-clustered EPSPs, which is changed into sublinear behavior by bath application of NMDA receptor antagonists, but not by Na+-channel blockers. L-type calcium channel blockers abolished the calcium transients associated with the supralinear behavior but had no or only a weak effect on EPSP summation. SOM IN dendrites showed similar, albeit weaker NMDA-dependent supralinear summation, but no supralinear calcium transients were detected in these INs. In summary, the study demonstrates that different IN types are endowed with active dendritic integrative mechanisms, but show qualitative and quantitative divergence in these mechanisms.
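
      To make the notion of supralinear versus sublinear summation concrete, the following is a minimal sketch of one common way to quantify it: the measured compound EPSP for the first n uncaging sites is compared with the arithmetic sum of the n individually evoked EPSPs. The function name and the percent-deviation definition are illustrative assumptions; the manuscript's exact nonlinearity metric may differ.

```python
import numpy as np

def summation_nonlinearity(measured_peaks, unitary_peaks):
    """Percent deviation of measured compound EPSPs from the arithmetic sum.

    measured_peaks[i]: peak amplitude when sites 0..i are activated together.
    unitary_peaks[i]:  peak amplitude when site i is activated alone.
    Positive values indicate supralinear, negative values sublinear summation.
    (Hypothetical helper for illustration; not the authors' analysis code.)
    """
    unitary = np.asarray(unitary_peaks, dtype=float)
    measured = np.asarray(measured_peaks, dtype=float)
    expected = np.cumsum(unitary)  # arithmetic sum for 1..n inputs
    return 100.0 * (measured - expected) / expected
```

      Under this definition, a value near 0% means linear summation, and a dendrite whose compound response is twice the arithmetic sum scores 100%.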

      While the research is conceptually not novel, it constitutes an important incremental gain in our understanding of the functional diversity of GABAergic INs. In view of the central roles of IN types in network dynamics and information processing in the cortex, results and conclusions are of interest to the broader neuroscience community.

      The experiments are well designed, and closely follow the approach from the previous publication in parts, enabling direct comparison of the results obtained from the different IN types. The data is convincing and the conclusions are well-supported, and the manuscript is very well-written.

      I see only a few open questions and some inconsistencies in the presentation of the data in the figures (see details below).

    3. Reviewer #2 (Public review):

      Summary:

      Griesius et al. investigate the dendritic integration properties of two types of inhibitory interneurons in the hippocampus: those that express NDNF+ and those that express somatostatin. They found that both neurons showed supralinear synaptic integration in the dendrites, blocked by NMDA receptor blockers but not by blockers of Na+ channels. These experiments are critically overdue and very important because knowing how inhibitory neurons are engaged by excitatory synaptic input has important implications for all theories involving these inhibitory neurons.

      Strengths:

      (1) Determined the dendritic integration properties of two fundamental types of inhibitory interneurons.

      (2) Convincing demonstration that supra-threshold integration in both cell types depends on NMDA receptors but not on Na+ channels.

      Weaknesses:

      It is unknown whether highly clustered synaptic input, as used in this study (and several previous studies), occurs physiologically.

    4. Reviewer #3 (Public review):

      Summary:

      The authors study the temporal summation of caged EPSPs in dendrite-targeting hippocampal CA1 interneurons. There are some descriptive data presented, indicating non-linear summation, which seems to be larger in dendrites of NDNF expressing neurogliaform cells versus OLM cells. However, the underlying mechanisms are largely unclear.

      Strengths:

      Focal 2-photon uncaging of glutamate is a nice and detailed method to study temporal summation of small potentials in dendritic segments.

      Weaknesses:

      (1) NMDA-receptor signaling in NDNF-IN. The authors nicely show that temporal summation in dendrites of NDNF-INs is to a certain extent non-linear. However, this non-linearity varies massively from cell to cell (or dendrite to dendrite) from 0% up to 400% (Figure S2). The reason for this variability is totally unclear. Pharmacology with AP5 hints towards a contribution of NMDA receptors. However, the authors claim that the non-linearity is not dependent on EPSP amplitude (Figure S2), which should be the case if NMDA-receptors are involved. Unfortunately, there are no voltage-clamp data of NMDA currents similar to the previous study. Such data would help to see whether NMDA-receptor contribution varies from synapse to synapse to generate the observed variability. Furthermore, the NMDA- and AMPA-currents would help to compare NDNF with the previously characterized PV cells and would help to contribute to our understanding of interneuron function.

      (2) Sublinear summation in NDNF-INs. In the presence of AP5, the temporal summation of caged EPSPs is sublinear. That is potentially interesting. The authors claim that this might be dependent on the diameter of dendrites. Many voltage-gated channels can mediate such things as well. To conclude the contribution of dendritic diameter, it would be helpful to at least plot the extent of sublinearity in single NDNF dendrites versus the dendritic diameter. Otherwise, this statement should be deleted.

      (3) Nonlinear EPSP summation in OLM-IN. The authors do similar experiments in dendrite-targeting OLM-INs and show that the non-linear summation is smaller than in NDNF cells. The reason for this remains unclear. The authors claim that this is due to the larger dendritic diameter in OLM cells. However, there is no analysis. The minimum would be to correlate non-linearity with dendritic diameter in OLM-cells. Very likely there is an important role of synapse density and glutamate receptor density, which was shown to be very low in proximal dendrites of OLM cells and strongly increase with distance (Guirado et al. 2014, Cerebral Cortex 24:3014-24, Gramuntell et al. 2021, Front Aging Neurosci 13:782737). Therefore, the authors should perform a set of experiments in more distal dendrites of OLM cells with diameters similar to the diameters of the NDNF cells. Even better would be if the authors would quantify synapse density by counting spines and show how this density compares with non-linearity in the analyzed NDNF and OLM dendrites.

      (4) NMDA in OLM. Similar to the NDNF cells, the authors claim the involvement of NMDA receptors in OLM cells. Again there seems to be no dependence on EPSP amplitude, which is not understandable at this point (Figure S3). Even more remarkable is the fact that the authors claim that there is no dendritic calcium increase after activation of NMDA receptors. Similar to NDNF-cell analysis there are no NMDA currents in OLMs. Unfortunately, not even calcium imaging experiments were shown. Why? Are there calcium-impermeable NMDA receptors in OLM cells? To understand this phenomenon the minimum is to show some physiological signature of NMDA-receptors, for example, voltage-clamp currents. Furthermore, it would be helpful to systematically vary stimulus intensity to see some calcium signals with larger stimulation. In case there is still no calcium signal, it would be helpful to measure reversal potentials with different ion compositions to characterize the potentially 'Ca2+ impermeable' voltage-dependent NMDA receptors in OLM cells.

    1. eLife assessment

      This study used electrophysiology and imaging to show that the majority of excitatory cells in the dentate gyrus of adult mice have very slow oscillations during non-rapid eye movement (NREM) sleep. The oscillations were influenced by serotonin when it was released during NREM sleep. Moreover, the serotonin receptor type 1a mediated the effect, and reducing these receptors impaired a type of memory. The significance of the study is important and the strength of the evidence is solid, but revisions to the figures and making conclusions more consistent with the data could improve the significance and strength of evidence.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides convincing evidence on the infraslow oscillation of DG cells during NREM sleep, and how serotonergic innervation modulates hippocampal activity pattern during sleep and memory.

      Strengths and Weaknesses:

      The authors used state-of-the-art techniques to carry out these experiments. Given that the functional role of infraslow rhythm still remains to be studied, this study provides convincing evidence of the role of DG cells in regulating infraslow rhythm, sleep microarchitecture, and memory.

      I have a few minor comments.

      (1) Decreased infraslow rhythm during NREMs in the 5ht1a KO mice is striking. It would be helpful to know whether sleep-wake states, MAs, and transitions to REMs are changed.

      (2) It would be interesting to discuss whether the magnitude in changes of infraslow rhythm strength is correlated with memory performance (Figure 6).

      (3) The authors should cite the Oikonomou Neuron paper that describes slow oscillatory activity of DRN SERT neurons during NREM sleep.

      (4) The authors should clarify how they define the phasic pattern of the photometry signal.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigated DG neuronal activity at the population and single-cell level across sleep/wake periods. They found an infraslow oscillation (0.01-0.03 Hz) in both granule cells (GC) and mossy cells (MC) during NREM sleep.
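
      As an illustration of how the strength of an infraslow oscillation in the 0.01-0.03 Hz band could be quantified from a photometry trace, the sketch below computes the fraction of (non-DC) spectral power falling in that band using a plain FFT periodogram. The function name, the band-power ratio as a measure of oscillation strength, and the synthetic signal in the usage note are assumptions for illustration; they are not taken from the manuscript.

```python
import numpy as np

def relative_band_power(signal, fs, f_lo=0.01, f_hi=0.03):
    """Fraction of non-DC spectral power within [f_lo, f_hi] Hz.

    signal: 1-D array of samples; fs: sampling rate in Hz.
    (Hypothetical helper; a simple periodogram, no windowing or averaging.)
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                # remove DC offset
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)     # frequency of each bin
    power = np.abs(np.fft.rfft(x)) ** 2             # unnormalized periodogram
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    total = power[1:].sum()                         # exclude the DC bin
    return power[in_band].sum() / total if total > 0 else 0.0
```

      For example, a 1000 s trace sampled at 1 Hz that is dominated by a 0.02 Hz sinusoid yields a ratio close to 1, whereas a trace dominated by faster fluctuations yields a ratio close to 0. Resolving a 0.01 Hz component requires NREM bouts lasting several hundred seconds.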

      The important findings are:

      (1) The antiparallel temporal dynamics of DG neuron activities and serotonin neuron activities/extracellular serotonin levels during NREM sleep, and

      (2) The GC Htr1a-mediated GC infraslow oscillation.

      Strengths:

      (1) The combination of polysomnography, Ca-fiber photometry, two-photon microscopy, and gene depletion is technically sound. The coincidence of microarousals and dips in DG population activity is convincing. The dip in activity in upregulated cells is responsible for the dip at the population level.

      (2) DG GCs express excitatory Htr4 and Htr7 in addition to inhibitory Htr1a, but deletion of Htr1a is sufficient to disrupt DG GC infraslow oscillation, supporting the importance of Htr1a in DG activity during NREM sleep.

      Weaknesses:

      (1) The current data set and analysis are insufficient to interpret the observation correctly.

      a. In Figure 1A, during NREM, the peaks and troughs of GC population activities seem to gradually decrease over time. Please address this point.

      b. In Figure 1F, about 30% of Ca dips coincided with MA (EMG increase) and 60% of Ca dips did not coincide with EMG increase. If this is true, the readers can find 8 Ca dips which are not associated with MAs from Figure 1E. If MAs were clustered, please describe this properly.

      c. In Figure 1F, the legend stated the percentage during NREM. If the authors want to include the percentage of wake and REM, please show the traces with Ca dips during wake and REM. This concern applies to all pie charts provided by the authors.

      d. In Figure 1C, please provide line plots connecting the same session. This request applies to all related figures.

      e. In Figure 2C, the significant increase during REM and the same level during NREM are not convincing. In Figure 2A, the several EMG increasing bouts do not appear to be MA, but rather wakefulness, because the duration of the EMG increase is greater than 15 seconds. Therefore, it is possible that the wake bouts were mixed with NREM bouts, leading to the decrease of Ca activity during NREM. In fact, in Figure 2E, the 4th MA bout seems to be the wake bout because the EMG increase lasts more than 15 seconds.

      f. Figure 5D REM data are interesting because the DRN activity is stably silenced during REM. The varied correlation means the varied DG activity during REM. The authors need to address it.

      g. In Figure 6, the authors should show the impact of DG Htr1a knockdown on sleep/wake structure including the frequency of MAs. I agree with the impact of Htr1a on DG ISO, but possible changes in sleep bout may induce the DG ISO disturbance.

      (2) It is acceptable that DG Htr1a KO induces the reduced freezing in the CFC test (Figure 6E, F), but it is too much of a stretch that the disruption of DG ISO causes impaired fear memory. There should be a correlation.

      (3) It is necessary to describe the extent of AAV-Cre infection. The authors injected AAV into the dorsal DG (AP -1.9 mm), but the histology shows the ventral DG (Supplementary Figure 4), which reduces the reliability of this study.

    4. Reviewer #3 (Public review):

      Summary:

      The authors employ a series of well-conceived and well-executed experiments involving photometric imaging of the dentate gyrus and raphe nucleus, as well as cell-type specific genetic manipulations of serotonergic receptors that together serve to directly implicate serotonergic regulation of dentate gyrus (DG) granule (GC) and mossy cell (MC) activity in association with an infraslow oscillation (ISO) of neural activity that has been previously linked to general cortical regulation during NREM sleep and microarousals.

      Strengths:

      There are a number of novel and important results, including the modulation of dentate granule cell activity by the infraslow oscillation during NREM sleep, the selective association of different subpopulations of granule cells to microarousals (MA), and the anticorrelation of raphe activity with infraslow dentate activity.

      The discussion includes a general survey of ISOs and recent work relating to their expression in other brain areas and other potential neuromodulatory system involvement, as well as possible connections with infraslow oscillations, micro-arousals, and sensory sensitivity.

      Weaknesses:

      (1) The behavioral results showing contextual memory impairment resulting from 5-HT1a knockdown are fine but are over-interpreted. The term memory consolidation is used several times, as well as references to sleep-dependence. This is not what was tested. The receptor was knocked down, and then 2 weeks later animals were found to have fear conditioning deficits. They can certainly describe this result as indicating a connection between 5-HT1a receptor function and memory performance, but the connection to sleep and consolidation would just be speculation. The fact that 5-HT1a knockdown also impacted DG ISOs does not establish dependency. Some examples of this are:

      a. The final conclusion asserts "Together, our study highlights the role of neuromodulation in organizing neuronal activity during sleep and sleep-dependent brain functions, such as memory.". However, the reported memory effects (impairment of fear conditioning) were not shown to be explicitly sleep-dependent.

      b. Earlier in the discussion it mentions "Finally, we showed that local genetic ablation of 5-HT1a receptors in GCs impaired the ISO and memory consolidation". The effect shown was on general memory performance - consolidation was not specifically implicated.

      (2) The assertion on page 9 that the results demonstrate "that the 5-HT is directly acting in the DG to gate the oscillations" is a bit strong given the magnitude of effect shown in Figure 6D, and the absence of demonstration of negative effect on cortical areas that also show ISO activity and could impact DG activity (see requested cortical sigma power analysis).

      (3) Recent work has shown that abnormal DG GC activity can result from the use of the specific Ca indicator being used (GCaMP6s). (Teng, S., Wang, W., Wen, J.J.J. et al. Expression of GCaMP6s in the dentate gyrus induces tonic-clonic seizures. Sci Rep 14, 8104 (2024). https://doi.org/10.1038/s41598-024-58819-9). The authors of that study found that the effect seemed to be specific to GCaMP6s and that GCaMP6f did not lead to abnormal excitability. Note this is of particular concern given similar infraslow variation of cortical excitability in epilepsy (cf Vanhatalo et al. PNAS 2004). While I don't think that the experiments need to be repeated with a different indicator to address this concern, you should be able to use the 2p GCaMP7 experiments that have already been done to provide additional validation by repeating the analyses done for the GCaMP6s photometry experiments. This should be done anyway to allow appropriate comparison of the 2p and photometry results.

      (4) While the discussion mentions previous work that has linked ISOs during sleep with regulation of cortical oscillations in the sigma band, oddly no such analysis is performed in the current work even though it is presumably available and would be highly relevant to the interpretation of a number of primary results including the relationship between the ISOs and MAs observed in the DG and similar results reported in other areas, as well as the selective impact of DG 5-HT1a knockdown on DG ISOs. For example, in the initial results describing the cross-correlation of calcium activity and EMG/EEG with MA episodes (paragraph 1, page 4), similar results relating brief arousals to the infraslow fluctuation in sleep spindles (sigma band) have been reported also at 0.02 Hz associated with variation in sensory arousability (cf. Cardis et al., "Cortico-autonomic local arousals and heightened somatosensory arousability during NREMS of mice in neuropathic pain", eLife 2021). It would be important to know whether the current results show similar cortical sigma band correlations. Also, in the results on ISO attenuation following 5-HT1a knockdown on page 7 (Figure 6), how is cortical EEG affected? Is ISO still seen in EEG but attenuated in DG?

      (5) The illustrations of the effect of 5-HT1a knockdown shown in Figure 6 are somewhat misleading. The examples in panels B and C show an effect that is much more dramatic than the overall effect shown in panel D. Panels B and C do not appear to be representative examples. Which of the sample points in panel D are illustrated in panels B and C? It is not appropriate to arbitrarily select two points from different animals for comparison, or worse, to take points from the extremes of the distributions. If the intent is to illustrate what the effect shown in D looks like in the raw data, then you need to select examples that reflect the means shown in panel D. It is also important to show the effect on cortical EEG, particularly in sigma band to see if the effects are restricted to the DG ISOs. It would also be helpful to show that MAs and their correlations as shown in Figure 1 or G as well as broader sleep architecture are not affected.

      (6) On page 9 of the results it states that GCs and MCs are upregulated during NREM and their activity is abruptly terminated by MAs through a 5-HT mediated mechanism. I didn't see anything showing the 5-HT dependence of the MA activity correlation. The results indicate a reduction in ISO modulation of GC activity but not the MA-correlated activity. I would like to see the equivalent of Figure 1,2 G panels with the 5-HT1a manipulation.

    1. eLife assessment

      This important study by Wong et al. addresses a longstanding question in the field of associative learning regarding how a motivationally relevant event can be inferred from prior learning based on neutral stimulus-stimulus associations. The research provides convincing behavioral and neurophysiological evidence to address this important question. The manuscript will be interesting for researchers in behavioral and cognitive neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      This study is an important follow-up to their prior work - Wong et al. (2019), starting with clear questions and hypotheses, followed by a series of thoughtful and organized experiments. The method and results are convincing. Experiment 1 demonstrated the sensory preconditioned fear with few (8) or many (32) sound-light pairings. Experiments 2A and 2B showed the role of PRh NMDA receptors during conditioning for online integration, revealing that this contribution is present only after a few sound-light pairings, not after many sound-light pairings. Experiments 3A and 3B showed the contribution of PRh-BLA communication to online integration, again only after a few but not after many. Contrary to Experiments 3A and 3B, Experiments 4A and 4B showed the contribution of PRh-BLA communication to integration at test only after many but not few sound-light pairings.

      Strengths:

      Throughout the manuscript, the methods and results are clearly organized and described, and the use of statistics is solid, all contributing to the overall clarity of the research. The discussion section was also well-written, effectively comparing the current research with the prior work and offering insightful interpretations and potential future directions for this line of research. I have only a small number of concerns about some results and some details of experiments/statistics.

      Weaknesses:

      Could you provide further interpretation regarding line 171: the observation that sensory preconditioned fear increased with the number of sound-light pairings? Was this increase due to better sound-light association learning during Stage 1? Additionally, were there any experimental differences between Experiment 1 and the other experiments that might explain why freezing was higher in the P32 group compared to the P8 group? This pattern seemed to be absent in the other experiments. If we consider the hypothesis that the online integration mechanism is more active with fewer pairings and the chaining mechanism at the test is more prominent with many pairings, we wouldn't expect a difference between the P8 and P32 groups. Given the relatively small sample size in Experiment 1, the authors might consider conducting a cross-experiment analysis or something similar to investigate this further.

    3. Reviewer #2 (Public review):

      This manuscript builds on the authors' earlier work, most recently Wong et al. 2019, in which they showed the importance of the perirhinal cortex (PRh) during the first-order conditioning stage of sensory preconditioning. Sensory preconditioning requires learning between two neutral stimuli (S2-S1) and subsequent development of a conditioned response to one of the neutral stimuli after pairing of the other stimulus with a motivationally relevant unconditioned stimulus (S1-US). One highly debated question regarding the mechanisms of learning of sensory preconditioning has been whether conditioned responses evoked by the indirectly trained stimulus (S2) occur through a mediated representation at the time of the first-order US training, or whether the conditioned responses develop through a chained evoked representation (S2--> S1 --> US) at the time of test. The authors' prior findings provided strong evidence for PRh being involved in mediated learning during the first-order training. They showed that protein synthesis was required during the first-order S1-US learning to support the conditioned response to the indirectly trained stimulus (S2) at the test.

      One question remaining following the previous paper was whether certain conditions may promote a chaining mechanism over mediated learning, as there is some evidence for chained representations at the time of the test. In this paper, the authors directly address this important question and find unambiguous results that the extent of training during the preconditioning stage impacts the involvement of PRh during the first-order conditioning or stage 2. They show that putative blockade of synaptic changes in PRh, using an NMDA antagonist, disrupts responding to the preconditioned cue at test during shorter duration preconditioning training (8 trials), but not during extended training (32 trials). They also show that this is the case for communication between the PRh and BLA during the same stage of training using a contralateral inactivation approach. This confirms their previous findings in 2019 of connectivity between these regions for the short-duration training, while they observe here for the first time that this is not the case for extended training. Finally, they show that with extended training, communication between BLA and the PRh is required at the final test of the preconditioned stimulus, but not for the short duration training.

      The results are clear and extremely consistent across experiments within this paper as well as with earlier work. The experiments here are thorough, and well-conceived, and address an important and highly debated question in the field regarding the neural and psychological mechanisms underlying sensory preconditioning. This work is highly impactful for the field as the debate over mediated versus chaining mechanisms has been an important topic for more than 70 years.

    4. Reviewer #3 (Public review):

      The authors tested whether the number of stimulus-stimulus pairings alters whether preconditioned fear depends on online integration during the formation of the stimulus-outcome memory or during the probe test/mobilization phase, when the original stimulus, which was never paired with aversive events, elicits fear via chaining of stimulus-stimulus and stimulus-outcome memories. They found that sensory preconditioning was successful with either 8 or 32 stimulus-stimulus pairings. Perirhinal cortex NMDA receptor blockade during stimulus-outcome learning impaired preconditioning following 8 but not 32 pairings during preconditioning. Therefore, perirhinal cortex NMDA activity is required for online integration or mediated learning. Perirhinal-basolateral amygdala disconnection had nearly identical effects with the same interpretation: these areas communicate during stimulus-outcome learning, and this online communication is required for later expressing preconditioned fear. Disconnection prior to the probe test, when chaining might occur, had different effects: it impaired the expression of preconditioned fear in rats that received 32, but not 8, pairings during preconditioning. The study has several strengths and provides a thoughtful discussion of future experiments. The study is highly impactful and significant; the authors were successful in describing the behavioral and neurobiological mechanisms of mediated learning versus chaining in sensory preconditioning, which is often debated in the learning field. Therefore this study will have a significant impact on the behavioral neurobiology and learning fields.

      Strengths:

      Careful, rigorous experimental design and statistics.

      The discussion leaves open questions that are very much worth exploring. For example - why did perirhinal-amygdala disconnection prior to the probe have no effect in the 8-pairing group, when bilateral perirhinal inactivation did (in Wong et al, 2019)? The authors propose that perirhinal cortex outputs bypass the amygdala during the probe test, which is an excellent hypothesis to test.

      The authors provide evidence that both mediated learning and chaining occur.

      Weaknesses:

      This is inherent to all neural interference and behavioral experiments: biological/psychological functions do not typically operate binarily. There is no single clear number or parameter at which mediated learning or chaining happens, and both probably happen to some extent. Addressing this is even more difficult given behavioral variability across subjects, implant sites, etc. Thus, this is not so much a weakness particular to this study as much as an existential problem, which the authors were able to work around with careful experimental design and appropriate controls.

    1. eLife assessment

      This important work combines theory and experiment to assess how humans make decisions about sequences of pairs of correlated observations. The normative theory for evidence integration in correlated environments will be informative for future investigations. However, the developed theory and data analysis seem currently incomplete: it remains to be seen if the derived decision strategy is indeed normative, or only an approximation thereof, and behavioral modelling would benefit from the assessment of alternative models.

    2. Reviewer #1 (Public review):

      Summary:

      The behavioral strategies underlying decisions based on perceptual evidence are often studied in the lab with stimuli whose elements provide independent pieces of decision-related evidence that can thus be equally weighted to form a decision. In more natural scenarios, in contrast, the information provided by these pieces is often correlated, which impacts how they should be weighted. Tardiff, Kang & Gold set out to study decisions based on correlated evidence and compare the observed behavior of human decision-makers to normative decision strategies. To do so, they presented participants with visual sequences of pairs of localized cues whose location was either uncorrelated, or positively or negatively correlated, and whose mean location across a sequence determined the correct choice. Importantly, they adjusted this mean location such that, when correctly weighted, each pair of cues was equally informative, irrespective of how correlated it was. Thus, if participants follow the normative decision strategy, their choices and reaction times should not be impacted by these correlations. While Tardiff and colleagues found no impact of correlations on choices, they did find them to impact reaction times, suggesting that participants deviated from the normative decision strategy. To assess the degree of this deviation, Tardiff et al. adjusted drift-diffusion models (DDMs) for decision-making to process correlated decision evidence. Fitting these models to the behavior of individual participants revealed that participants considered correlations when weighing evidence, but did so with a slight underestimation of the magnitude of this correlation. This finding made Tardiff et al. conclude that participants followed a close-to-normative decision strategy that adequately took into account correlated evidence.

      Strengths:

      The authors adjust a previously used experimental design to include correlated evidence in a simple, yet powerful way. The way it does so is easy to understand and intuitive, such that participants don't need extensive training to perform the task. Limited training makes it more likely that the observed behavior is natural and reflective of everyday decision-making. Furthermore, the design allowed the authors to make the amount of decision-related evidence equal across different correlation magnitudes, which makes it easy to assess whether participants correctly take account of these correlations when weighing evidence: if they do, their behavior should not be impacted by the correlation magnitude.
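
      The logic of equating evidence across correlation magnitudes can be sketched as follows, under assumptions that are mine rather than the manuscript's: each pair of cues is bivariate Gaussian with equal variances sigma^2 and correlation rho, and the two hypotheses are means of (+mu, +mu) versus (-mu, -mu). In that case the log-likelihood ratio of a pair is 2*mu*(x1 + x2) / (sigma^2 * (1 + rho)), and its expected value under the correct hypothesis stays constant across rho if mu is scaled by sqrt(1 + rho). All function names are hypothetical.

```python
import numpy as np

def pair_log_lr(x1, x2, mu, sigma, rho):
    """logLR of one cue pair for means (+mu, +mu) vs (-mu, -mu).

    Assumes a bivariate Gaussian with equal variances sigma**2 and
    correlation rho (an illustrative assumption about the generative model).
    """
    return 2.0 * mu * (x1 + x2) / (sigma ** 2 * (1.0 + rho))

def expected_log_lr(mu, sigma, rho):
    """Expected logLR per pair when the (+mu, +mu) hypothesis is true."""
    return 4.0 * mu ** 2 / (sigma ** 2 * (1.0 + rho))

def matched_mean(mu0, rho):
    """Mean that keeps the expected logLR equal to the rho = 0 case."""
    return mu0 * np.sqrt(1.0 + rho)
```

      With this scaling, an observer who weights each pair by 1 / (1 + rho) accumulates evidence at the same rate for every correlation, whereas an observer who ignores the correlation (treats rho as 0) over-weights positively correlated pairs and under-weights negatively correlated ones.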

      The relative simplicity with which correlated evidence is introduced also allowed the authors to fall back to the well-established DDM for perceptual decisions, which has few parameters, is known to implement the normative decision strategy in certain circumstances, and enjoys a great deal of empirical support. The authors show how correlations ought to impact these parameters, and which changes in parameters one would expect to see if participants mis-estimate these correlations or ignore them altogether (i.e., estimate correlations to be zero). This allowed them to assess the degree to which participants took into account correlations on the full continuum from perfect evidence weighting to complete ignorance. With this, they could show that participants in fact performed rational evidence weighting if one assumed that they slightly underestimated the correlation magnitude.

      Weaknesses:

      The experiment varies the correlation magnitude across trials such that participants need to estimate this magnitude within individual trials. This has several consequences:

      (1) Given that correlation magnitudes are estimated from limited data, the (subjective) estimates might be biased towards their average. This implies that, while the amount of evidence provided by each 'sample' is objectively independent of the correlation magnitude, it might subjectively depend on the correlation magnitude. As a result, the normative strategy might differ across correlation magnitudes, unlike what is suggested in the paper. In fact, it might be the case that the observed underestimation of the correlation magnitude corresponds to the normative strategy.

      (2) The authors link the normative decision strategy to putting a bound on the log-likelihood ratio (logLR), as implemented by the two decision boundaries in DDMs. However, as the authors also highlight in their discussion, the 'particle location' in DDMs ceases to correspond to the logLR as soon as the strength of evidence varies across trials and isn't known by the decision maker before the start of each trial. In fact, in the used experiment, the strength of evidence is modulated in two ways: (i) by the (uncorrected) distance of the cue location mean from the decision boundary (what the authors call the evidence strength) and (ii) by the correlation magnitude. Both vary pseudo-randomly across trials, and are unknown to the decision-maker at the start of each trial. As previous work has shown (e.g. Kiani & Shadlen (2009), Drugowitsch et al. (2012)), the normative strategy then requires averaging over different evidence strength magnitudes while forming one's belief. This averaging causes the 'particle location' to deviate from the logLR. This deviation makes it unclear if the DDM used in the paper indeed implements the normative strategy, or is even a good approximation to it.
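
      The averaging argument in point (2) can be sketched numerically. The example below assumes a unit-variance diffusion whose drift magnitude is drawn from a small discrete set with known prior probabilities; the function name, the strengths, and the priors are hypothetical and chosen only for illustration, not taken from the paper or the cited studies.

```python
import math


def log_posterior_odds(x, t, strengths, priors):
    """Log odds that the drift is positive rather than negative.

    x is the accumulated evidence ('particle location') at time t. The drift
    magnitude is unknown and drawn from `strengths` with probabilities
    `priors`; the diffusion noise has unit variance (sketch assumptions).
    """
    def log_sum(sign):
        # Log-likelihood of each candidate drift, up to a term common to both signs.
        terms = [math.log(p) + sign * m * x - 0.5 * m * m * t
                 for m, p in zip(strengths, priors)]
        top = max(terms)  # log-sum-exp for numerical stability
        return top + math.log(sum(math.exp(v - top) for v in terms))

    return log_sum(+1) - log_sum(-1)


if __name__ == "__main__":
    # One known strength: the log odds are 2 * m * x at every time point.
    print(log_posterior_odds(0.7, 1.0, [1.5], [1.0]))
    print(log_posterior_odds(0.7, 5.0, [1.5], [1.0]))
    # Mixed strengths: the same particle location implies different beliefs over time.
    print(log_posterior_odds(1.0, 1.0, [0.5, 2.0], [0.5, 0.5]))
    print(log_posterior_odds(1.0, 5.0, [0.5, 2.0], [0.5, 0.5]))
```

      With a single known strength the belief is proportional to the particle location alone, so a fixed bound on the particle is a bound on the logLR. Once strengths are mixed, the same location maps onto a time-dependent belief, which is the deviation the reviewer describes.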

      Given that participants observe 5 evidence samples per second and on average require multiple seconds to form their decisions, it might be that they are able to form a fairly precise estimate of the correlation magnitude within individual trials. However, whether this is indeed the case is not clear from the paper.

      Furthermore, the authors capture any underestimation of the correlation magnitude by an adjustment to the DDM bound parameter. They justify this adjustment by asking how this bound parameter needs to be set to achieve correlation-independent psychometric curves (as observed in their experiments) even if participants use a 'wrong' correlation magnitude to process the provided evidence. Curiously, however, the drift rate, which is the second critical DDM parameter, is not adjusted in the same way. If participants use the 'wrong' correlation magnitude, then wouldn't this lead to a mis-weighting of the evidence that would also impact the drift rate? The current model does not account for this, such that the provided estimates of the mis-estimated correlation magnitudes might be biased.

      Lastly, the paper makes it hard to assess how much better the participants' choices would be if they used the correct correlation magnitudes rather than underestimates thereof. This is important to know, as it only makes sense to strictly follow the normative strategy if it comes with a significant performance gain.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Tardiff, Kang & Gold seeks to: i) develop a normative account of how observers should adapt their decision-making across environments with different levels of correlation between successive pairs of observations, and ii) assess whether human decisions in such environments are consistent with this normative model.

      The authors first demonstrate that, in the range of environments under consideration here, an observer with full knowledge of the generative statistics should take both the magnitude and sign of the underlying correlation into account when assigning weight in their decisions to new observations: stronger negative correlations should translate into stronger weighting (due to the greater information furnished by an anticorrelated generative source), while stronger positive correlations should translate into weaker weighting (due to the greater redundancy of information provided by a positively correlated generative source). The authors then report an empirical study in which human participants performed a perceptual decision-making task requiring accumulation of information provided by pairs of perceptual samples, under different levels of pairwise correlation. They describe a nuanced pattern of results with effects of correlation being largely restricted to response times and not choice accuracy, which could partly be captured through fits of their normative model (in this implementation, an extension of the well-known drift-diffusion model) to the participants' behaviour while allowing for mis-estimation of the underlying correlations.
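
      The direction of the weighting rule summarized above can be checked with a short sketch. It assumes, for illustration, that each pair is bivariate Gaussian with common mean +mu or -mu, common standard deviation `sigma`, and correlation `rho`; the function names are hypothetical, and the closed form follows from that assumption rather than from the paper's own notation.

```python
import math


def pair_log_lr(x1, x2, mu, sigma, rho):
    """Closed-form log-likelihood ratio (+mu vs -mu) for one pair of samples."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 2.0 * mu * (x1 + x2) / (sigma ** 2 * (1.0 + rho))


def bivariate_log_pdf(x1, x2, mean, sigma, rho):
    """Log density of a bivariate Gaussian with equal means and variances."""
    z1, z2 = (x1 - mean) / sigma, (x2 - mean) / sigma
    quad = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    norm = math.log(2.0 * math.pi * sigma ** 2 * math.sqrt(1.0 - rho * rho))
    return -norm - 0.5 * quad


if __name__ == "__main__":
    # The weight on the summed pair shrinks as rho increases.
    for rho in (-0.5, 0.0, 0.5):
        print(rho, pair_log_lr(0.5, 0.5, 1.0, 1.0, rho))
    # Cross-check the closed form against the explicit densities.
    direct = bivariate_log_pdf(0.3, -0.1, 0.5, 1.0, 0.4) - bivariate_log_pdf(0.3, -0.1, -0.5, 1.0, 0.4)
    print(direct, pair_log_lr(0.3, -0.1, 0.5, 1.0, 0.4))
```

      Under these assumptions the weight applied to the summed pair scales with 1 / (1 + rho), so anticorrelated pairs are weighted more strongly and positively correlated pairs more weakly, matching the qualitative pattern described in the summary.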

      Strengths:

      As the authors point out in their very well-written paper, appropriate weighting of information gathered in correlated environments has important consequences for real-world decision-making. Yet, while this function has been well studied for 'high-level' (e.g. economic) decisions, how we account for correlations when making simple perceptual decisions on well-controlled behavioural tasks has not been investigated. As such, this study addresses an important and timely question that will be of broad interest to psychologists and neuroscientists. The computational approach to arrive at normative principles for evidence weighting across environments with different levels of correlation is very elegant, makes strong connections with prior work in different decision-making contexts, and should serve as a valuable reference point for future studies in this domain. The empirical study is well designed and executed, and the modelling approach applied to these data showcases a deep understanding of relationships between different parameters of the drift-diffusion model and its application to this setting. Another strength of the study is that it is preregistered.

      Weaknesses:

      In my view, the major weaknesses of the study center on the narrow focus and subsequent interpretation of the modelling applied to the empirical data. I elaborate on each below:

      Modelling interpretation: the authors' preference for fitting and interpreting the observed behavioural effects primarily in terms of raising or lowering the decision bound is not well motivated and will potentially be confusing for readers, for several reasons. First, the entire study is conceived, in the Introduction and first part of the Results at least, as an investigation of appropriate adjustments of evidence weighting in the face of varying correlations. The authors do describe how changes in the scaling of the evidence in the drift-diffusion model are mathematically equivalent to changes in the decision bound - but this comes amidst a lengthy treatment of the interaction between different parameters of the model and aspects of the current task which I must admit to finding challenging to follow, and the motivation behind shifting the focus to bound adjustments remained quite opaque. Second, and more seriously, bound adjustments of the form modelled here do not seem to be a viable candidate for producing behavioural effects of varying correlations on this task. As the authors state toward the end of the Introduction, the decision bound is typically conceived of as being "predefined" - that is, set before a trial begins, at a level that should strike an appropriate balance between producing fast and accurate decisions. There is an abundance of evidence now that bounds can change over the course of a trial - but typically these changes are considered to be consistently applied in response to learned, predictable constraints imposed by a particular task (e.g. response deadlines, varying evidence strengths). In the present case, however, the critical consideration is that the correlation conditions were randomly interleaved across trials and were not signaled to participants in advance of each trial - and as such, what correlation the participant would encounter on an upcoming trial could not be predicted. 
It is unclear, then, how participants are meant to have implemented the bound adjustments prescribed by the model fits. At best, participants needed to form estimates of the correlation strength/direction (only possible by observing several pairs of samples in sequence) as each trial unfolded, and they might have dynamically adjusted their bounds (e.g. collapsing at a different rate across correlation conditions) in the process. But this is very different from the modelling approach that was taken. In general, then, I view the emphasis on bound adjustment as the candidate mechanism for producing the observed behavioural effects to be unjustified (see also next point).

      Modelling focus: Related to the previous point, it is stated that participants' choice and RT patterns across correlation conditions were qualitatively consistent with bound adjustments (p.20), but evidence for this claim is limited. Bound adjustments imply effects on both accuracy and RTs, but the data here show either only effects on RTs, or RT effects mixed with accuracy trends that are in the opposite direction to what would be expected from bound adjustment (i.e. slower RT with a trend toward diminished accuracy in the strong negative correlation condition; Figure 3b). Allowing both drift rate and bound to vary with correlation conditions allowed the model to provide a better account of the data in the strong correlation conditions - but from what I can tell this is not consistent with the authors' preregistered hypotheses, and they rely on a posthoc explanation that is necessarily speculative and cannot presently be tested (that the diminished drift rates for higher negative correlations are due to imperfect mapping between subjective evidence strength and the experimenter-controlled adjustment to objective evidence strengths to account for effects of correlations). In my opinion, there are other candidate explanations for the observed effects that could be tested but lie outside of the relatively narrow focus of the current modelling efforts. Both explanations arise from aspects of the task, which are not mutually exclusive. The first is that an interesting aspect of this task, which contrasts with most common 'univariate' perceptual decision-making tasks, is that participants need to integrate two pieces of information at a time, which may or may not require an additional computational step (e.g. averaging of two spatial locations before adding a single quantum of evidence to the building decision variable). 
There is abundant evidence that such intermediate computations on the evidence can give rise to certain forms of bias in the way that evidence is accumulated (e.g. 'selective integration' as outlined in Usher et al., 2019, Current Directions in Psychological Science; Luyckx et al., 2020, Cerebral Cortex) which may affect RTs and/or accuracy on the current task. The second candidate explanation is that participants in the current study were only given 200 ms to process and accumulate each pair of evidence samples, which may create a processing bottleneck causing certain pairs or individual samples to be missed (and which, assuming fixed decision bounds, would presumably selectively affect RT and not accuracy). If I were to speculate, I would say that both factors could be exacerbated in the negative correlation conditions, where pairs of samples will on average be more 'conflicting' (i.e. further apart) and, speculatively, more challenging to process in the limited time available here to participants. Such possibilities could be tested through, for example, an interrogation paradigm version of the current task which would allow the impact of individual pairs of evidence samples to be more straightforwardly assessed; and by assessing the impact of varying inter-sample intervals on the behavioural effects reported presently.

    1. eLife assessment

      This important work identifies a non-autophagic role for ATG5 in lysosomal repair and the trafficking of the glucose transporter GLUT1 to the cell surface, mediated through the retromer complex. The evidence supporting the conclusions is solid.

    1. eLife assessment

      Supported by convincing data, this valuable study demonstrates that the Chitinase 3-like protein 1 (Chi3l1) interacts with gut microbiota and protects animals from intestinal injury in a laboratory colitis model. The revised manuscript sufficiently addressed the reviewers' comments. The work will be of interest to scientists studying crosstalk between gut microbiota and inflammatory diseases.

    2. Reviewer #1 (Public review):

      The manuscript by Chen et al. investigated the interaction between CHI3L1, a chitinase-like protein in the glycosyl hydrolase 18 family, and gut bacteria in the mucosal layers. The authors provided evidence to document the direct interaction between CHI3L1 and peptidoglycan, a major component of the bacterial cell wall. In doing so, Chi3l1 produced by gut epithelial cells regulates the balance of the gut microbiome and diminishes DSS-induced colitis, potentially through the colonization of protective gram-positive bacteria such as Lactobacillus.

      The study is the first to systematically document the interactions between Chi3L1 and the microbiome. Convincing data were shown to characterize the imbalance of gram-positive bacteria in the newly generated gut epithelial-specific Chi3L1-deficient mice. Comprehensive FMT experiments were performed to demonstrate the contributions of the gut microbiome using the mouse colitis model. The manuscript is strengthened by additional mechanistic studies concerning the binding between Chi3l1 and peptidoglycan, and by discussion of the existing body of literature demonstrating detrimental roles of Chi3l1 in mouse IBD models, which conflicts with the current study.

    3. Reviewer #2 (Public review):

      Chen et al. investigated the regulatory mechanism of bacterial colonization in the intestinal mucus layer in mice and its implications for intestinal diseases. They demonstrated that Chi3l1 is a protein produced and secreted by intestinal epithelial cells into the mucus layer in response to the gut microbiota, which has a turnover effect on facilitating the colonization of gram-positive bacteria in the mucosa. The data also indicate that Chi3l1 interacts with the peptidoglycan of the bacterial cell wall, supporting the colonization of beneficial bacterial strains such as Lactobacillus, and that deficiency in Chi3l1 predisposes mice to colitis. The inclusion of a small but pertinent piece of human data helps to solidify their findings in mice.

      Overall, the experiments were appropriately designed and executed with precision. The revised manuscript represents a significant improvement over the initial version. The inclusion of new, higher-resolution images provides stronger support for the conclusions drawn. Additionally, statistical analyses of the imaging data, as recommended, have been integrated. The authors have effectively addressed the majority of the reviewers' suggestions and criticisms, making this version well-suited for publication.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) In Figure 1, it is curious that the authors only chose E. coli and Staphylococcus sciuri to test the induction of Chi3l1. What about other bacteria? Why does only E. coli but not Staphylococcus sciuri induce Chi3l1 production? It does not prove that the gut microbiome induces the expression of Chi3l1. If it is the effect of LPS, does it trigger a cell death response or inflammatory responses that are known to induce Chi3l1 production? What is the role of peptidoglycan in this experiment? Also, it is recommended to change WT to SPF in the figure and text, as no genetic manipulation was involved in this figure.

      Thank you for your valuable feedback and insightful suggestions. In our study, we tried to identify bacteria from murine gut contents and feces using 16S sequencing. However, only E. coli and Staphylococcus sciuri were identified (Figure 1D). Consequently, our experiments were limited to these two bacterial strains. While we have not tested other bacteria, our data suggest that not all bacteria can induce the expression of Chi3l1. Given that E. coli is Gram-negative and Staphylococcus sciuri is Gram-positive, we hypothesized that the difference in their ability to induce Chi3l1 expression might be due to variations between Gram-negative and Gram-positive bacteria, such as the presence of lipopolysaccharides (LPS).

      To test this hypothesis, we used LPS to induce Chi3l1 expression. Consistent with our hypothesis, LPS successfully induced Chi3l1 expression (Figure 1F&G). Additionally, we observed that Chi3l1 expression is significantly upregulated in specific pathogen-free (SPF) mice compared to germ-free mice (Figure 1A), demonstrating that the gut microbiome induces the expression of Chi3l1.

      Although we have not examined cell death or inflammatory responses, the protective role of Chi3l1 shown in Figure 5 suggests that any such responses would be mild and negligible. Regarding the role of peptidoglycan in the induction of Chi3l1 expression in DLD-1 cells, we have not yet explored this aspect. However, we agree with your suggestion that it would be worthwhile to investigate this in future experiments.

      We have also made the suggested modifications to the labeling (Figure 1A) and the clarification in the revised manuscript accordingly (page 3, Line 95-96; Line 102-106).

      Thank you again for your constructive feedback.

      (2) In Figure 2, the binding between Chi3l1 and PGN needs better characterization, regarding the affinity and how it compares with the binding between Chi3l1 and chitin. More importantly, it is unclear how this interaction could facilitate the colonization of gram-positive bacteria.

      Thank you for your insightful suggestions and we have performed the suggested experiments and included the results in the revised manuscript (Figure 2E-G, page 3-4, Line 132-146).

      Our results indicate that Chi3l1 interacts with PGN in a dose-dependent manner (Figure 2E). In contrast, the binding between Chi3l1 and chitin did not exhibit dose dependency (Figure 2E). These findings suggest a specific and distinct binding mechanism for Chi3l1 with PGN compared to chitin.

      We conducted DLD-1 cell-bacteria adhesion experiments, using GlmM mutant (PGN synthesis mutant) and K12 (wild-type) bacteria to test their adhesion capabilities. The results showed that the adhesion ability of the GlmM mutant to cells significantly decreased (Figure 2F). Additionally, after knocking down Chi3l1 in DLD-1 cells, we observed decreased bacterial adhesion (Figure 2G). These findings suggest that the interaction between Chi3l1 and PGN plays a crucial role in bacterial adhesion.

      (3) In Figure 3, the abundance of Firmicutes and other gram-positive species is lower in the knockout mice. What is the rationale for choosing Lactobacillus in the following transfer experiments?

      We appreciate your thorough review. Among the Gram-positive bacteria that we have sequenced and analyzed, Lactobacillus occupies the largest proportion. Given the significant presence and established benefits of Lactobacillus, we chose it for the subsequent transfer experiments to leverage its known properties and availability, thereby ensuring the robustness and reproducibility of our findings. This is supported by the study referenced below.

      Lamas B, Richard ML, Leducq V, Pham HP, Michel ML, Da Costa G, Bridonneau C, Jegou S, Hoffmann TW, Natividad JM, Brot L, Taleb S, Couturier-Maillard A, Nion-Larmurier I, Merabtene F, Seksik P, Bourrier A, Cosnes J, Ryffel B, Beaugerie L, Launay JM, Langella P, Xavier RJ, Sokol H. CARD9 impacts colitis by altering gut microbiota metabolism of tryptophan into aryl hydrocarbon receptor ligands. Nat Med. 2016 Jun;22(6):598-605. doi: 10.1038/nm.4102. Epub 2016 May 9. PMID: 27158904; PMCID: PMC5087285.

      (4) FDAA-labeled E. faecalis colonization is decreased in the knockouts. Is it specific for E. faecalis, or it is generally true for all gram-positive bacteria? What about the colonization of gram-negative bacteria?

      Thank you for your insightful suggestions. We have investigated the colonization of gram-negative bacteria using OP50-mCherry (a strain of E. coli that expresses mCherry) and included the results in the updated manuscript (Supplementary Figure 3B, page 5, Line 197-200). We performed rectal injection of both wildtype and Chil1-/- mice with mCherry-OP50, and found that Chil1-/- mice had much higher colonization of E. coli compared to wildtype mice.

      (5) In Figure 5, the fact that FMT did not completely rescue the phenotype may point to the role of host cells in the processes. The reason that Lactobacillus transfer did completely rescue the phenotypes could be due to the overwhelming protective role of Lactobacillus itself, as the experiments were missing Villin-cre mice transferred with Lactobacillus.

      Thank you for your valuable feedback and thorough review. In our study, pretreatment with antibiotics in mice to eliminate gut microbiota demonstrated that IEC∆Chil1 mice exhibited a milder colitis phenotype (Supplementary Figure 4). This suggests that Chi3l1-expressing host cells are likely to play a detrimental role in colitis. Consequently, the failure of FMT to completely rescue the phenotype is likely due to the incomplete preservation of bacteria in the feces during the transfer experiment.

      We agree with your assessment of the protective role of Lactobacillus. This also explains the significant difference in colitis phenotype between Villin-cre and IEC∆Chil1 mice (Figure 5B-E), as Lactobacillus levels are significantly lower in IEC∆Chil1 mice (Figure 4F). Given the severity of colitis in Villin-cre mice at 7 days post-DSS, even if Lactobacillus were transferred back to these mice, it would be unlikely to result in a significant improvement.

      (6) Conflicting literature demonstrating the detrimental roles of Chi3l1 in mouse IBD model needs to be acknowledged and discussed.

      Thank you for your insightful suggestions and we have included additional discussions in the revised manuscript (page 6-7, Line 258-274).

      Reviewer #2 (Public Review):

      (1) Images are of great quality but lack proper quantification and statistical analysis. Statements such as "substantial increase of Chi3l1 expression in SPF mice" (Fig.1A), "reduced levels of Firmicutes in the colon lumen of IEC ∆ Chil1" (Fig.3F), "Chil1-/- had much lower colonization of E. faecalis" (Fig.4G), or "deletion of Chi3l1 significantly reduced mucus layer thickness" (Supplemental Figure 3A-B) are subjective. Since many conclusions were based on imaging data, the authors must provide reliable measures for comparison between conditions, as far as possible, such as fluorescence intensity, area, density, etc., as well as plots and statistical analysis.

      Thank you for your insightful suggestions. We have performed the suggested statistical analysis on most of the figures and included the analysis in the revised manuscript (Figure 1A, Figure 3E&F, Supplementary Figure 3B&C). Given the large quantity of dietary fiber intertwined with bacteria, it is challenging to make a reliable quantification of bacteria in Figure 4G. However, it is easy to distinguish bacteria from dietary fiber under the microscope. We have exclusively analyzed gut sections from six mice in each group, and the results are consistent between the two groups.

      (2) In the fecal/Lactobacillus transplantation experiments, oral gavage of Lactobacillus to IEC∆Chil1 mice ameliorated the colitis phenotype, by preventing colon length reduction, weight loss, and colon inflammation. These findings seem to go against the notion that Chi3l1 is necessary for the colonization of Lactobacillus in the intestinal mucosa. The authors could speculate on how Lactobacillus administration is still beneficial in the absence of Chi3l1. Perhaps additional data showing the localization of the orally administered bacteria in the gut of Chi3l1-deficient mice would clarify whether Lactobacillus is more successfully colonizing other regions of the gut, but not the mucus layer. Alternatively, later time points of 2% DSS challenge, after Lactobacillus transplantation, would suggest whether the gut colonization by Lactobacillus, and therefore the milder colitis phenotype, is sustained for longer periods in the absence of Chi3l1.

      Thank you for your thorough review and insightful suggestions. Since we pretreated mice with antibiotics, the intestinal mucus layer is likely damaged according to a previous study (PMID: 37097253). Therefore, gavaged Lactobacillus cannot colonize in the mucus layer. Moreover, existing studies have shown that the protective effect of Lactobacillus is mainly derived from its metabolites or thallus components, rather than the living bacteria itself (PMID: 36419205, PMID: 27516254).

      Zhan M, Liang X, Chen J, Yang X, Han Y, Zhao C, Xiao J, Cao Y, Xiao H, Song M. Dietary 5-demethylnobiletin prevents antibiotic-associated dysbiosis of gut microbiota and damage to the colonic barrier. Food Funct. 2023 May 11;14(9):4414-4429. doi: 10.1039/d3fo00516j. PMID: 37097253.

      Montgomery TL, Eckstrom K, Lile KH, Caldwell S, Heney ER, Lahue KG, D'Alessandro A, Wargo MJ, Krementsov DN. Lactobacillus reuteri tryptophan metabolism promotes host susceptibility to CNS autoimmunity. Microbiome. 2022 Nov 23;10(1):198. doi: 10.1186/s40168-022-01408-7. PMID: 36419205.

      Piermaría J, Bengoechea C, Abraham AG, Guerrero A. Shear and extensional properties of kefiran. Carbohydr Polym. 2016 Nov 5;152:97-104. doi: 10.1016/j.carbpol.2016.06.067. Epub 2016 Jun 23. PMID: 27516254.

      Reviewer #3 (Public Review):

      The claim that mucus-associated Ch3l1 controls colonization of beneficial Gram-positive species within the mucus is not conclusive. The study should take into account recent discoveries on the nature of mucus in the colon, namely its mobile fecal association and complex structure based on two distinct mucus barrier layers coming from proximal and distal parts of the colon (PMID: ). This impacts the interpretation of how and where Ch3l1 is expressed and gets into the mucus to promote colonization. It also impacts their conclusions because the authors compare fecal vs. tissue mucus, but most of the mucus would be attached to the feces. Of the mucus that was claimed to be isolated from the WT and IEC Ch3l1 KO, this was not biochemically verified. Such verification (e.g. through Western blot) would increase confidence in the data presented. Further, the study relies upon relative microbial profiling, which can mask absolute numbers, making the claim of reduced overall Gram-positive species in mice lacking Ch3l1 unproven. It would be beneficial to show more quantitative approaches (e.g. Quantitative Microbial Profiling, QMP) to provide more definitive conclusions on the impact of Ch3l1 loss on Gram+ microbes.

      You raise an excellent point about the data interpretation, and we appreciate your insightful suggestions. We have included the discussion regarding the recent discoveries in the revised manuscript (page 7-8, Line 304-312). According to the recent discovery, the mucus in the proximal colon forms a primary encapsulation barrier around fecal material, while the mucus in the distal colon forms a secondary barrier. Our findings indicate that Chi3l1 is expressed throughout the entire colon, including the proximal, middle, and distal sections (See Author response image 1 below, P.S. Chi3l1 detection in colon presented in the manuscript are from the middle section). This suggests that Chi3l1 likely promotes bacterial colonization across the entire colon. Despite most mucus being expelled with feces, the constant production of mucus and the minimal presence of Chi3l1 in feces (Figure 4C) indicate that Chi3l1 continuously plays a role in promoting the colonization of microbiota.

      Author response image 1.

      Chi3l1 is expressed in the proximal and distal colon. Immunofluorescence staining of proximal and distal colon sections to detect Chi3l1 (red) expression. Nuclei were detected with DAPI (blue). Scale bars, 50 μm.

      Given the isolation method of the mucus layer, we followed the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Although we did not find a suitable marker representative of the mucus layer for western blotting, we performed protein mass spectrometry on the isolated mucus layers and analyzed the data by comparing it with established research ("Proteomic Analyses of the Two Mucus Layers of the Colon Barrier Reveal That Their Main Component, the Muc2 Mucin, Is Strongly Bound to the Fcgbp Protein," PMID: 19432394). Our data showed a high degree of overlap with the proteins identified in established studies (see Author response image 2 below).

      Author response image 2.

      Comparison of mucus layer proteins identified by mass spectrometry between our team and the Hansson team. Mucus layer proteins identified by mass spectrometry by our team and by the Hansson team (PMID: 19432394) are compared.

      Due to a lack of expertise, it has been challenging for us to perform reliable QMP experiments. However, since QMP involves qPCR combined with bacterial sequencing, we conducted 16S rRNA sequencing and confirmed the quantity of certain bacteria by qPCR (revised manuscript, Figure 3B, H, Figure 4E, F, Supplementary Figure 3A). Therefore, our data is reliable to some extent.

      Other weaknesses lie in the execution of the aims, leaving many claims incompletely substantiated. For example, much of the imaging data is challenging for the reader to interpret due to it being unfocused, too low of magnification, not including the correct control, and not comparing the same regions of tissues among different in vivo study groups. Statistical rigor could be better demonstrated, particularly when making claims based on imaging data. These are often presented as single images without any statistics (i.e. analysis of multiple images and biological replicates). These images include the LTA signal differences, FISH images, Enterococcus colonization, and mucus thickness.

      Thank you for your thorough review and insightful suggestions. We have performed the recommended statistical analysis on most of the figures and included the analysis in the revised manuscript (Figure 1A, Figure 3E&F, Supplementary Figure 3B&C). We have also added arrows in Figure 2B to make the figure easier to understand. Additionally, we repeated some key experiments to show the same regions of tissues among different groups. We will upload higher resolution figures during the revision. Thank you again for your constructive feedback.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It is recommended to change WT to SPF in the figure and text, as no genetic manipulation was involved in Figure 1.

      Thank you for your insightful suggestion. We have also made the suggested modifications to the labeling (revised manuscript, Figure 1A).

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is well-written, but it would benefit from a critical reading to correct some typos and small grammar issues. Histological and IF images would be more informative if they contained arrows and labels guiding the reader's attention to what the authors want to show. More details about the structures shown in the figures should be included in the legends.

      Thank you for your thorough review and insightful suggestions. We have revised the manuscript to correct noticeable typos and grammar issues. Arrows have been added to Figure 2A&B to make the figures easier to understand. Additionally, we have included a detailed description of the structural similarities and differences between chitin and peptidoglycan in the figure legend (revised manuscript, page 19, lines 730-733).

      Minor points:

      • Page 1, line 36: Please correct "mice models" to "mouse models".

      Thank you for your insightful suggestion and we have made the suggested correction in the revised manuscript (page 1, line 41).

      • Page 3, line 110: "by comparing the structure of chitin with that of peptidoglycan (PGN), a component of bacterial cells walls, we observed that they have similar structures (Fig.2A)". Although both structures are shown side-by-side, no similarities are mentioned or highlighted in the text, figure, or legend.

      Thank you for your insightful suggestion and we have included a detailed description of the structural similarities and differences between chitin and peptidoglycan in the figure legend (revised manuscript, page 19, line 730-733).

      • Fig.5C and Fig.5G: y axis brings "weight (%)". I believe the authors mean "weight change (%)"?

      We agree with your suggestion and have corrected the labeling accordingly (revised manuscript, Figure 5C and G).

      • Page 8: Genotyping method is described as a protocol. Please modify it.

      Thank you for your constructive suggestion and we have modified the genotyping method in the revised manuscript (page 8, line 339-349)

      • Please expand on the term "scaffold model" used in the abstract and discussion.

      Thank you for your thorough review. In this model, Chi3l1 acts as a key component of the scaffold. By binding to bacterial cell wall components like peptidoglycan, Chi3l1 helps anchor and organize bacteria within the mucus layer. This interaction facilitates the colonization of beneficial bacteria such as Lactobacillus, which are important for gut health. We have included a fuller description of the scaffold model in the revised manuscript (page 6, lines 248-250).

      • Discussion session often recapitulates results description, which makes the text repetitive.

      Thank you for your constructive suggestion and we have removed unnecessary results description in the discussion session in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Major comments

      (1) Figure 1A. The staining is very faint, and hard to see. The reader cannot be certain those are Chi3l1-positive cells. Higher magnification is needed.

      Thank you for your insightful suggestion and we have included the higher resolution figures in the revised manuscript Figure 1A.

      (2) The mucus is produced largely by the proximal colon, is adherent to the feces, and mobile with the feces (PMID: 33093110). Therefore it is important to determine where Chi3l1 is being expressed to be released into the lumen. Further Chi3l1 expression studies need to be done in both the proximal and distal colon.

      Thank you for your thorough review and insightful suggestions. We have addressed this part in our public review. Additionally, we agree with your suggestions and will conduct further studies on Chi3l1 expression in both the proximal and distal colon.

      (3) Figure 1B. The image is out of focus for the Ileum, and the DAPI signal needs to be brought up for the colon. Which part of the colon is this? The UEA1+ cells do not really look like goblet cells. A better image with clearer goblet cells is needed.

      Thank you for your constructive suggestions. In the revised manuscript, we have included higher-resolution images (Figure 1B). The middle colon (approximately 3 to 4 cm distal from the cecum) was harvested for staining. In addition to UEA-1, we utilized anti-MUC2 antibody to label goblet cells in this colon segment (see Author response image 3 below). The patterns of goblet cells identified by UEA-1 or MUC2 antibodies are similar. The UEA-1-positive cells shown in Figure 1B are presumed to be goblet cells.

      Author response image 3.

      Goblet Cell Distribution in the Middle Colon. Goblet cells in the middle segment of the colon (approximately 3 to 4 cm distal from the cecum) were detected using immunofluorescence with antibodies against UEA-1 (green) and MUC2 (red). Scale bar=50μm. Representative images are shown from three mice individually stained for each antibody.

      (4) Figure 1G. There needs to be some counterstain or contrast imaging to show evidence that cells are present in the untreated sample.

      Thank you for your insightful suggestions. We have annotated the cells present in the untreated sample based on the overexposure in the revised manuscript (Figure 1G).

      (5) Figure 3B. Is this absolute quantification? How were the data normalized to allow comparison of microbial loads?

      Thank you for your thorough review. Figure 3B presents absolute quantification data based on the methodology described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Briefly, we amplified a short segment (179 bp) of the 16S rRNA gene using conserved 16S rRNA-specific primers and OP50 (a strain of E. coli) as the template. After gel extraction and concentration measurement, the PCR products were diluted to gradient concentrations (0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48 pg/µl). These gradient concentrations were used as templates for qPCR to generate a standard curve based on Ct values and bacterial concentration. The standard curve is used to calculate bacterial concentration in the samples. The data presented in Figure 3B represent the weight of bacteria/milligram sample, calculated as (bacterial concentration x bacterial volume) / (weight of feces or gut content).
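
      The standard-curve calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the Ct values are synthetic, generated from the log-linear curve quoted in the response to question 11 (y = -1.53ln(x) + 13.581), and all function names are hypothetical.

```python
import math

# Two-fold dilution series of the 16S amplicon (pg/µl), as listed in the response.
standards = [0.16 * 2**k for k in range(8)]

# Synthetic Ct values, for illustration only (assumption): generated from the
# curve Ct = -1.53*ln(x) + 13.581 quoted in the response to question 11.
ct_standards = [-1.53 * math.log(c) + 13.581 for c in standards]

def fit_standard_curve(conc, ct):
    """Ordinary least-squares fit of Ct = slope * ln(conc) + intercept."""
    xs = [math.log(c) for c in conc]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def ct_to_conc(ct, slope, intercept):
    """Invert the fitted curve to recover concentration (pg/µl) from a Ct value."""
    return math.exp((ct - intercept) / slope)

def bacteria_per_mg(conc_pg_per_ul, volume_ul, sample_mg):
    """(bacterial concentration x volume) / sample weight, in pg per mg."""
    return conc_pg_per_ul * volume_ul / sample_mg

slope, intercept = fit_standard_curve(standards, ct_standards)
```

      A sample Ct is converted with `ct_to_conc(ct, slope, intercept)` and then normalized to the weight of feces or gut content with `bacteria_per_mg`; the volume and weight units (µl, mg) are assumptions made for the sketch.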

      (6) Figure 3D. The major case is made for a dramatic reduction in Gram+ species, but Figure 1D does not show a dramatic change. Is this difference significant?

      Thank you for your thorough review. We are not sure we have fully understood your question. There was no significant difference in Figure 3D. Our conclusion that Gram+ species are dramatically reduced is based on the LTA staining, Firmicutes FISH, individual species comparisons between WT and KO mice, and bacterial qPCR results taken together (Figure 3E-H).

      (7) Figures 3E and 3F. These stainings are alone not convincing of reduced Gram+ in the KOs. Some stats are required for these images. An independent complementary method is also needed to quantify these with statistics since this data is so central to the study's conclusions.

      Thank you for your constructive suggestions. We have included statistical analysis in the revised manuscript (Figure 3E and F). Given the large quantity of dietary fiber intertwined with bacteria, it is challenging to reliably quantify bacteria in Figure 3E. However, it is easy to distinguish bacteria from dietary fiber under the microscope. We analyzed gut sections from six mice in each group, and the results are consistent with the Firmicutes FISH results. Complementary methods such as bacterial qPCR have been employed to quantify these (Figure 4E, F). Due to a lack of expertise, it has been challenging for us to perform reliable QMP experiments.

      (8) Figure 3G. To make quantitative conclusions, the authors need to do quantitative microbial profiling (QMP) of the microbiota. Relative abundance masks absolute numbers, which could be increased. There are qPCR-based QMP platforms the authors could use (PMIDs: 31940382, 33763385).

      Thank you for your constructive suggestions. Due to a lack of expertise, it has been challenging for us to perform reliable QMP experiments. However, since QMP involves qPCR combined with bacterial sequencing, we conducted 16S rRNA sequencing and confirmed the quantity of certain bacteria by qPCR (revised manuscript, Figure 3B, H, Figure 4E, F, Supplementary Figure 3A). In addition to the original bacterial qPCR data presented in the manuscript, we included another bacterial species, Turicibacter. Consistent with the 16S rRNA sequencing analysis data, qPCR results showed that Turicibacter was more abundant in IECΔChil1 mice than in Villin-cre mice (revised manuscript, Supplementary Figure 3A, page 4, lines 171-173). We therefore consider our data to be reliable to some extent.

      (9) Figure 4B. The data nicely show Chi3l1 in mucus. However, no data support the authors' main claim that Chi3l1 binds Gram-positive bacteria in situ. Dual staining of Chi3l1 with a Firmicutes probe would help show that this interaction is happening in vivo.

      You raise an excellent point, and we agree with your suggestion that we should confirm Chi3l1 binding to Gram-positive bacteria in situ. During the study, we attempted dual staining of Chi3l1 with a universal bacterial 16S FISH probe several times, but we were unsuccessful. Despite various optimizations of the protocol, we were only able to detect bacteria, not Chi3l1. It appears that the antibody is not suitable for this method.

      (10) Figures 4D - F. Because mucus is associated with feces (PMID: ), the data with feces likely contains both Muc2/mucus and Feces. Therefore, it is unclear what the "mucus" is referring to in these figures. To support the authors' conclusions, there needs to be some validation that mucus was purified in the assays. This must be confirmed at a minimum by PAS staining on SDS PAGE gel (should be very high molecular weight) or Western blot with UEA lectin.

      Thank you for your insightful suggestions. As mentioned in the public review, the mucus layer was isolated following the protocol described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Briefly, after harvesting the middle colon from the mice, we cut open the colon longitudinally. After removing the gut contents, the lumen was vigorously rinsed in PBS while holding one end with forceps. The pellet obtained after centrifuging the rinsate was used as our mucus sample. Fresh feces were collected immediately after the mice defecated in a new, empty cage. We performed Western blot analysis to detect UEA lectin but were unsuccessful.

      However, as noted in the public review, we conducted protein mass spectrometry on the isolated mucus layers and analyzed the data by comparing it with established research ("Proteomic Analyses of the Two Mucus Layers of the Colon Barrier Reveal That Their Main Component, the Muc2 Mucin, Is Strongly Bound to the Fcgbp Protein," PMID: 19432394). Our data showed a high degree of overlap with the proteins identified in these established studies.

      (11) Figure 4E/F: The units of measurement are in pg/cm2, implying picogram per area. Can the authors please explain what this unit is referring to?

      We are grateful for your thorough review. The unit pg/cm² represents picograms per square centimeter. Figures 4E and 4F present absolute quantification data based on the methodology described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Briefly, we harvested a 3x0.5 cm section of colon and a 9x0.4 cm section of ileum. We then collected the mucus layer as previously described (response to question 10). We measured bacterial concentration as described in the response to question 5 using the equation (y = -1.53ln(x) + 13.581), where x represents the bacterial concentration and y represents the Ct value. After obtaining the bacterial concentration, we multiplied it by the volume of the rinsate and divided it by the area to obtain the pg/cm² values used in the figures.
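
      The conversion from Ct value to pg/cm² can be illustrated with a short sketch. It uses the curve and tissue dimensions quoted above; the function names, the example Ct value and the rinsate volume are hypothetical, and the concentration is assumed to be in pg/µl, as in the dilution series in the response to question 5.

```python
import math

# Standard curve quoted in the response: Ct = -1.53 * ln(x) + 13.581,
# where x is the bacterial concentration (assumed pg/µl).
SLOPE = -1.53
INTERCEPT = 13.581

# Tissue areas from the stated section dimensions (cm x cm).
COLON_AREA_CM2 = 3 * 0.5   # 3 x 0.5 cm colon section
ILEUM_AREA_CM2 = 9 * 0.4   # 9 x 0.4 cm ileum section

def conc_from_ct(ct):
    """Solve Ct = SLOPE * ln(x) + INTERCEPT for x."""
    return math.exp((ct - INTERCEPT) / SLOPE)

def mucus_load_pg_per_cm2(ct, rinsate_volume_ul, area_cm2):
    """Concentration x rinsate volume / tissue area, in pg per cm²."""
    return conc_from_ct(ct) * rinsate_volume_ul / area_cm2

# Hypothetical example: a colon rinsate of 150 µl with a measured Ct of 12.0.
example_load = mucus_load_pg_per_cm2(12.0, 150.0, COLON_AREA_CM2)
```

      Because the slope is negative, a lower Ct maps to a higher concentration, and the same rinsate volume gives a smaller per-area value for the ileum section simply because its area is larger.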

      (12) Figure 5E. Normal tissues appear to be from different colon regions from colitis tissues: the "Normal" looks like the proximal colon, while "Colitis" looks like the Distal colon. They cannot be directly compared.

      Thank you for your insightful suggestion. We have now included the updated image in the revised manuscript as Figure 5E to compare the same region of the colons.

      (13) Similarly, in Figure 5I it appears different colon regions are being compared between groups: Proximal colon in the bottom panels, and distal in the top panels. Since the proximal colon is less damaged by DSS, this data could be misleading.

      Thank you for your insightful suggestion. We have now included the updated image in the revised manuscript as Figure 5I to compare the same region of the colons.

      (14) In the DSS studies, are the VillinCre and IEC Chit3l1 mice co-housed littermates?

      Thank you for your insightful suggestion. In the DSS studies, the Villin-Cre and IECΔChil1 mice are not co-housed littermates. However, they are derived from the same lineage and are housed in the same rack within the same room of the animal facility.

      (15) Supplementary Figure 3: Mucus thickness images; are they representative? Stats are needed on multiple mice to support the claim that the mucus is thinner.

      Thank you for your insightful suggestion. The images are representative of 4 mice per group. We have now included the statistical analysis in the revised manuscript (Supplementary Figure 3C&D).

      Minor

      (1) Introduction: Reference to "mucosal layer": "Mucosal" and "Mucus" are different things. "Mucosal" refers to the epithelium, lamina propria, and muscularis mucosa. "Mucus" refers to the secreted mucus gel, the focus of the authors' study. Therefore, the statement "mucosal layer" is not proper. "Mucosal layer" should be changed to "mucus layer."

      Thank you for your constructive suggestions; we have learned a lot from them. We have replaced “mucosal layer” with “mucus layer” in the revised manuscript.

      (2) Line 366 and related lines: Feces cannot be "dissolved". "Resuspended" is a better term.

      Thank you for your constructive suggestion. We have changed “dissolved” to “resuspended” in the revised manuscript.

      (3) Lines 36-37 and 43-44 are redundant to each other.

      Thank you for your constructive suggestion and we have removed the lines 36-37 in the revised manuscript.

    1. eLife assessment

      This study provides useful evidence substantiating a role for long noncoding RNAs in liver metabolism and organismal physiology. Using murine knockout and knock-in models, the authors invoke a previously unidentified role for the lncRNA Snhg3 in fatty liver. The revised manuscript has improved and most studies are backed by solid evidence but the study was found to be incomplete and will require future studies to substantiate some of the claims.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript the authors investigate the contributions of the long noncoding RNA Snhg3 in liver metabolism and MAFLD. The authors conclude that liver-specific loss or overexpression of Snhg3 impacts hepatic lipid content and obesity through epigenetic mechanisms. More specifically, the authors invoke that nuclear activity of Snhg3 aggravates hepatic steatosis by altering the balance of activating and repressive chromatin marks at the Pparg gene locus. This regulatory circuit is dependent on a transcriptional regulator SND1.

      Strengths:

      The authors developed tissue-specific lncRNA knockout and KI models. This effort is certainly appreciated as few lncRNA knockouts have been generated in the context of metabolism. Furthermore, lncRNA effects can be compensated in a whole organism or show subtle effects in acute versus chronic perturbation, rendering the focus on in vivo function important and highly relevant. In addition, Snhg3 was identified through a screening strategy and, as a general rule, the authors attempt to follow unbiased approaches to decipher the mechanisms of Snhg3.

    3. Reviewer #2 (Public Review):

      Through RNA analysis, Xie et al found lncRNA Snhg3 was one of the most down-regulated Snhgs by high fat diet (HFD) in mouse liver. Consequently, the authors sought to examine the mechanism through which Snhg3 is involved in the progression of metabolic dysfunction-associated fatty liver diseases (MASLD) in HFD-induced obese (DIO) mice. Interestingly, liver-specific Snhg3 knockout reduced, while Snhg3 over-expression potentiated, fatty liver in mice on a HFD. Using the RNA pull-down approach, the authors identified SND1 as a potential Snhg3 interacting protein. SND1 is a component of the RNA-induced silencing complex (RISC). The authors found that Snhg3 increased SND1 ubiquitination to enhance SND1 protein stability, which then reduced the level of repressive chromatin H3K27me3 on the PPARg promoter. The upregulation of PPARg, a lipogenic transcription factor, thus contributed to hepatic fat accumulation.

      The authors propose a signaling cascade that explains how lncRNA Snhg3 may promote hepatic steatosis. Multiple molecular approaches have been employed to identify molecular targets of the proposed mechanism, which is a strength of the study.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript the authors investigate the contributions of the long noncoding RNA Snhg3 in liver metabolism and MAFLD. The authors conclude that liver-specific loss or overexpression of Snhg3 impacts hepatic lipid content and obesity through epigenetic mechanisms. More specifically, the authors invoke that nuclear activity of Snhg3 aggravates hepatic steatosis by altering the balance of activating and repressive chromatin marks at the Pparg gene locus. This regulatory circuit is dependent on a transcriptional regulator SND1.

      Strengths:

      The authors developed tissue-specific lncRNA knockout and KI models. This effort is certainly appreciated as few lncRNA knockouts have been generated in the context of metabolism. Furthermore, lncRNA effects can be compensated in a whole organism or show subtle effects in acute versus chronic perturbation, rendering the focus on in vivo function important and highly relevant. In addition, Snhg3 was identified through a screening strategy and, as a general rule, the authors attempt to follow unbiased approaches to decipher the mechanisms of Snhg3.

      Weaknesses:

      Despite efforts at generating a liver-specific knockout, the phenotypic characterization is not focused on the key readouts. Notably missing are rigorous lipid flux studies and targeted gene expression/protein measurement that would underpin why loss of Snhg3 protects from lipid accumulation. Along those lines, claims linking the Snhg3 to MAFLD would be better supported with careful interrogation of markers of fibrosis and advanced liver disease. In other areas, significance is limited since the presented data is either not clear or rigorous enough. Finally, there is an important conceptual limitation to the work since PPARG is not established to play a major role in the liver.

      We thank the reviewer for the comment. As the reviewer notes, the manuscript still has some shortcomings. We have added a discussion of some of these to the Discussion section (third paragraph on p17 and first paragraph on p18).

      We agree with the reviewer's comment: there are still conflicting conclusions about the role of PPARγ in MASLD. We have discussed this in the Discussion section (first paragraph on p13).

      Reviewer #2 (Public Review):

      Through RNA analysis, Xie et al found lncRNA Snhg3 was one of the most down-regulated Snhgs by high fat diet (HFD) in mouse liver. Consequently, the authors sought to examine the mechanism through which Snhg3 is involved in the progression of metabolic dysfunction-associated fatty liver diseases (MASLD) in HFD-induced obese (DIO) mice. Interestingly, liver-specific Snhg3 knockout reduced, while Snhg3 over-expression potentiated, fatty liver in mice on a HFD. Using the RNA pull-down approach, the authors identified SND1 as a potential Snhg3 interacting protein. SND1 is a component of the RNA-induced silencing complex (RISC). The authors found that Snhg3 increased SND1 ubiquitination to enhance SND1 protein stability, which then reduced the level of repressive chromatin H3K27me3 on the PPARg promoter. The upregulation of PPARg, a lipogenic transcription factor, thus contributed to hepatic fat accumulation.

      The authors propose a signaling cascade that explains how lncRNA Snhg3 may promote hepatic steatosis. Multiple molecular approaches have been employed to identify molecular targets of the proposed mechanism, which is a strength of the study. There are, however, several potential issues to consider before jumping to the conclusion.

      (1) First of all, it's important to ensure the robustness and rigor of each study. The manuscript was not carefully put together. The image qualities for several figures were poor, making it difficult for the readers to evaluate the results with confidence. The biological replicates and numbers of experimental repeats for cell-based assays were not described. When possible, the entire immunoblot imaging used for quantification should be presented (rather than showing n=1 representative). There were multiple mis-labels in figure panels or figure legends (e.g., Fig. 2I, Fig. 2K and Fig. 3K). The b-actin immunoblot image was reused in Fig. 4J, Fig. 5G and Fig. 7B with different exposure times. These might be from the same cohort of mice. If the immunoblots were run at different times, the loading control should be included on the same blot as well.

      We thank the reviewer for the detailed comment. We have provided clearer figures in the revised manuscript.

      The numbers of biological replicates and experimental repeats for cell-based assays have been updated in the manuscript.

      The entire immunoblot images used for quantification have been provided in the primary data.

      The original Figures 2I, 2K and 3K have been revised and replaced with new Figures 2F, 2H and 3H, and their corresponding figure legends have also been corrected in the revised manuscript.

      The protein levels of CD36, PPARγ and β-ACTIN were examined at the same time, and we have revised the manuscript accordingly (revised Figure 7B and C).

      (2) The authors can do a better job in explaining the logic for how they came up with the potential function of each component of the signaling cascade. Snhg3 is down-regulated by HFD. However, the evidence presented indicates its involvement in promoting steatosis. In Fig. 1C, one would expect PPARg expression to be up-regulated (when Snhg3 was down-regulated). If so, the physiological observation conflicts with the proposed mechanism. In addition, SND1 is known to regulate RNA/miRNA processing. How do the authors rule out this potential mechanism? How about the hosted snoRNA, Snord17? Is it involved in the progression of MASLD?

      We thank the reviewer for the detailed comment. In this study, although the expression of Snhg3 was decreased in DIO mice, Snhg3 deficiency decreased the expression of hepatic PPARγ and alleviated hepatic steatosis in DIO mice, and Snhg3 overexpression induced the opposite effect. This led us to speculate that the downregulation of Snhg3 in DIO mice might be a protective stress response to a high nutritional state, although the specific details need to be clarified. This is probably similar to FGF21 and GDF15, whose endogenous expression and circulating levels are elevated in obese humans and mice despite their beneficial effects on obesity and related metabolic complications (Keipert and Ost, 2021). We have added this content to the Discussion section (second paragraph on p12).

      SND1 has multiple roles through its association with different types of RNA molecules, including mRNA, miRNA, circRNA, dsRNA and lncRNA. We agree with the reviewer's suggestion that the potential mechanism by which SND1/lncRNA-Snhg3 is involved in hepatic lipid metabolism needs to be further investigated. We also discuss this limitation in the manuscript (Discussion section, third paragraph on p17).

      Snhg3 serves as the host gene for intronic U17 snoRNAs, which are H/ACA snoRNAs. A previous study found that the cholesterol trafficking phenotype was not due to reduced Snhg3 expression, but rather to haploinsufficiency of U17 snoRNA (Jinn et al., 2015). Additionally, knockdown of U17 snoRNA in vivo protected against hepatic steatosis and lipid-induced oxidative stress and inflammation (Sletten et al., 2021). In this study, the expression of U17 snoRNA decreased in the liver of DIO Snhg3-HKO mice and remained unchanged in the liver of DIO Snhg3-HKI mice, but overexpression of U17 snoRNA had no effect on the expression of SND1 and PPARγ (figure supplement 5A-C), indicating that Snhg3-induced hepatic steatosis was independent of U17 snoRNA. We have discussed this in the revised manuscript (p15 of the Discussion section).

      References

      JINN, S., BRANDIS, K. A., REN, A., CHACKO, A., DUDLEY-RUCKER, N., GALE, S. E., SIDHU, R., FUJIWARA, H., JIANG, H., OLSEN, B. N., SCHAFFER, J. E. & ORY, D. S. 2015. snoRNA U17 regulates cellular cholesterol trafficking. Cell Metab, 21, 855-67. DOI:10.1016/j.cmet.2015.04.010, PMID:25980348

      KEIPERT, S. & OST, M. 2021. Stress-induced FGF21 and GDF15 in obesity and obesity resistance. Trends Endocrinol Metab, 32, 904-915. DOI:10.1016/j.tem.2021.08.008, PMID:34526227

      SLETTEN, A. C., DAVIDSON, J. W., YAGABASAN, B., MOORES, S., SCHWAIGER-HABER, M., FUJIWARA, H., GALE, S., JIANG, X., SIDHU, R., GELMAN, S. J., ZHAO, S., PATTI, G. J., ORY, D. S. & SCHAFFER, J. E. 2021. Loss of SNORA73 reprograms cellular metabolism and protects against steatohepatitis. Nat Commun, 12, 5214. DOI:10.1038/s41467-021-25457-y, PMID:34471131

      (3) The role of PPARg in fatty liver diseases might be a rodent-specific phenomenon. PPARg agonist treatment in humans may actually reduce ectopic fat deposition by increasing fat storage in adipose tissues. The relevance of the finding to human diseases should be discussed.

      We thank the reviewer for the detailed comment. We agree with the reviewer's comment: there are still conflicting conclusions about the role of PPARγ in MASLD. We have discussed this in the Discussion section (first paragraph on p13).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I do not have further recommendations beyond what I mentioned in the original review. The authors have not adequately addressed all the issues but the manuscript has improved and the overall strength of evidence is now solid from incomplete.

      We appreciate the positive feedback from the reviewer. While we acknowledge that the updated manuscript has significantly improved, we recognize that it remains incomplete and that additional details regarding Snhg3 will be warranted in our future studies. Moreover, we have discussed these potential weaknesses in the Discussion section (third paragraph on p17 and first paragraph on p18).

      Reviewer #2 (Recommendations For The Authors):

      The authors have provided explanations and some new data to clarify the comments from the first submission. They have also included the original immunoblots for all the experimental repeats. The CHX protein stability results shown in Fig. 5J were not consistent between experiments, perhaps because the difference was subtle. The results on PPARg protein expression were not clearcut. The inclusion of a PPARg knockdown control would be helpful to validate the specificity of the antibody. Of note, the immunoblots used for Fig. 5I (PA treated) repeats 2, 4 and 1 were identical to those of Fig. 7F repeats 3, 1 and 5. The authors should provide an explanation for the potential issue.

      We thank the reviewer for the further comments and suggestions. We agree with the reviewer's comment about Snhg3-mediated SND1 protein stability. In this study, Snhg3 increased the protein level, but not the mRNA level, of SND1, and only subtly increased SND1 protein stability. We revised the description in the manuscript to: “Meanwhile, Snhg3 regulated the protein, not mRNA, expression of SND1 in vivo and in vitro by mildly promoting the stability of SND1 protein (Figures 5G-I).” This revision can be found in the second paragraph on p9. While our findings indicate that Snhg3 can influence SND1 expression at the protein level, we acknowledge the possibility of additional mechanisms contributing to this complex regulatory network. Therefore, further investigation is necessary to clarify whether Snhg3 regulates SND1 protein expression through other potential mechanisms. We have added this point to the Discussion section (second paragraph on p16).

      In this study, the protein level of PPARγ (molecular weight ~57 kDa) was detected using an anti-PPARγ antibody (ABclonal, Cat. A11183), which has been used to determine PPARγ protein expression in 13 published papers, as shown on the ABclonal Technology Co., Ltd. website (https://abclonal.com.cn/catalog/A11183). The specificity of this antibody was validated by PPARγ knockdown in Zhang’s study (Zhang et al., 2019). In our study, hepatic PPARγ protein sometimes showed two bands (~57 kDa and >75 kDa) with this antibody. It is well established that the PPARγ gene encodes two protein isoforms (PPARγ1, a 477 amino acid protein, and PPARγ2, a 505 amino acid protein) via differential promoter usage and alternative splicing (Gene: Pparg (ENSMUSG00000000440) - Transcript comparison - Mus_musculus - Ensembl genome browser 112) (Hernandez-Quiles et al., 2021). The molecular weight difference between PPARγ1 and PPARγ2 is about 3 kDa. Therefore, we consider that the band larger than 75 kDa in our study is likely nonspecific. In line with the reviewer’s suggestion, the antibody’s specificity could be further validated by knockdown or knockout of PPARγ in the future.
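
      The size argument above can be checked with a back-of-the-envelope calculation. The sketch below assumes the common rule of thumb of roughly 110 Da per amino acid residue, which is an approximation introduced here and not a value from the manuscript; actual masses depend on sequence composition and modifications, so the absolute estimates will not exactly match the apparent band size on a gel.

```python
# Rule-of-thumb average mass per amino acid residue (assumption, ~110 Da).
AVG_RESIDUE_KDA = 0.110

def approx_mass_kda(n_residues):
    """Rough protein mass estimate from residue count alone."""
    return n_residues * AVG_RESIDUE_KDA

# Isoform lengths as stated in the response.
ppar_gamma1_kda = approx_mass_kda(477)
ppar_gamma2_kda = approx_mass_kda(505)

# 28 extra residues in PPARγ2 correspond to roughly 3 kDa.
isoform_gap_kda = ppar_gamma2_kda - ppar_gamma1_kda

# Gap between the two observed bands (~57 kDa and >75 kDa) is at least 18 kDa.
observed_band_gap_kda = 75 - 57
```

      Under this approximation the isoform difference is about 3 kDa, far smaller than the gap of at least 18 kDa between the two observed bands, which is consistent with the reasoning that the upper band is unlikely to be PPARγ2.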

      We thank the reviewer for the detailed comment. In this study, we tested the effect of Snhg3 overexpression on SND1 protein level, and the effect of Snhg3 or Snd1 overexpression on PPARγ protein level, in Hepa1-6 cells transfected with Snhg3, SND1 or the control, respectively. The results indicated that overexpression of Snhg3 increased the protein levels of SND1 and PPARγ, and that overexpression of SND1 also increased the protein level of PPARγ. For clarity of presentation, we first showed that overexpression of Snhg3 increased the protein level of SND1 in Figure 5I, and then demonstrated that overexpression of Snhg3 or SND1 induced PPARγ expression in Figure 7F. However, we acknowledge that this order of presentation may cause confusion. These experiments were repeated multiple times, and we have provided new original western blot data and analysis for Figure 5I (PA treatment) for further clarification.

      References

      HERNANDEZ-QUILES, M., BROEKEMA, M. F. & KALKHOVEN, E. 2021. PPARgamma in Metabolism, Immunity, and Cancer: Unified and Diverse Mechanisms of Action. Front Endocrinol (Lausanne), 12, 624112. DOI:10.3389/fendo.2021.624112, PMID:33716977

      ZHANG, Z., ZHAO, G., LIU, L., HE, J., DARWAZEH, R., LIU, H., CHEN, H., ZHOU, C., GUO, Z. & SUN, X. 2019. Bexarotene Exerts Protective Effects Through Modulation of the Cerebral Vascular Smooth Muscle Cell Phenotypic Transformation by Regulating PPARgamma/FLAP/LTB(4) After Subarachnoid Hemorrhage in Rats. Cell Transplant, 28, 1161-1172. DOI:10.1177/0963689719842161, PMID:31010302

    1. eLife assessment

      The manuscript by Carbo et al. reports a novel role for the MltG homolog AgmT in gliding motility in M. xanthus. The authors provide convincing data to demonstrate that AgmT is a cell wall lytic enzyme (likely a lytic transglycosylase), its lytic activity is required for gliding motility, and that its activity is required for proper binding of a component of the motility apparatus to the cell wall. The findings are valuable as they contribute to our understanding of the molecular mechanisms underlying the interaction between gliding motility and the bacterial cell wall.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript nicely outlines a conceptual problem with the bFAC model in A-motility, namely, how is the energy derived from the inner membrane AglRQS motor transduced through the cell wall into mechanical force on the cell surface to drive motility? To address this, the authors make a significant contribution by identifying and characterizing a lytic transglycosylase (LTG) called AgmT. This work thus provides clues and a future framework to address mechanical force transmission from the cytoplasm through the cell envelope to the cell surface.

      Strengths:

      (i) Convincing evidence shows AgmT functions as an LTG and, surprisingly, that mltG from E. coli complements the swarming defect of an agmT mutant.

      (ii) The authors show that 13 other LTGs found in M. xanthus are not required for A-motility.

      (iii) Authors show agmT mutants develop morphological changes in response to treatment with a beta-lactam antibiotic, mecillinam.

      (iv) The use of single molecule tracking to monitor the assembly and dynamics of bFACs in WT and mutant backgrounds.

      (v) The authors understand the limitations of their work and do not overinterpret their data.

      Weaknesses:

      The authors provided more experiments and clearly addressed my prior concerns in their revised manuscript.

    3. Reviewer #2 (Public review):

      The manuscript by Carbo et al. reports a novel role for the MltG homolog AgmT in gliding motility in M. xanthus. The authors conclusively show that AgmT is a cell wall lytic enzyme (likely a lytic transglycosylase), its lytic activity is required for gliding motility, and that its activity is required for proper binding of a component of the motility apparatus to the cell wall. The data are generally well-controlled. The marked strength of the manuscript includes the detailed characterization of AgmT as a cell wall lytic enzyme, and the careful dissection of its role in motility. Using multiple lines of evidence, the authors conclusively show that AgmT does not directly associate with the motility complexes, but that instead its absence (or the overexpression of its active site mutant) results in failure of focal adhesion complexes to properly interact with the cell wall.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript nicely outlines a conceptual problem with the bFAC model in A-motility, namely, how is the energy produced by the inner membrane AglRQS motor transduced through the cell wall into mechanical force on the cell surface to drive motility? To address this, the authors make a significant contribution by identifying and characterizing a lytic transglycosylase (LTG) called AgmT. This work thus provides clues and a future framework for addressing mechanical force transmission between the cytoplasm and the cell surface.

      Strengths: 

      (1) Convincing evidence shows AgmT functions as an LTG and, surprisingly, that mltG from E. coli complements the swarming defect of an agmT mutant. 

      (2) Authors show agmT mutants develop morphological changes in response to treatment with a β-lactam antibiotic, mecillinam.

      (3) The use of single-molecule tracking to monitor the assembly and dynamics of bFACs in WT and mutant backgrounds. 

      (4) The authors understand the limitations of their work and do not overinterpret their data. 

      Weaknesses: 

      (1) A clear model of AgmT's role in gliding motility or interactions with other A-motility proteins is not provided. Instead, speculative roles for how AgmT enzymatic activity could facilitate bFAC function in A-motility are discussed. 

      We thank the reviewer for this comment. We have added a new figure, Fig. 6, and updated the Discussion to propose a mechanism, “rather than interacting with bFAC components directly and specifically, AgmT facilitates proper bFAC assembly indirectly through its LTG activity. LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands. E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strand length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      (2) Although agmT mutants do not swarm, in-depth phenotypic analysis is lacking. In particular, do individual agmT mutant cells move, as found with other swarming defective mutants, or are agmT mutants completely nonmotile, as are motor mutants? 

      We thank the reviewer for raising an important question. Prompted by this question, we analyzed the gliding phenotype of the ΔagmT pilA mutant at the single-cell level. We found that the ΔagmT pilA cells are not completely static. Instead, they move for less than half a cell length before pausing and reversing. We went on to quantify the velocity and gliding persistency and found that the gliding phenotype of the ΔagmT pilA cells matches the prediction for bFACs that lose the connection between the inner subcomplexes and PG.

      We then imaged individual ∆agmT pilA- cells on 1.5% agar surface at 10-s intervals using bright-field microscopy. To our surprise, instead of being static, individual ∆agmT pilA- cells displayed slow movements, with frequent pauses and reversals (Video 1). To quantify the effects of AgmT, we measured the velocity and gliding persistency (the distances cells traveled before pauses and reversals) of individual cells. Compared to the pilA- cells that moved at 2.30 ± 1.33 μm/min (n = 46) with high persistency (Video 2 and Fig. 2C, D), ∆agmT pilA- cells moved significantly slower (0.88 ± 0.62 μm/min, n = 59) and less persistently (Video 1 and Fig. 2C, D). Such aberrant gliding motility is distinct from the “hyper reversal” phenotype. Although the hyper-reversing cells constitutively switch their moving directions due to the polarity regulators, they usually maintain gliding velocity at the wild-type level27. Instead, the slow and “slippery” gliding of the ∆agmT pilA- cells matches the prediction that when the inner complexes of bFACs lose connection with PG, bFACs can only generate short and inefficient movements19. Our data indicate that AgmT is not an essential component of the bFACs. Thus, AgmT is likely to regulate the assembly and stability of bFACs, especially their connection with PG.
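
      As a rough illustration of how the two velocity summaries quoted above compare, the following sketch computes a Welch t statistic directly from the reported means and sample sizes. It assumes the “±” values are standard deviations, which this excerpt does not state, and it is not necessarily the test the authors used; the function name is illustrative.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary stats."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Velocities in μm/min as quoted in the response.
# ASSUMPTION: the ± values are standard deviations, not standard errors.
t_stat, dof = welch_t(2.30, 1.33, 46, 0.88, 0.62, 59)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

      Under that assumption the statistic comes out at roughly 6.7, in line with the description of the mutant cells as significantly slower; if the reported spreads were standard errors instead, the calculation would need to be redone.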

      (3) The bioinformatic and comparative genomics analysis of agmT is incomplete. For example, the sequence relationships between AgmT, MltG, and the 13 other LTG proteins in M. xanthus are not clear. Is E. coli MltG the closest homolog to AgmT? Their relationships could be addressed with a phylogenetic tree and/or sequence alignments. Furthermore, are there other A-motility genes in proximity to agmT? Similarly, does agmT show specific co-occurrences with the other A-motility genes across genera/species?

      We answered the first question in the Discussion (it was in the first Results section in the previous version), “Both M. xanthus AgmT and E. coli MltG belong to the YceG/MltG family, which is the first identified LTG family that is conserved in both Gram-negative and positive bacteria25,41. About 70% of bacterial genomes, including firmicutes, proteobacteria, and actinobacteria, encode YceG/MltG domains25. The unique inner membrane localization of this family and the fact that AgmT is the only M. xanthus LTG that belongs to this family (Table S2) could partially explain why it is the only LTG that contributes to gliding motility”.

      For the second, we added one sentence in the Results, “No other motility-related genes are found in the vicinity of agmT”.

      For the third question, we do not believe a co-occurrence analysis is necessary. Because M. xanthus gliding is unique, whereas “about 70% of bacterial genomes, including firmicutes, proteobacteria, and actinobacteria, encode YceG/MltG domains25”, gliding should show no co-occurrence with the YceG/MltG family LTGs.

      (4) Related to iii, what about the functional relationship of the endogenous 13 LTG genes? Although knockout mutants were shown to be motile, presumably because AgmT is present, can overexpression of them, similar to E. coli MltG, complement an agmT mutant? In other words, why does MltG complement and the endogenous LTG proteins appear not to be relevant? 

      We thank the reviewer for this question, which prompted us to think about the uniqueness of AgmT more carefully. AgmT is unique for its inner-membrane localization, rather than its activity. We answered this question in the Discussion, “LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands”. We then went on to propose a possible mechanism, “E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strand length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      (5) Based on Figure 2B, overexpression of MltG enhances A-motility compared to the parent strain and the agmT-PAmCh complemented strain, is this actually true? Showing expanded swarming colony phenotypes would help address this question. 

      We thank the reviewer for raising an important question. Prompted by this question, we analyzed the effects of MltG expression at the single-cell level. We found that “Consistent with its LTG activity, the expression of MltGEc restored gliding motility of the ΔagmT pilA- cells on both the colony (Fig. 2B) and single-cell (Fig. 2C, D) levels. Interestingly, in the absence of sodium vanillate, the leaky expression of MltGEc from the vanillate-inducible promoter was sufficient to compensate for the loss of AgmT. A plausible explanation of this observation is that as E. coli grows much faster (generation time 20 - 30 min) than M. xanthus (generation time ~4 h), MltGEc could possess significantly higher LTG activity than AgmT. Induced by 200 μM sodium vanillate, the expression of MltGEc further but non-significantly increased the velocity and gliding persistency (Fig. 2B-D). Importantly, the expression of MltGEc failed to restore gliding motility in the agmTEAEA pilA cells, even in the presence of 200 μM sodium vanillate (Fig. 2B). Consistent with the mecillinam resistance assay (Fig. 3C), this result suggests that AgmTEAEA still binds to PG and that in the absence of its LTG activity, AgmT does not anchor bFACs to PG”. These results are shown in the new panels C and D in Figure 2.

      (6) Cell flexibility is correlated with gliding motility function in M. xanthus. Since AgmT has LTG activity, are agmT mutants less flexible than WT cells and is this the cause of their motility defect? 

      We thank the reviewer for raising an important question. We saw cells that lack AgmT making S-turns and U-turns frequently under the microscope. We used a GRABS assay to quantify cell stiffness and found that neither the absence of AgmT nor the expression of MltGEc affects cell stiffness. We added this result to the manuscript, “The assembly of bFACs produces wave-like deformation on the cell surface6,37, suggesting that their assembly may require a flexible PG layer2,6,11,12. As a major contributor to cell stiffness, PG flexibility affects the overall stiffness of cells38. To test the possibility that AgmT and MltGEc facilitate bFAC assembly by reducing PG stiffness, we adopted the GRABS assay38 to quantify whether the lack of AgmT and the expression of MltGEc affect cell stiffness. To quantify changes in cell stiffness, we simultaneously measured the growth of the pilA-, ΔagmT pilA-, and ΔagmT Pvan-MltGEc pilA- (with 200 μM sodium vanillate) cells in a 1% agarose gel infused with CYE and liquid CYE and calculated the GRABS scores of the ΔagmT pilA-, and ΔagmT Pvan-MltGEc pilA- cells using the pilA- cells as the reference, where positive and negative GRABS scores indicate increased and decreased stiffness, respectively (see Materials and Methods and Ref38). The GRABS scores of the ΔagmT pilA-, and ΔagmT Pvan-MltGEc pilA- (with 200 μM sodium vanillate) cells were -0.06 ± 0.04 and -0.10 ± 0.07 (n = 4), respectively, indicating that neither AgmT nor MltGEc affects cell stiffness significantly. Whereas PG flexibility could still be essential for gliding, AgmT and MltGEc do not regulate bFAC assembly by modulating PG stiffness. Instead, these LTGs could connect bFACs to PG by generating structural features that are irrelevant to PG stiffness”.

      Reviewer #2 (Public Review): 

      The manuscript by Carbo et al. reports a novel role for the MltG homolog AgmT in gliding motility in M. xanthus. The authors conclusively show that AgmT is a cell wall lytic enzyme (likely a lytic transglycosylase), its lytic activity is required for gliding motility, and that its activity is required for proper binding of a component of the motility apparatus to the cell wall. The data are generally well-controlled. The marked strength of the manuscript includes the detailed characterization of AgmT as a cell wall lytic enzyme, and the careful dissection of its role in motility. Using multiple lines of evidence, the authors conclusively show that AgmT does not directly associate with the motility complexes, but that instead its absence (or the overexpression of its active site mutant) results in the failure of focal adhesion complexes to properly interact with the cell wall. 

      An interpretive weakness is the rather direct role attributed to AgmT in focal adhesion assembly. While their data clearly show that AgmT is important, it is unclear whether this is the direct consequence of AgmT somehow promoting bFAC binding to PG or just an indirect consequence of changed cell wall architecture without AgmT. In E. coli, an MltG mutant has increased PG strand length, suggesting that M. xanthus's PG architecture may likewise be compromised in a way that precludes AglR binding to the cell wall. However, this distinction would be very difficult to establish experimentally. MltG has been shown to associate with active cell wall synthesis in E. coli in the absence of protein-protein interactions, and one could envision a similar model in M. xanthus, where active cell wall synthesis is required for focal adhesion assembly, and MltG makes an important contribution to this process.

      Based on the data that AgmT does not assemble into bFACs and that heterologous MltGEc substitutes M. xanthus AgmT in gliding, we believe that AgmT facilitates the proper assembly of bFACs indirectly. At the end of Introduction, we pointed out, “Hence, the LTG activity of AgmT anchors bFAC to PG, potentially by modifying PG structure”. Following the reviewer’s recommendation, we revised the Discussion to emphasize that AgmT facilitates proper bFAC assembly indirectly through its LTG activity. For the reviewer’s convenience, the revised paragraph is pasted here, with the changes highlighted in blue:  

      “It is surprising that AgmT itself does not assemble into bFACs and that MltGEc substitutes for AgmT in gliding. Thus, rather than interacting with bFAC components directly and specifically, AgmT facilitates proper bFAC assembly indirectly through its LTG activity. LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands. E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strand length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The last sentence of the Discussion implies that anchoring LTG (AgmT) in the inner membrane is important. I did not see this mentioned about AgmT. Does it contain an inner membrane anchoring domain? Along these lines, the AgmT and MltG proteins appear to be of different sizes (Figure 1A). Please clarify, perhaps including full-length sequence alignment and/or domain architecture for these proteins. 

      We revised the first paragraph in the Results and clarified, “Among these genes, agmT (ORF K1515_0491023) was predicted to encode an inner membrane protein with a single N-terminal transmembrane helix (residues 4 – 25) and a large “periplasmic solute-binding” domain22.”

      We appreciate the reviewer for spotting the mistake in Fig. 2A. The E. coli MltG sequence shown in the alignment starts from residue 158, instead of 88. We have corrected this mistake in the figure. M. xanthus AgmT and E. coli MltG are of similar sizes, with 239 and 240 amino acids, respectively. 

      In Figure 3 legend, define D3. 

      The definition of D3 was added to the figure legend.

      Figure 4A shows 100-frame composite micrographs, but no time interval between frames is given. 

      The imaging frequency, 10 Hz, was stated in the text. We also added this information into the figure legend.

      Line 98, the term "Especially" does not flow well, change to "This includes the characteristic..." or similar. 

      We deleted “especially” from the sentence.

      Line 179, "not" is not accurate, replace with "rarely." 

      Changed.

      Line 188, add a qualifier, "proper" before "bFACs assembly." 

      Added.

      Lines 196 and 202, provide the sizes of each protein in these fusion constructs. 

      We added these numbers to the figure legend.

      In Figure 5A add arrows to identify each band. State in legend whether this is a denaturing gel, if so, why are AgmT-PAmCherry homodimers present?

      Protein electrophoresis was done using SDS-PAGE. It is not unusual that some proteins, especially membrane proteins, are resistant to dissociation by SDS and appear as multimers in SDS-PAGE. We have seen this phenomenon repeatedly in both our experiments and the literature. Nevertheless, we clarified our experimental condition in the text, “Similar to many membrane proteins that are resistant to dissociation by SDS34, immunoblot using an anti-mCherry antibody showed that AgmT-PAmCherry accumulated in two bands in SDS-PAGE that corresponded to monomers and dimers of the full-length fusion protein, respectively (Fig. 5A)”.

      A few examples of membrane proteins remaining as oligomers are listed below:

      Rath et al., 2009, PNAS 106: 1760-1765

      Sulistijo et al., 2003, J Biol Chem 278: 51950-51956

      Sukharev, 2002, Biophys J 83: 290-298

      Neumann et al., 1998, J Bacteriol 180: 3312-3316

      Blakey et al., 2002, Biochem J 364: 527-535

      Wegner and Jones, 1984, J Biol Chem 259: 1834-1841

      Jiang et al., 2002, Nature 417: 515-522

      Heginbotham and Miller, 1997, Biochem 36: 10335-10342

      Gentile et al., 2002, J Biol Chem 277: 44050-44060

      Line 207, "near evenly along cell bodies" does not seem consistent with Figure 5B as there looks to be an enrichment of AgmT at cell poles. 

      We have replaced panel 5B with more typical images. Due to the shape difference between cell poles and the cylindrical nonpolar regions, many surface-associated proteins could appear “enriched” at cell poles. This effect was very obvious in Fig. 5B, possibly due to the unevenness of the agar surface. We examined our data carefully and did not find significant polar enrichment. Compared to AglZ that significantly enriches at poles and forms evenly-spaced clusters along the cell body, the localization of AgmT is completely different.  

      Lines 252 and 260, change "Fig. 5B" to "Fig. 5C." 

      We apologize for these mistakes. They have been corrected.

      Line 266, insert "the" before "cell envelope." 

      Added.

      Line 278, insert "presumably" between "AgmT generates (small openings)" 

      Corrected.

      Reviewer #2 (Recommendations For The Authors): 

      - Major comment: I would rephrase conclusions regarding a direct role of AgmT in focal adhesion assembly since these data are indirect (AglR binding to the cell wall is reduced in the absence of AgmT - this could also be interpreted as the absence of AgmT causing altered cell wall architecture that precludes AglR binding). Example: I don't think the data support line 222 "AgmT connects bFACs to PG", perhaps rephrased to accommodate more agnostic explanations. Likewise, line 308 states that MltG has been "adopted" by the gliding motility machinery. This conclusion cannot be drawn from the data presented. 

      We agree with the reviewer that the conclusions should be stated precisely. At the end of Introduction, we pointed out, “Hence, the LTG activity of AgmT anchors bFAC to PG, potentially by modifying PG structure”. Following the reviewer’s recommendation, we revised the Discussion to emphasize that AgmT facilitates bFAC assembly indirectly through its LTG activity. For the reviewer’s convenience, the revised paragraph is pasted here, with the changes highlighted in blue: 

      “It is surprising that AgmT itself does not assemble into bFACs and that MltGEc substitutes for AgmT in gliding. Thus, rather than interacting with bFAC components directly and specifically, AgmT facilitates proper bFAC assembly indirectly through its LTG activity. LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands. E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strand length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      However, we believe that the conclusion that “AgmT connects bFACs to PG" still stands true. Although AgmT is not likely to interact with the gliding machinery directly, its activity does increase the binding between bFACs and PG. 

      We agree with the reviewer that “adopt” may not be the best word to describe AgmT’s function in gliding. In the revised manuscript, we changed the phrase to “contributes to gliding motility”. 

      - Line 35: define "bFAC" at first use. 

      Fixed.

      - Figure 2: Mention in the caption why the pilA mutation is significant. Also, make more clear what one is supposed to see. You could include an arrow showing motile cells extruding from the colony edge, and mark + label the edge of the colony. 

      Following the reviewer’s recommendations, we described the motility phenotypes in detail in the main text, “On a 1.5% agar surface, the pilA- cells moved away from colony edges both as individuals and in “flare-like” cell groups, indicating that they were still motile with gliding motility. In contrast, the ∆aglR pilA- cells that lack an essential component in the gliding motor, were unable to move outward from the colony edge and thus formed sharp colony edges. Similarly, the ∆agmT pilA- cells also formed sharp colony edges, indicating that they could not move efficiently with gliding (Fig. 2B)”. 

      We also added a schematic block into panel B and two sentences into the legend, “To eliminate S-motility, we further knocked out the pilA gene that encodes pilin for type IV pilus. Cells that move by gliding are able to move away from colony edges.” 

      - Figure 3 caption. Mecillinam concentration should presumably be µg/mL, not g/mL?

      Also, remove the ".van,." in the second to last line. 

      We apologize for these mistakes. We have corrected them in the figure legend. 

      - Line 212 - at this point in the manuscript, the fact that AgmT likely does not assemble into bFACs is quite well established, so I would start this paragraph with something like "As an additional test, we...". 

      Revised as the reviewer recommended.

      - Figure 5C - this assay needs a protein loading control. How about whole-cell AglR before pelleting PG? 

      We do have a whole-cell loading control, which we have added into the revised figure.

      - Figure 5A - how are the dimers visible? Is this a native gel? If so, please add to the Methods section (I would find information on Western Blot there, but not on gel electrophoresis). 

      Protein electrophoresis was done using SDS-PAGE. It is not unusual that some proteins, especially membrane proteins, are resistant to dissociation by SDS and appear as multimers in SDS-PAGE. We have seen this phenomenon repeatedly in both our experiments and the literature. Nevertheless, we clarified our experimental condition in the text, “Similar to many membrane proteins that are resistant to dissociation by SDS34, immunoblot using an anti-mCherry antibody showed that AgmT-PAmCherry accumulated in two bands in SDS-PAGE that corresponded to monomers and dimers of the full-length fusion protein, respectively (Fig. 5A)”.

      A few examples of membrane proteins remaining as oligomers are listed below:

      Rath et al., 2009, PNAS 106: 1760-1765

      Sulistijo et al., 2003, J Biol Chem 278: 51950-51956

      Sukharev, 2002, Biophys J 83: 290-298

      Neumann et al., 1998, J Bacteriol 180: 3312-3316

      Blakey et al., 2002, Biochem J 364: 527-535

      Wegner and Jones, 1984, J Biol Chem 259: 1834-1841

      Jiang et al., 2002, Nature 417: 515-522

      Heginbotham and Miller, 1997, Biochem 36: 10335-10342

      Gentile et al., 2002, J Biol Chem 277: 44050-44060

    1. eLife assessment

      This useful study describes a single set of label-chase mass spectrometry experiments to confirm the molecular function of YafK as a peptidoglycan hydrolase, and to describe the timing of its attachment to the peptidoglycan. Confirmation of the molecular function of YafK is helpful for further studies to examine the function and regulation of the outer membrane-peptidoglycan link in bacteria. The evidence supporting the molecular function of YafK and that lpp molecules are shuffled on and off the peptidoglycan is solid; however, some of the other data still remain incomplete in the revised version. The work will be of interest to researchers studying lipoproteins in Gram-negative bacteria.

    2. Reviewer #1 (Public review):

      The authors present data on outer membrane vesicle (OMV) production in different mutants, but they state that this is beyond the scope of the current manuscript, which I disagree with. This data could provide valuable physiological context that is otherwise lacking. The preliminary blots suggest that YafK does not alter OMV biogenesis. I recommend repeating these blots with appropriate controls, such as blotting for proteins in the culture media, an IM protein, periplasmic protein and an OM protein to strengthen the reliability of these findings. Including this data in the manuscript, even if it does not directly support the initial hypothesis, would enhance the physiological relevance of the study. Currently, the manuscript relies completely on the experimental setup (labeling-mass spec) previously developed by the authors, which limits the broader scope and interpretability of this study.

      Additionally, the susceptibility of strains to detergents like SDS could be tested to provide much-needed physiological context to the study.

      In summary, the authors should consider revising the manuscript to improve clarity, substantiate their claims with more detailed evidence, and include additional experimental results that provide necessary physiological context to their study.

      Comments on the revised version:

      Regarding my comments from the last review on a new figure on OMV analysis, the authors have redirected me to their previous response and have not performed the suggested control blots. I do not follow their argument that this is for a specialized audience. I do not have any more comments.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The authors present data on outer membrane vesicle (OMV) production in different mutants, but they state that this is beyond the scope of the current manuscript, which I disagree with. This data could provide valuable physiological context that is otherwise lacking. The preliminary blots suggest that YafK does not alter OMV biogenesis. I recommend repeating these blots with appropriate controls, such as blotting for proteins in the culture media, an IM protein, periplasmic protein and an OM protein to strengthen the reliability of these findings. Including this data in the manuscript, even if it does not directly support the initial hypothesis, would enhance the physiological relevance of the study. Currently, the manuscript relies completely on the experimental setup (labeling-mass spec) previously developed by the authors, which limits the broader scope and interpretability of this study.

      As stated in the previous response to the reviewers, MBP and RpoA were indeed used in the western blot experiments as appropriate controls for periplasmic and cytoplasmic proteins, respectively. The open review process of eLife has enabled us to include additional data from experiments suggested by the reviewers. We think that this mode of publication is appropriate in the present case for the reporting of the requested analysis of OMVs. Indeed, these data are of interest only to a rather specialized audience.

      Reviewer #2 (Public Review):  

      Weaknesses:

      Figures 3 and 4 - why are the data shown here from only two biological replicates, when there are 3-5 replicates shown in Tables S1 and S2? This makes it seem like you are cherry-picking your favorite replicates. Please present the data as the mean of all the replicates performed, with error shown on the graph.

      We apologize for forgetting to update the legend to Figures 3 and 4. In the modified version, we have indicated that the values used for the plots are the average of three to five replicates. The full set of data together with the means and standard deviations appear in Tables S1 and S2. We would like to keep the current presentation of the data because introducing standard deviations in these figures would compromise the legibility of the data.

      This work will have a moderate impact on the field of research in which the connections between the OM and peptidoglycan are being studied in E. coli. Since lpp is not widely conserved in gram negatives, the impact across species is not clear. The authors do not discuss the impact of their work in depth.

      We have already answered this comment in the first response to the reviewers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated the dynamics of a neural network model characterized by sparsely connected clusters of neuronal ensembles. They found that such a network could intrinsically generate sequence preplay and place maps, with properties like those observed in the real-world data. Strengths of the study include the computational model and data analysis supporting the hippocampal network mechanisms underlying sequence preplay of future experiences and place maps.

      Previous models of replay or theta sequences focused on circuit plasticity and usually required a pre-existing place map input from the external environment via upstream structures. However, those models failed to explain how networks support rapid sequential coding of novel environments or simply transferred the question to the upstream structure. On the contrary, the current proposed model required minimal spatial inputs and was aimed at elucidating how a preconfigured structure gave rise to preplay, thereby facilitating the sequential encoding of future novel environments.

      In this model, the fundamental units for spatial representation were clusters within the network. Sequential representation was achieved through the balance of cluster isolation and their partial overlap. Isolation resulted in a self-reinforced assembly representation, ensuring stable spatial coding. On the other hand, overlap induced activation transitions across clusters, enabling sequential coding.

      This study is important when considering that previous models mainly focused on plasticity and experience-related learning, while this model provided us with insights into how network architecture could support rapid sequential coding with large capacity, upon which learning could occur efficiently with modest modification via plasticity.

      I found this research very inspiring and, below, I provide some comments aimed at improving the manuscript. Some of these comments may extend beyond the scope of the current study, but I believe they raise important questions that should be addressed in this line of research.

      (1) The expression 'randomly clustered networks' needs to be explained in more detail, given that in its current form it risks implying that the network might be randomly organized (i.e., not organized). In particular, a clustered network with future functionality based on its current clustering is not random but rather pre-configured into those clusters. What the authors likely meant to say, while using the said expression in the title and text, is that clustering is not induced by an experience in the environment, which will only be later mapped using those clusters. While this organization might indeed appear as randomly clustered when referenced to a future novel experience, it might be non-random when referenced to the prior (unaccounted) activity of the network. Related to this, network organization based on similar yet distinct experiences (e.g., on parallel linear tracks as in Liu, Sibille, Dragoi, Neuron 2021) could explain/configure, in part, the hippocampal CA1 network organization that would appear otherwise 'randomly clustered' when referenced to a future novel experience.

      As suggested by the reviewer, we have revised the text to clarify that the random clustering is random with respect to any future, novel environment (lines 111-114 and 710-712).

      Lines 111-114: “To reconcile these experimental results, we propose a model of intrinsic sequence generation based on randomly clustered recurrent connectivity, wherein place cells are connected within multiple overlapping clusters that are random with respect to any future, novel environment.”

      Lines 710-712: “Our results suggest that the preexisting hippocampal dynamics supporting preplay may reflect general properties arising from randomly clustered connectivity, where the randomness is with respect to any future, novel experience.”

      The cause of clustering could be prior experiences (e.g. Bourjaily and Miller, 2011) or developmental programming (e.g. Perin et al., 2011; Druckmann et al., 2014; Huszar et al., 2022), and we have modified lines 116 and 714-718 to state this.

      Line 116: Added citation of “Perin et al., 2011”

      Lines 714-718: “Synaptic plasticity in the recurrent connections of CA3 may primarily serve to reinforce and stabilize intrinsic dynamics, which could be established through a combination of developmental programming (Perin et al., 2011; Druckmann et al., 2014; Huszar et al., 2022) and past experiences (Bourjaily and Miller, 2011), rather than creating spatial maps de novo.”

      We thank the reviewer for suggesting that the results of Liu et al., 2021 strengthen the support for our modeling motivations. We agree, and we now cite their finding that the hippocampal representations of novel environments emerged rapidly but were initially generic and showed greater discriminability from other environments with repeated experience in the environment (lines 130-134).

      Lines 130-134: “Further, such preexisting clusters may help explain the correlations that have been found in otherwise seemingly random remapping (Kinsky et al., 2018; Whittington et al., 2020) and support the rapid hippocampal representations of novel environments that are initially generic and become refined with experience (Liu et al., 2021).”

      (2) The authors should elaborate more on how the said 'randomly clustered networks' generate beyond chance-level preplay. Specifically, why was there preplay stronger than the time-bin shuffle? There are at least two potential explanations:

      (1) When the activation of clusters lasts for several decoding time bins, temporal shuffle breaks the continuity of one cluster's activation, thus leading to less sequential decoding results. In that case, the preplay might mainly outperform the shuffle when there are fewer clusters activating in a PBE. For example, activation of two clusters must be sequential (either A to B or B to A), while time bin shuffle could lead to non-sequential activations such as a-b-a-b-a-b where a and b are components of A and B;

      (2) There is a preferred connection between clusters based on the size of overlap across clusters. For example, if pair A-B and B-C have stronger overlap than A-C, then cluster sequences A-B-C and C-B-A are more likely to occur than others (such as A-C-B) across brain states. In that case, authors should present the distribution of overlap across clusters, and whether the sequences during run and sleep match the magnitude of overlap. During run simulation in the model, as clusters randomly receive a weak location cue bias, the activation sequence might not exactly match the overlap of clusters due to the external drive. In that case, the strength of location cue bias (4% in the current setup) could change the balance between the internal drive and external drive of the representation. How does that parameter influence the preplay incidence or quality?

      Explanation 1 is correct: Our cluster-activation analyses (Figure 5) showed that the parameter values that generate preplay correspond to the parameter regions that support sustained cluster activity over multiple decoding time bins, which led us to the conclusion of the reviewer’s first proposed explanation.
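
      As an illustration of the reviewer's first explanation (and not the authors' analysis code), the following minimal sketch shows how permuting the time bins of an event breaks up sustained cluster activity. The event, the cluster labels, and the helper names `count_switches` and `time_bin_shuffle` are hypothetical.

```python
import random

def count_switches(labels):
    """Number of times the dominant cluster changes between adjacent time bins."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def time_bin_shuffle(labels, rng):
    """Return a copy of the event with the order of its time bins permuted."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical event: cluster A dominates five decoding bins, then cluster B does.
event = ["A"] * 5 + ["B"] * 5
rng = random.Random(0)

intact_switches = count_switches(event)
shuffled_switches = [count_switches(time_bin_shuffle(event, rng)) for _ in range(1000)]
mean_shuffled_switches = sum(shuffled_switches) / len(shuffled_switches)

print(intact_switches)         # the intact event changes cluster exactly once
print(mean_shuffled_switches)  # shuffled events change cluster more often on average
```

      With only two sustained clusters, the intact event is necessarily sequential, while its shuffles tend to alternate between the two clusters, which is the a-b-a-b pattern the reviewer describes.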

      We have now added additional analyses supporting the conclusion that cluster-wise activity is the main driver of preplay rather than individual cell-identity (Figures 6 and 7). In Figure 6 we show that cluster-identity alone is sufficient to produce significant preplay by performing decoding after shuffling cell identity within clusters, and in Figure 7 we show that this result holds true when considering the sequence of spiking activity within population bursts rather than the spatial decoding.

      Lines 495-515: The pattern of preplay significance across the parameter grid in Figure 4f shows that preplay only occurs with modest cluster overlap, and the results of Figure 5 show that this corresponds to the parameter region that supports transient, isolated cluster-activation. This raises the question of whether cluster-identity is sufficient to explain preplay. To test this, we took the sleep simulation population burst events from the fiducial parameter set and performed decoding after shuffling cell identity in three different ways. We found that when the identity of all cells within a network is randomly permuted the resulting median preplay correlation shift is centered about zero (t-test 95% confidence interval, -0.2018 to 0.0012) and preplay is not significant (distribution of p-values is consistent with a uniform distribution over 0 to 1, chi-square goodness-of-fit test p=0.4436, chi-square statistic=2.68; Figure 6a). However, performing decoding after randomly shuffling cell identity between cells that share membership in a cluster does result in statistically significant preplay for all shuffle replicates, although the magnitude of the median correlation shift is reduced for all shuffle replicates (Figure 6b). The shuffle in Figure 6b does not fully preserve cells’ cluster identity because a cell that is in multiple clusters may be shuffled with a cell in either a single cluster or with a cell in multiple clusters that are not identical. Performing decoding after doing within-cluster shuffling of only cells that are in a single cluster results in preplay statistics that are not statistically different from the unshuffled statistics (t-test relative to median shift of un-shuffled decoding, p=0.1724, 95% confidence interval of -0.0028 to 0.0150 relative to the reference value; Figure 6c). Together these results demonstrate that cluster-identity is sufficient to produce preplay.

      Lines 531-551: While cluster-identity is sufficient to produce preplay (Figure 6b), the shuffle of Figure 6c is incomplete in that cells belonging to more than one cluster are not shuffled. Together, these two shuffles leave room for the possibility that individual cell-identity may contribute to the production of preplay. It might be the case that some cells fire earlier than others, both on the track and within events. To test the contribution of individual cells to preplay, we calculated for all cells in all networks of the fiducial parameter point their mean relative spike rank and tested if this is correlated with the location of their mean place field density on the track (Figure 7). We find that there is no relationship between a cell’s mean relative within-event spike rank and its mean place field density on the track (Figure 7a). This is the case when the relative rank is calculated over the entire network (Figure 7, “Within-network”) and when the relative rank is calculated only with respect to cells with the same cluster membership (Figure 7, “Within-cluster”). However, because preplay events can proceed in either track direction, averaging over all events would average out the sequence order of these two opposite directions. We performed the same correlation but after reversing the spike order for events with a negative slope in the decoded trajectory (Figure 7b). To test the significance of this correlation, we performed a bootstrap significance test by comparing the slope of the linear regression to the slope that results when performing the same analysis after shuffling cell identities in the same manner as in Figure 6. We found that the linear regression slope is greater than expected relative to all three shuffling methods for both the within-network mean relative rank correlation (Figure 7c) and the within-cluster mean relative rank correlation (Figure 7d).
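
      The direction-corrected rank analysis described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes that a cell's rank within an event is set by its first spike time and that ranks are scaled to the range 0 to 1. The function names and the toy events are hypothetical.

```python
def relative_ranks(first_spike_times):
    """Map each active cell to its rank within one event, scaled to [0, 1].

    Assumption: rank is determined by first spike time within the event.
    """
    order = sorted(first_spike_times, key=first_spike_times.get)
    n = len(order)
    if n < 2:
        return {cell: 0.5 for cell in order}
    return {cell: i / (n - 1) for i, cell in enumerate(order)}

def mean_relative_rank(events, flip_negative_slope=False):
    """Average each cell's relative rank over events.

    events: list of (first_spike_times, decoded_slope) pairs.
    If flip_negative_slope is True, ranks are inverted for events whose
    decoded trajectory has a negative slope.
    """
    totals, counts = {}, {}
    for first_spike_times, slope in events:
        for cell, rank in relative_ranks(first_spike_times).items():
            if flip_negative_slope and slope < 0:
                rank = 1.0 - rank
            totals[cell] = totals.get(cell, 0.0) + rank
            counts[cell] = counts.get(cell, 0) + 1
    return {cell: totals[cell] / counts[cell] for cell in totals}

# Hypothetical events: the same three cells fire in forward and in reverse order.
events = [
    ({"c1": 0.010, "c2": 0.020, "c3": 0.030}, +1.0),
    ({"c1": 0.030, "c2": 0.020, "c3": 0.010}, -1.0),
]
uncorrected = mean_relative_rank(events)
corrected = mean_relative_rank(events, flip_negative_slope=True)
print(uncorrected)  # opposite directions cancel out
print(corrected)    # the consistent ordering is recovered
```

      In this toy case the uncorrected mean ranks are identical across cells, which mirrors the averaging-out effect noted in the text, while the corrected ranks preserve the ordering.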

      Lines 980-1000:

      “Cell identity shuffled decoding

      We performed Bayesian decoding on the fiducial parameter set after shuffling cell identities in three different manners (Figures 6 and 7). To shuffle cells in a cluster-independent manner (“Across-network shuffle”), we randomly shuffled the identity of cells during the sleep simulations. To shuffle cells within clusters (“Within-cluster shuffle”), we randomly shuffled cell identity only between cells that shared membership in at least one cluster. To shuffle cells within only single clusters (“Within-single-cluster shuffle”), we shuffled cells in the same manner as the within-cluster shuffle but excluded any cells from the shuffle that were in multiple clusters.

      To test for a correlation between spike rank during sleep PBEs and the order of place fields on the track (Figure 7), we calculated for each excitatory cell in each network of the fiducial parameter set its mean relative spike rank and correlated that with the location of its mean place field density on the track (Figure 7a). To account for event directionality, we calculated the mean relative rank after inverting the rank within events that had a negatively sloped decoded trajectory (Figure 7b). We calculated mean relative rank for each cell relative to all cells in the network (“Within-network mean relative rank”) and relative to only cells that shared cluster membership with the cell (“Within-cluster mean relative rank”). We then compared the slope of the linear regression between mean relative rank and place field location against the slope that results when applying the same analysis to each of the three methods of cell identity shuffles for both the within-network regression (Figure 7c) and the within-cluster regression (Figure 7d).”
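
      The three cell-identity shuffles can be sketched as below. This is a hedged illustration rather than the authors' code. In particular, the way cells with multiple memberships are paired in the within-cluster shuffle is an assumption: here each cell is assigned to one of its clusters at random and identities are permuted within each resulting group. All function names and the toy membership table are hypothetical.

```python
import random

def across_network_shuffle(cells, rng):
    """Permute identities among all cells, ignoring cluster membership."""
    targets = list(cells)
    rng.shuffle(targets)
    return dict(zip(cells, targets))

def within_cluster_shuffle(membership, rng, single_cluster_only=False):
    """Permute identities only between cells that share a cluster.

    membership maps each cell to the set of clusters it belongs to.
    Assumption: each cell is grouped under one randomly chosen cluster of its
    own, then identities are permuted within each group. With
    single_cluster_only=True, cells in several clusters keep their identity.
    """
    mapping = {cell: cell for cell in membership}
    groups = {}
    for cell in sorted(membership):
        clusters = sorted(membership[cell])
        if not clusters or (single_cluster_only and len(clusters) != 1):
            continue
        groups.setdefault(rng.choice(clusters), []).append(cell)
    for members in groups.values():
        permuted = list(members)
        rng.shuffle(permuted)
        mapping.update(zip(members, permuted))
    return mapping

# Hypothetical network: cell 2 belongs to both clusters.
membership = {0: {"A"}, 1: {"A"}, 2: {"A", "B"}, 3: {"B"}, 4: {"B"}}
rng = random.Random(1)
network_map = across_network_shuffle(list(membership), rng)
cluster_map = within_cluster_shuffle(membership, rng)
single_map = within_cluster_shuffle(membership, rng, single_cluster_only=True)
```

      Each returned dictionary maps a cell to the cell whose identity it takes on, so decoding after a shuffle would amount to relabeling spike trains through that mapping.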

      We also now show that the sequence of cluster-activation in events with 3 active clusters does not match the sequence of cluster biases on the track above chance levels and that events with fewer active clusters have the largest increase in median weighted decode correlation (Figure 5—figure supplement 1), showing that the reviewer’s second explanation is not the case.

      Lines 466-477: “The results of Figure 5 suggest that cluster-wise activation may be crucial to preplay. One possibility is that the random overlap of clusters in the network spontaneously produces biases in sequences of cluster activation which can be mapped onto any given environment. To test this, we looked at the pattern of cluster activations within events. We found that sequences of three active clusters were not more likely to match the track sequence than chance (Figure 5—figure supplement 1a). This suggests that preplay is not dependent on a particular biased pattern in the sequence of cluster activation. We then asked if the number of clusters that were active influenced preplay quality. We split the preplay events by the number of clusters that were active during each event and found that the median preplay shift relative to shuffled events with the same number of active clusters decreased with the number of active clusters (Spearman’s rank correlation, p=0.0019, ρ=-0.13; Figure 5—figure supplement 1b).”
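
      For reference, the chance level against which the three-cluster sequences are compared can be worked out directly. The sketch below is illustrative and swaps the bootstrap sampling described in the manuscript for exact enumeration. It assumes each cluster has a distinct scalar bias along the track; the bias values and function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def matches_track_order(sequence, bias):
    """True if the cluster biases along the sequence are strictly monotonic,
    i.e. the sequence matches the track order in either direction."""
    values = [bias[c] for c in sequence]
    increasing = all(a < b for a, b in zip(values, values[1:]))
    decreasing = all(a > b for a, b in zip(values, values[1:]))
    return increasing or decreasing

def chance_match_fraction(bias, length=3):
    """Exact fraction of ordered cluster sequences (drawn without replacement)
    that match the track order in either direction."""
    sequences = list(permutations(bias, length))
    hits = sum(matches_track_order(s, bias) for s in sequences)
    return Fraction(hits, len(sequences))

# Hypothetical biases: the track position favored by each of five clusters.
bias = {"c0": 0.1, "c1": 0.3, "c2": 0.5, "c3": 0.7, "c4": 0.9}
chance = chance_match_fraction(bias)
print(chance)  # 1/3
```

      With distinct biases, two of the six possible orderings of any three clusters are monotonic, so one third of random sequences match the track by chance.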

      Lines 1025-1044:

      “Active cluster analysis

      To quantify cluster activation (Figure 5), we calculated the population rate for each cluster individually as the mean firing rate of all excitatory cells belonging to the cluster smoothed with a Gaussian kernel (15 ms standard deviation). A cluster was defined as ‘active’ if at any point its population rate exceeded twice that of any other cluster during a PBE. An active cluster’s duration of activation was defined as the duration for which it was the most active cluster.

      To test whether the sequence of activation in events with three active clusters matched the sequence of place fields on the track, we performed a bootstrap significance test (Figure 5—figure supplement 1). For all events from the fiducial parameter set that had three active clusters, we calculated the fraction in which the sequence of the active clusters matched the sequence of the clusters’ left vs right bias on the track in either direction. We then compared this fraction to the distribution expected from randomly sampling sequences of three clusters without replacement.

      To determine if there was a relationship between the number of active clusters within an event and its preplay quality, we performed a Spearman’s rank correlation between the number of active clusters and the normalized absolute weighted correlation across all events at the fiducial parameter set. The absolute weighted correlations were z-scored based on the absolute weighted correlations of the time-bin shuffled events that had the same number of active clusters.”
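
      The definition of an active cluster given above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: "twice that of any other cluster" is read as exceeding twice the rate of every other cluster at the same time point, the kernel is truncated at four standard deviations, and the 1 ms time step and the toy rates are hypothetical.

```python
import math

def gaussian_smooth(x, sigma_bins):
    """Smooth a sequence with a Gaussian kernel truncated at 4 sigma."""
    half = int(math.ceil(4 * sigma_bins))
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in range(-half, half + 1)]
    norm = sum(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            s = t + j - half
            if 0 <= s < len(x):
                acc += w * x[s]
        out.append(acc / norm)
    return out

def active_clusters(cluster_rates, dt_ms=1.0, sigma_ms=15.0, ratio=2.0):
    """Return {cluster: duration in ms as the most active cluster} for every
    cluster whose smoothed rate at some point exceeds `ratio` times the rate
    of all other clusters. Requires at least two clusters."""
    smoothed = {c: gaussian_smooth(r, sigma_ms / dt_ms) for c, r in cluster_rates.items()}
    n_bins = len(next(iter(smoothed.values())))
    active = set()
    lead_bins = {c: 0 for c in smoothed}
    for t in range(n_bins):
        ranked = sorted(smoothed, key=lambda c: smoothed[c][t], reverse=True)
        top, runner_up = ranked[0], ranked[1]
        if smoothed[top][t] > 0:
            lead_bins[top] += 1
        if smoothed[top][t] > ratio * smoothed[runner_up][t]:
            active.add(top)
    return {c: lead_bins[c] * dt_ms for c in active}

# Hypothetical 200 ms event: cluster A fires first, then B; C stays at a low rate.
cluster_rates = {
    "A": [10.0] * 100 + [0.0] * 100,
    "B": [0.0] * 100 + [10.0] * 100,
    "C": [1.0] * 200,
}
result = active_clusters(cluster_rates)
print(sorted(result))
```

      In this toy event A and B each qualify as active, whereas C never dominates and is excluded.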

      We also now add control simulations showing that without the cluster-dependent bias the population burst events no longer significantly decode as preplay (Figure 4—figure supplement 4e).

      (3) The manuscript is focused on presenting that a randomly clustered network can generate preplay and place maps with properties similar to experimental observations. An equally interesting question is how preplay supports spatial coding. If preplay is an intrinsic dynamic feature of this network, then it would be good to study whether this network outperforms other networks (randomly connected or ring lattice) in terms of spatial coding (encoding speed, encoding capacity, tuning stability, tuning quality, etc.)

      We agree that this is an interesting future direction, but we see it as outside the scope of the current work. There are two interesting avenues of future work: 1) Our current model does not include any plasticity mechanisms, but a future model could study the effects of synaptic plasticity during preplay on long-term network dynamics, and 2) Our current model does not include alternative approaches to constructing the recurrent network, but future studies could systematically compare the spatial coding properties of alternative types of recurrent networks.

      (4) The manuscript mentions the small-world connectivity several times, but the concept still appears too abstract and how the small-world index (SWI) contributes to place fields or preplay is not sufficiently discussed.

      For a more general audience in the field of neuroscience, it would be helpful to include example graphs with high and low SWI. For example, you can show a ring lattice graph and indicate that there are long paths between points at opposite sides of the ring; show randomly connected graphs indicating there are no local clustered structures, and show clustered graphs with several hubs establishing long-range connections to reduce pair-wise distance.

      How this SWI contributes to preplay is also not clear. Figure 6 showed preplay is correlated with SWI, but maybe the correlation is caused by both of them being correlated with cluster participation. The balance between cluster overlap and cluster isolation is well discussed. In the Discussion, the authors mention "...Such a balance in cluster overlap produces networks with small-world characteristics (Watts and Strogatz, 1998) as quantified by a small-world index..." (Lines 560-561). I believe the statement is not entirely appropriate: a network similar to a ring lattice can still have the balance of cluster isolation and cluster overlap, while it will have a small SWI due to long paths across some node pairs. Both cluster structure and long-range connections could contribute to SWI. The authors only discuss the necessity of cluster structure, but why long-range connections are important should also be discussed. I guess long-range connections could make the network more flexible (clusters are closer to each other) and thus increase the potential repertoire.

      We agree that the manuscript would benefit from a more concrete explanation of the small-world index. We have added a figure illustrating different types of networks and their corresponding SWI (Figure 1—figure supplement 1) and a corresponding description in the main text (lines 228-234).

      Lines 228-234: “A ring lattice network (Figure 1—figure supplement 1a) exhibits high clustering but long path lengths between nodes on opposite sides of the ring. In contrast, a randomly connected network (Figure 1—figure supplement 1c) has short path lengths but lacks local clustered structure. A network with small world structure, such as a Watts-Strogatz network (Watts and Strogatz, 1998) or our randomly clustered model (Figure 1—figure supplement 1b), combines both clustered connectivity and short path lengths. In our clustered networks, for a fixed connection probability the SWI increases with more clusters and lower cluster participation…”
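
      The contrast between the two extreme network types can be checked numerically with a short sketch. It is illustrative only and does not reproduce the authors' networks or their SWI calculation: it compares the mean clustering coefficient and the mean shortest-path length of a ring lattice and of a random graph with the same number of nodes and edges. The network size (60 nodes, 6 neighbors) and the function names are hypothetical choices.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Undirected ring in which each node links to its k nearest neighbors (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def random_graph(n, n_edges, rng):
    """Undirected graph with n_edges distinct edges placed uniformly at random."""
    adj = {i: set() for i in range(n)}
    added = 0
    while added < n_edges:
        i, j = rng.sample(range(n), 2)
        if j not in adj[i]:
            adj[i].add(j)
            adj[j].add(i)
            added += 1
    return adj

def mean_clustering(adj):
    """Average fraction of each node's neighbor pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        nbrs = list(nbrs)
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute zero
        links = sum(1 for a in range(k) for b in range(a + 1, k) if nbrs[b] in adj[nbrs[a]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def mean_path_length(adj):
    """Mean shortest-path length over reachable ordered node pairs (breadth-first search)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

n, k = 60, 6
ring = ring_lattice(n, k)
rand = random_graph(n, n * k // 2, random.Random(0))
c_ring, l_ring = mean_clustering(ring), mean_path_length(ring)
c_rand, l_rand = mean_clustering(rand), mean_path_length(rand)
print(c_ring, l_ring)
print(c_rand, l_rand)
```

      The ring lattice shows the higher clustering and the longer paths, and the random graph the reverse; a small-world network would sit between the two on path length while retaining much of the lattice's clustering.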

      We note that while our most successful clustered networks are indeed those with small-world characteristics, there are other ways of producing small-world networks which may not show good place fields or preplay. We have modified lines 690-692 to clarify that that statement is specific to our model.

      Lines 690-692: “In our clustered network structure, such a balance in cluster overlap produces networks with small-world characteristics (Watts and Strogatz, 1998) as quantified by a small-world index (SWI, Figure 1g; Neal, 2015; Neal, 2017).”

      (5) What drives PBE during sleep? Seems like the main difference between sleep and run states is the magnitude of excitatory and inhibitory inputs controlled by scaling factors. If there are bursts (PBE) in sleep, do you also observe those during run? Does the network automatically generate PBE in a regime of strong excitation and weak inhibition (neural bifurcation)?

      During sleep simulations, the PBEs are spontaneously generated by the recurrent connections in the network. The constant-rate Poisson inputs drive low-rate stochastic spiking in the recurrent network, which then randomly generates population events when there is sufficient internal activity to transiently drive additional spiking within the network.

      During run simulations, the spatially-tuned inputs drive greater activity in a subset of the cells at a given point on the track, which in turn suppress the other excitatory cells through the feedback inhibition.

      We have added a brief explanation of this in the text in lines 281-284.

      Lines 281-284: “During simulated sleep, sparse, stochastic spiking spontaneously generates sufficient excitement within the recurrent network to produce population burst events resembling preplay (Figure 2d-f)”

      (6) Is the concept of 'cluster' similar to 'assemblies', as in Peyrache et al, 2010; Farooq et al, 2019? Does a classic assembly analysis during run reveal cluster structures?

      Our clusters correspond to functional assemblies in that cells that share a cluster membership have more-similar place fields and are more likely to reactivate together during population burst events. In Author response image 1, we show for an example network at the fiducial parameter set the Pearson correlation between all pairs of place fields split by whether the cells share membership in a cluster (blue) or do not (red).

      Author response image 1.

      We expect an assembly analysis would identify assemblies similarly to the experimental data, but we see this additional analysis as a future direction. We have added a description of this correspondence in the text at lines 134-137.

      Lines 134-137: “Such clustered connectivity likely underlies the functional assemblies that have been observed in hippocampus, wherein groups of recorded cells have correlated activity that can be identified through independent component analysis (Peyrache et al., 2010; Farooq et al., 2019).”

      (7) Can the capacity of the clustered network to express preplay for multiple distinct future experiences be estimated in relation to current network activity, as in Dragoi and Tonegawa, PNAS 2013?

      We agree this is an interesting opportunity to compare the results of our model to what has been previously found experimentally. We report here preliminary results supporting this as an interesting future direction.

      Author response image 2.

      We performed a similar analysis to that reported in Figure 3C of Dragoi and Tonegawa, 2013. We determined the statistical significance of each event individually for each of the two environments by testing whether the decoded event’s absolute weighted correlation exceeded the 99th percentile of the corresponding shuffle events. We then fit a linear regression to the fraction of events that were significant for each of the two tracks and that were significant to either of the two tracks (left panel of above figure). We then estimated the track capacity as the number of tracks at the point where the linear regression reached 100% of the network capacity. We find that applying this analysis to our fiducial parameter set returns an estimate of ~8.6 tracks (Dragoi and Tonegawa, 2013, found ~15 tracks).

      We performed this same analysis for each parameter point in our main parameter grid (right panel of above figure). The parameter region that produces significant preplay (Figure 4f) corresponds to the region that has a track capacity of approximately 8-25 tracks. In the parameter grid region that does not produce preplay, the estimated track capacity approaches the high values that this analysis would produce when applied to events that are significant only at the false-positive rate. This analysis is based on the assumption that each preplay event would significantly correspond to at least one future event. Interesting interpretation issues arise when applying this analysis to parameter regions that do not produce statistically significant preplay, which we leave to future directions to address.

      We note two differences between our analysis here and that in Dragoi and Tonegawa, 2013. First, their track capacity analysis was performed on spike sequences rather than decoded spatial sequences, which is the focus of our manuscript. Second, they recorded rats exploring three novel tracks, while in our manuscript we only simulated two novel tracks, which reduces the accuracy of our linear extrapolation of track capacity.
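
      The track-capacity extrapolation described in this response can be sketched as follows. This is a hedged illustration, not the authors' code. It assumes that the regression is an ordinary least-squares line through the points (1 track, fraction significant for track A), (1 track, fraction significant for track B) and (2 tracks, fraction significant for either), and that the percentile is taken by the nearest-rank convention. The function names and the example fractions are hypothetical.

```python
import math

def is_significant(abs_weighted_corr, shuffle_corrs, percentile=99.0):
    """True if the event exceeds the given percentile of its shuffles
    (nearest-rank percentile, one of several possible conventions)."""
    ranked = sorted(shuffle_corrs)
    idx = max(0, math.ceil(percentile * len(ranked) / 100.0) - 1)
    return abs_weighted_corr > ranked[idx]

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def track_capacity(frac_track_a, frac_track_b, frac_either):
    """Number of tracks at which the fitted line reaches a fraction of 1.0."""
    slope, intercept = fit_line([1, 1, 2], [frac_track_a, frac_track_b, frac_either])
    return (1.0 - intercept) / slope

# Hypothetical fractions of events that are significant for each track and for either.
capacity = track_capacity(0.10, 0.12, 0.21)
print(capacity)
```

      Because the line is fit to only two distinct abscissa values, small changes in the measured fractions shift the extrapolated capacity considerably, which is consistent with the caution expressed above about simulating only two tracks.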

      Reviewer #2 (Public Review):

      Summary:

      The authors show that a spiking network model with clustered neurons produces intrinsic spike sequences when driven with a ramping input, which are recapitulated in the absence of input. This behavior is only seen for some network parameters (neuron cluster participation and number of clusters in the network), which correspond to those that produce a small world network. By changing the strength of ramping input to each network cluster, the network can show different sequences.

      Strengths:

      A strength of the paper is the direct comparison between the properties of the model and neural data.

      Weaknesses:

      My main critiques of the paper relate to the form of the input to the network.

      First, because the input is the same across trials (i.e. all traversals are the same duration/velocity), there is no ability to distinguish a representation of space from a representation of time elapsed since the beginning of the trial. The authors should test what happens e.g. with traversals in which the animal travels at different speeds, and in which the animal's speed is not constant across the entire track, and then confirm whether the resulting tuning curves are a better representation of position or duration.

      We thank the reviewer for pointing out this important limitation. We see extensive testing of the time vs space coding properties of this network as a future direction, but we have performed simulations that demonstrate the robustness of place field coding to variations in traversal speeds and added the results as a supplemental figure (Figure 3—figure supplement 1).

      Lines 332-336: “To verify that our simulated place cells were more strongly coding for spatial location than for elapsed time, we performed simulations with additional track traversals at different speeds and compared the resulting place fields and time fields in the same cells. We find that there is significantly greater place information than time information (Figure 3—figure supplement 1).”

      Lines 835-841: “To compare coding for place vs time, we performed repeated simulations for the same networks at the fiducial parameter point with 1.0x and 2.0x of the original track traversal speed. We then combined all trials for both speed conditions to calculate both place fields and time fields for each cell from the same linear track traversal simulations. The place fields were calculated as described below (average firing rate within each of the fifty 2-cm long spatial bins across the track) and the time fields were similarly calculated but for fifty 40-ms time bins across the initial two seconds of all track traversals.”
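
      The binning described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes constant-speed traversals of a 100 cm track (fifty 2-cm bins), computes rates as spike counts divided by occupancy time, and uses a hypothetical base speed of 50 cm/s for the 1.0x condition. The function name and the toy spike times are likewise hypothetical.

```python
def place_and_time_fields(trials, track_cm=100.0, n_bins=50,
                          pos_bin_cm=2.0, time_bin_s=0.04):
    """Compute a place field and a time field from the same traversals.

    trials: list of (speed_cm_per_s, spike_times_s), one per constant-speed
    traversal, with spike times measured from the start of the traversal.
    """
    place_counts, place_occ = [0] * n_bins, [0.0] * n_bins
    time_counts, time_occ = [0] * n_bins, [0.0] * n_bins
    for speed, spikes in trials:
        duration = track_cm / speed
        for b in range(n_bins):
            place_occ[b] += pos_bin_cm / speed  # time spent in each spatial bin
            start = b * time_bin_s
            time_occ[b] += max(0.0, min(time_bin_s, duration - start))
        for t in spikes:
            if not 0.0 <= t < duration:
                continue
            pos_bin = min(int(speed * t / pos_bin_cm), n_bins - 1)
            place_counts[pos_bin] += 1
            t_bin = int(t / time_bin_s)
            if t_bin < n_bins:
                time_counts[t_bin] += 1
    place = [c / o if o > 0 else 0.0 for c, o in zip(place_counts, place_occ)]
    time = [c / o if o > 0 else 0.0 for c, o in zip(time_counts, time_occ)]
    return place, time

# Hypothetical cell that fires at 50.5 cm regardless of running speed.
trials = [(50.0, [1.01]), (100.0, [0.505])]  # 1.0x and 2.0x traversal speed
place_field, time_field = place_and_time_fields(trials)
place_bins = [i for i, r in enumerate(place_field) if r > 0]
time_bins = [i for i, r in enumerate(time_field) if r > 0]
print(place_bins, time_bins)
```

      For a purely position-tuned cell, pooling the two speed conditions leaves a single peak in the place field but splits the time field across two bins, which is the kind of dissociation the comparison of place and time information is meant to detect.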

      Second, it's unclear how much the results depend on the choice of a one-dimensional environment with ramping input. While this is an elegant idealization that allows the authors to explore the representation and replay properties of their model, it is a strong and highly non-physiological constraint. The authors should verify that their results do not depend on this idealization. Specifically, I would suggest the authors also test the spatial coding properties of their network in 2-dimensional environments, and with different kinds of input that have a range of degrees of spatial tuning and physiological plausibility. A method for systematically producing input with varying degrees of spatial tuning in both 1D and 2D environments has been previously used in (Fang et al 2023, eLife, see Figures 4 and 5), which could be readily adapted for the current study; and behaviorally plausible trajectories in 2D can be produced using the RatInABox package (George et al 2022, bioRxiv), which can also generate e.g. grid cell-like activity that could be used as physiologically plausible input to the network.

      We agree that testing the robustness of our results to variations in feedforward input is important. We have added new simulation results (Figure 4—figure supplement 4) showing that the existence of preplay in our model is robust to variations in the form of input.

      Testing the model in a 2D environment is an interesting future direction, but we see it as outside the scope of the current work. To our knowledge there are no experimental findings of preplay in 2D environments, but this presents an interesting opportunity for future modeling studies.

      Lines 413-420: To test the robustness of our results to variations in input types, we simulated alternative forms of spatially modulated feedforward inputs. We found that with no parameter tuning or further modifications to the network, the model generates robust preplay with variations on the spatial inputs, including inputs of three linearly varying cues (Figure 4—figure supplement 4a) and two stepped cues (Figure 4—figure supplement 4b-c). The network is impaired in its ability to produce preplay with binary step location cues (Figure 4—figure supplement 4d), when there is no cluster bias (Figure 4—figure supplement 4e), and at greater values of cluster participation (Figure 4—figure supplement 4f).

      Finally, I was left wondering how the cells' spatial tuning relates to their cluster membership, and how the capacity of the network (number of different environments/locations that can be represented) relates to the number of clusters. It seems that if clusters of cells tend to code for nearby locations in the environment (as predicted by the results of Figure 5), then the number of encodable locations would be limited (by the number of clusters). Further, there should be a strong tendency for cells in the same cluster to encode overlapping locations in different environments, which is not seen in experimental data.

      Thank you for making this important point and giving us the opportunity to clarify. We do find that subsets of cells with identical cluster membership have correlated place fields, but as we show in Figure 9b (original Figure 7b) the network place map as a whole shows low remapping correlations across environments, which is consistent with experimental data (Hampson et al., 1996; Pavlides, et al., 2019).

      Our model includes a relatively small number of cells and clusters compared to CA3, and with a more realistic number of clusters, the level of correlation across network place maps should reduce even further in our model network. The reason for a low level of correlation in the model is because cluster membership is combinatorial, whereby cells that share membership in one cluster can also belong to separate/distinct other clusters, rendering their activity less correlated than might be anticipated.

      We have added text at lines 627-630 clarifying these points.

      Lines 628-631: “Cells that share membership in a cluster will have some amount of correlation in their remapping due to the cluster-dependent cue bias, which is consistent with experimental results (Hampson et al., 1996; Pavlides et al., 2019), but the combinatorial nature of cluster membership renders the overall place field map correlations low (Figure 9b).”

      Reviewer #3 (Public Review):

      Summary:

      This work offers a novel perspective on the question of how hippocampal networks can adaptively generate different spatial maps and replays/preplays of the corresponding place cells, without any such maps pre-existing in the network architecture or its inputs. Unlike previous modeling attempts, the authors do not pre-tune their model neurons to any particular place fields. Instead, they build a random, moderately-clustered network of excitatory (and some inhibitory) cells, similar to CA3 architecture. By simulating spatial exploration through border-cell-like synaptic inputs, the model generates place cells for different "environments" without the need to reconfigure its synaptic connectivity or introduce plasticity. By simulating sleep-like random synaptic inputs, the model generates sequential activations of cells, mimicking preplays. These "preplays" require small-world connectivity, so that weakly connected cell clusters are activated in sequence. Using a set of electrophysiological recordings from CA1, the authors confirm that the modeled place cells and replays share many features with real ones. In summary, the model demonstrates that spontaneous activity within a small-world structured network can generate place cells and replays without the need for pre-configured maps.

      Strengths:

      This work addresses an important question in hippocampal dynamics. Namely, how can hippocampal networks quickly generate new place cells when a novel environment is introduced? And how can these place cells preplay their sequences even before the environment is experienced? Previous models required pre-existing spatial representations to be artificially introduced, limiting their adaptability to new environments. Other models depended on synaptic plasticity rules which made remapping slower than what is seen in recordings. This modeling work proposes that quickly-adaptive intrinsic spiking sequences (preplays) and spatially tuned spiking (place cells) can be generated in a network through randomly clustered recurrent connectivity and border-cell inputs, avoiding the need for pre-set spatial maps or plasticity rules. The proposal that small-world architecture is key for place cells and preplays to adapt to new spatial environments is novel and of potential interest to the computational and experimental community.

      The authors do a good job of thoroughly examining some of the features of their model, with a strong focus on excitatory cell connectivity. Perhaps the most valuable conclusion is that replays require the successive activation of different cell clusters. Small-world architecture is the optimal regime for such a controlled succession of activated clusters.

      The use of pre-existing electrophysiological data adds particular value to the model. The authors convincingly show that the simulated place cells and preplay events share many important features with those recorded in CA1 (though CA3 ones are similar).

      Weaknesses:

      To generate place cell-like activity during a simulated traversal of a linear environment, the authors drive the network with a combination of linearly increasing/decreasing synaptic inputs, mimicking border cell-like inputs. These inputs presumably stem from the entorhinal cortex (though this is not discussed). The authors do not explore how the model would behave when these inputs are replaced by or combined with grid cell inputs which would be more physiologically realistic.

      We chose the linearly varying spatial inputs as the minimal model of providing spatial input to the network so that we could focus on the dynamics of the recurrent connections. We agree our results will be strengthened by testing alternative types of border-like input. We show in Figure 4—figure supplement 4 that our preplay results are robust to several variations in the location-cue inputs. However, given that a sub-goal of our model was to show that place fields could arise in locations at which no neurons receive a peak in external input, whereas combining input from multiple grid cells produces peaked place-field like input, adding grid cell input (and the many other types of potential hippocampal input) is beyond the scope of the paper.

      Even though the authors claim that no spatially-tuned information is needed for the model to generate place cells, there is a small location-cue bias added to the cells, depending on the cluster(s) they belong to. Even though this input is relatively weak, it could potentially be driving the sequential activation of clusters and therefore the preplays and place cells. In that case, the claim for non-spatially tuned inputs seems weak. This detail is hidden in the Methods section and not discussed further. How does the model behave without this added bias input?

      We apologize for a lack of clarity if we have caused confusion about the type of inputs and if we implied an absence of spatially-tuned information in the network. In order for place fields to appear the network must receive spatial information, which we model as linearly-varying cues and illustrate in Figure 1b and describe in the caption (original lines 156-157), Results (original lines 189-190 & 497-499), and Methods (original lines 671-683). Such input is not place-field like, as the small bias to any cell linearly decreases from one boundary of the track or the other.

      The cluster-dependent bias, which is also described in the same lines (Figure 1 caption (original lines 156-157), Results (original lines 189-190 & 497-499), and Methods (original lines 671-683)), only affects the strength of the spatial cues that are present during simulated run periods. Crucially, this cluster-dependent bias is absent during sleep simulations when preplay occurs, which is why preplay can equally correlate with place field sequences in any context.

      We have modified the text (lines 207-210, 218, and 824-827) to clarify these points. We have also added results from a control simulation (Figure 4—figure supplement 4e) showing that preplay is not generated in the absence of the cluster-dependent bias.

      Lines 207-210: “This bias causes cells that share cluster memberships to have more similar place fields during the simulated run period, but, crucially, this bias is not present during sleep simulations so that there is no environment-specific information present when the network generates preplay.”

      Lines 218: “Second, to incorporate cluster-dependent correlations in place fields, a small…”

      Lines 824-827: “The addition of this bias produced correlations in cells’ spatial tunings based on cluster membership, but, importantly, this bias was not present during the sleep simulations, and it did not lead to high correlations of place-field maps between environments (Figure 9b).”

      Unlike excitation, inhibition is modeled in a very uniform way (uniform connection probability with all E cells, no I-I connections, no border-cell inputs). This goes against a long literature on the precise coordination of multiple inhibitory subnetworks, with different interneuron subtypes playing different roles (e.g. output-suppressing perisomatic inhibition vs input-gating dendritic inhibition). Even though no model is meant to capture every detail of a real neuronal circuit, expanding on the role of inhibition in this clustered architecture would greatly strengthen this work.

      This is an interesting future direction, but we see it as outside the scope of our current work. While inhibitory microcircuits are certainly important physiologically, we focus here on a minimal model that produces the desired place cell activity and preplay, as measured in excitatory cells. We have added a brief discussion of this to the manuscript.

      Lines 733-739: “Additionally, the in vivo microcircuitry of CA3 is complex and includes aspects such as nonlinear dendritic computations and a variety of inhibitory cell types (Rebola et al., 2017). This microcircuitry is crucial for explaining certain aspects of hippocampal function, such as ripple and gamma oscillogenesis (Ramirez-Villegas et al., 2017), but here we have focused on a minimal model that is sufficient to produce place cell spiking activity that is consistent with experimentally measured place field and preplay statistics.”

      For the modeling insights to be physiologically plausible, it is important to show that CA3 connectivity (which the model mimics) shares the proposed small-world architecture. The authors discuss the existence of this architecture in various brain regions but not in CA3, which is traditionally thought of and modeled as a random or fully connected recurrent excitatory network. A thorough discussion of CA3 connectivity would strengthen this work.

      We agree this is an important point that is missing, and we have modified lines 114-116 to address the clustered connectivity reported in CA3.

      Lines 114-116: “Such clustering is a common motif across the brain, including the CA3 region of the hippocampus (Guzman et al., 2016) as well as cortex (Song et al., 2005), …”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Based on Figure 3, the place fields are not uniformly distributed in the maze. Meanwhile, based on Figure 1b and Methods, the total input seems to be uniform across the maze. Why does the uniform total external input lead to nonuniform network activities?

      While the total input to the network is constant across the maze, the input to any individual cell can peak only at either end of the track. All excitatory cells receive input from both the left-cue and the right-cue with different input strengths. By chance and due to the cluster-dependent bias some cells will have stronger input from one cue than the other and will therefore be more likely to have a place field toward that side of the track. However, no cell receives a peak of input in the center of the track. We have modified lines 141-143 to clarify this.

      Lines 141-143: “While the total input to the network is constant as a function of position, each cell only receives a peak in its spatially linearly varying feedforward input at one end of the track.”
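
      The point in this response can be illustrated with a minimal sketch. It is not the authors' simulation code: the weight names `w_left` and `w_right`, the bias range, and the centering of the biases (so that the summed input is exactly flat) are all assumptions made for illustration. Each cell's drive is a weighted sum of two linearly varying cues, so it is linear in position and can only be largest at one end of the track, while the total drive across cells stays constant.

```python
import random


def cell_input(w_left, w_right, x):
    """Feedforward drive to one cell at normalized track position x in [0, 1].

    The left cue falls linearly from 1 to 0 and the right cue rises from 0 to 1.
    """
    return w_left * (1.0 - x) + w_right * x


random.seed(0)
n_cells = 50
base = 1.0  # hypothetical mean cue weight

# Hypothetical small per-cell bias toward one cue or the other.
biases = [random.uniform(-0.1, 0.1) for _ in range(n_cells)]
mean_bias = sum(biases) / n_cells
biases = [b - mean_bias for b in biases]  # assumption: center so biases sum to zero
weights = [(base * (1 + b), base * (1 - b)) for b in biases]

positions = [i / 20 for i in range(21)]

# Total drive to the network at each position.
totals = [sum(cell_input(wl, wr, x) for wl, wr in weights) for x in positions]

# Position at which each individual cell's drive is largest.
peak_positions = []
for wl, wr in weights:
    drive = [cell_input(wl, wr, x) for x in positions]
    peak_positions.append(positions[drive.index(max(drive))])
```

      In this sketch every entry of `peak_positions` is an end of the track, so a place field in the middle of the track cannot be attributed to a peak in that cell's feedforward input.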

      (2) I find these sentences confusing: "...we expected that the set of spiking events that significantly decode to linear trajectories in one environment (Figure 4) should decode with a similar fidelity in another environment..." (Lines 513-515) and "As expected... but not with the place fields of trajectories from different environments (Figure 7c)" (Line 517-520). What is the expectation for cross-environment decoding? Should they be similar or different? Also, in Figure 7c, the example is not fully convincing. In the figure caption, it states that decoding is significant in the top row but not in the bottom row, but they look similar across rows.

      Original lines 513-515 refer to the entire set of events, while original lines 517-520 refer to one example event. The sleep events are simulated without any track-specific information present, so the degree to which preplay occurs when decoding based on the place fields of a specific future track should be independent of any particular track when considering the entire set of decoded PBEs, as shown in Figure 9d (original Figure 7). However, because there is strong remapping across tracks (Figure 9b), an individual event that shows a strong decoded trajectory based on the place fields of one track (Figure 9c, top row) should show chance levels of a decoded trajectory when decoded with the place fields of an alternative track (Figure 9c, bottom row).

      We have revised lines 643-650 for clarity, and we have added statistics for the events shown in Figure 9c.

      Lines 644-651: “Since the place field map correlations are high for trajectories on the same track and near zero for trajectories on different tracks, any individual event would be expected to have similar decoded trajectories when decoding based on the place fields from different trajectories in the same environment and dissimilar decoded trajectories when decoding based on place fields from different environments. A given event with a strong decoded trajectory based on the place fields of one environment would then be expected to have a weaker decoded trajectory when decoded with place fields from an alternative environment (Figure 9c).”

      Lines 604-608: “(c) An example event with a statistically significant trajectory when decoded with place fields from Env. 1 left (absolute correlation at the 99th percentile of time-bin shuffles) but not when decoded with place fields of the other trajectories (78th, 45th, and 63rd percentiles, for Env. 1 right, Env. 2 left, and Env. 2 right, respectively).”

      (3) In Methods, the equation at line 610, E in the last term should be E_ext.

      We modeled the feedforward inputs as excitatory connections with the same reversal potential as the recurrent excitatory connections, so  is the proper value.

      (4) Equation line 617 states that conductances follow exponential decay, but the initial conductances of g_I, g_E, and g_SRA are not specified.

      We have added a description of the initial values in lines 760-764.

      Lines 760-764: “Initial feed-forward input conductances were set to values approximating their steady-state values by randomly selecting values from a Gaussian with a mean of   and a standard deviation of . Initial values of the recurrent conductances and the SRA conductance were set to zero.”

      (5) In the parameter table below line 647, W_E-E, W_E-I, and W_I-E are not described in the text.

      We have clarified in lines 757-760 that the step increase in conductance corresponds to these parameter values.

      Lines 757-760: “A step increase in conductance occurs at the time of each spike by an amount corresponding to the connection strength for each synapse ( for E-to-E connections, for E-to-I connections, and  for I-to-E connections), or by  for .”

      (6) On line 660, "...Each environment and the sleep session had unique context cue input weights...". Does that mean that within a sleep session, the network received the same context input? How strongly are the sleep dynamics driven by that context input rather than by intrinsic dynamics? Usually, sleep activity is high dimensional, what would happen if the input during sleep is more stochastic?

      Yes, within a sleep session each network receives a single set of context inputs, which are implemented as independent Poisson spike trains (so being independent, in small time-windows the dimensionality is equal to the number of neurons). The effects of any particular set of sleep context cue inputs should be minor, since the standard deviation of the input weights, , is small. Further, because the preplay analysis is performed across many networks at each parameter point, the observation of preplay is independent of any particular realization of either the recurrent network or the sleep context inputs.

      Further exploring the effects of more biophysically realistic neural dynamics during simulated sleep is an interesting future direction.

      (7) One bracket is missing in the denominator in line 831.

      We have fixed this error.

      Line 1005: “)” -> “()”

      Reviewer #2 (Recommendations For The Authors):

      - I would suggest the authors cite Chenkov et al 2017, PLOS Comp Bio, in which "replay" sequences were produced in clustered networks, and discuss how their work differs.

      We have included a contrast of our model to that of Chenkov et al., 2017 in lines 73-78.

      Lines 73-78: “Related to replay models based on place-field distance-dependent connectivity is the broader class of synfire-chain-like models. In these models, neurons (or clusters of neurons) are connected in a 1-dimensional feed-forward manner (Diesmann et al., 1999; Chenkov et al., 2017). The classic idea of a synfire-chain has been extended to include recurrent connections, such as by Chenkov et al., 2017; however, such models still rely on an underlying 1-dimensional sequence of activity propagation.”

      - Figure legend 2e says "replay", should be "preplay".

      We have fixed this error.

      Line 255: “(e) Example preplay event…”

      - How much does the context cue affect the result? e.g. Is sleep notably different with different sleep context cues?

      As discussed above in our response to Reviewer 1, the context cue weights have a small standard deviation, , which means that differences in the effects of different realizations of the context inputs are small. Different sets of context cues will cause cells to have slightly higher or lower spiking rates during sleep simulations, but because there is no correlation between the sleep context cue and the place field simulations there should be no effect on preplay quality.

      - Figure 4 should include a control with a single cluster.

      We thank the reviewer for this suggestion and have added additional control simulations.

      In our model, the recurrent structure of a network with a single cluster is equivalent to a cluster-less random network. Additionally, any network where cluster participation equals the number of clusters is equivalent to a cluster-less random network, since all neurons belong to all clusters and can therefore potentially connect to any other neuron. Such a condition corresponds to a diagonal boundary where the number of clusters equals the cluster participation, which occurs at higher values of cluster participation than we had shown in our primary parameter grid.

      We now include simulation results that extend to this boundary, corresponding to cluster-less networks (Figure 4—figure supplement 4f). Networks at these parameter points do not show preplay. See our earlier response for the new text associated with Figure 4—figure supplement 4.
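
      The equivalence described above can be checked with a small sketch. It makes simplifying assumptions that are not taken from the manuscript: cluster participation is an integer that is identical for every cell, and two cells are treated as "connectable" whenever they share at least one cluster. The function names are hypothetical.

```python
import itertools
import random


def assign_clusters(n_cells, n_clusters, participation, rng):
    """Give every cell `participation` distinct clusters chosen at random."""
    return [set(rng.sample(range(n_clusters), participation))
            for _ in range(n_cells)]


def fraction_connectable(memberships):
    """Fraction of cell pairs that share at least one cluster."""
    pairs = list(itertools.combinations(range(len(memberships)), 2))
    shared = sum(1 for i, j in pairs if memberships[i] & memberships[j])
    return shared / len(pairs)


rng = random.Random(0)

# Sparse overlap: each cell belongs to 1 of 10 clusters.
clustered = fraction_connectable(assign_clusters(100, 10, 1, rng))

# Boundary case: participation equals the number of clusters.
boundary = fraction_connectable(assign_clusters(100, 10, 10, rng))

# A single cluster is the same situation.
single = fraction_connectable(assign_clusters(100, 1, 1, rng))
```

      At the boundary, every pair of cells shares a cluster, so cluster membership places no constraint on which cells may connect and the recurrent structure reduces to a random network under these assumptions.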

      - The results of Figure 4 are very noisy. I would recommend increasing the sampling, both in terms of the number of population events in each condition and the number of conditions.

      We have run simulations for longer durations (300 seconds) and with more networks (20) to produce more accurate empirical values for the statistics calculated across the parameter grids in Figures 3 and 4. Our additional simulations (Figure 4—figure supplement 4) provide support that the parameter region of preplay significance is reliable.

      Lines 831-833: “For the parameter grids in Figures 3 and 4 we simulated 20 networks with 300 s long sleep sessions in order to get more precise empirical estimates of the simulation statistics.”

      - It's not entirely clear what's different between the analysis described in lines 334-353, and the preplay analysis in Figure 2. In general, the description of this result was difficult to follow, as it included a lot of text that would be better served in the methods.

      In Figure 2 we first introduce the Bayesian decoding method, but it is not until Figure 4 that the shuffle-based significance testing is first introduced. We have simplified the description of the shuffle comparison in lines 371-375 and now refer the reader to the methods for details.

      Lines 371-375: “We find significant preplay in both our reference experimental data set (Shin et al., 2019; Figure 4a, b; see Figure 4—figure supplement 1 for example events) and our model (Figure 4c, d) when analyzed by the same methods as Farooq et al., 2019, wherein the significance of preplay is determined relative to time-bin shuffled events (see Methods). For each detected event we calculated its absolute weighted correlation. We then generated 100 time-bin shuffles of each event, and for each shuffle recalculated the absolute weighted correlation to generate a null distribution of absolute weighted correlations.”
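
      The shuffle comparison quoted above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the posterior is assumed to be a list of time bins, each holding a probability for every position bin; the weighted correlation uses the posterior probabilities as weights; and the function names and the percentile summary are hypothetical.

```python
import math
import random


def weighted_corr(posterior):
    """Posterior-weighted correlation between time bin and position bin.

    posterior[t][x] is the decoded probability of position bin x in time bin t.
    """
    n_t, n_x = len(posterior), len(posterior[0])
    cells = [(t, x, posterior[t][x]) for t in range(n_t) for x in range(n_x)]
    total = sum(p for _, _, p in cells)
    mean_t = sum(p * t for t, _, p in cells) / total
    mean_x = sum(p * x for _, x, p in cells) / total
    cov_tx = sum(p * (t - mean_t) * (x - mean_x) for t, x, p in cells) / total
    var_t = sum(p * (t - mean_t) ** 2 for t, _, p in cells) / total
    var_x = sum(p * (x - mean_x) ** 2 for _, x, p in cells) / total
    return cov_tx / math.sqrt(var_t * var_x)


def shuffle_percentile(posterior, n_shuffles=100, seed=0):
    """Compare an event's |weighted correlation| with time-bin shuffles.

    Returns the observed value and the percentage of shuffles that fall below it.
    """
    rng = random.Random(seed)
    observed = abs(weighted_corr(posterior))
    null = []
    for _ in range(n_shuffles):
        shuffled = list(posterior)  # copy, then permute the order of time bins
        rng.shuffle(shuffled)
        null.append(abs(weighted_corr(shuffled)))
    below = sum(1 for value in null if value < observed)
    return observed, 100.0 * below / n_shuffles


# Toy event: the decoded position advances by one bin per time bin.
example = [[1.0 if x == t else 0.0 for x in range(10)] for t in range(10)]
observed, percentile = shuffle_percentile(example, n_shuffles=100, seed=0)
```

      Shuffling whole time bins preserves the decoded position content of each bin while destroying temporal order, which is why the resulting null distribution isolates the sequential structure of an event.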

      - Many of the figures have low text resolution (e.g. Figure 6).

      We have now fixed this.

      - How does the clustered small world network compare to e.g. a small world ring network as used in Watts and Strogatz 1998?

      As described in our above response to Reviewer 1's fourth point, we have added a supplementary figure (Figure 1—figure supplement 1, with corresponding text) comparing our model with the Watts-Strogatz model.

      Reviewer #3 (Recommendations For The Authors):

      Figure 5 would benefit from a plot of the overlap of activated clusters per event.

      In our cluster activation analysis in Figure 5, we defined a cluster as “active” if at any point in the event its population rate was twice that of any other clusters’. We used this definition—which permits no overlap of activated clusters—rather than a definition based on a z-scoring of the rate, because we determined that preplay required periods of spiking dominated by individual clusters.
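
      The "active cluster" criterion described in this response can be sketched as below. The data layout (`rates[c][t]` as the population rate of cluster c in time bin t), the function name, and the use of "at least twice" rather than "strictly more than twice" are assumptions made for illustration.

```python
def active_clusters(rates, ratio=2.0):
    """Return the clusters that dominate at least one time bin of an event.

    rates[c][t] is the population rate of cluster c in time bin t.
    A cluster counts as active if, in some bin, its rate is at least
    `ratio` times the rate of every other cluster. Requires >= 2 clusters.
    """
    n_clusters, n_bins = len(rates), len(rates[0])
    active = set()
    for t in range(n_bins):
        column = [rates[c][t] for c in range(n_clusters)]
        top = max(range(n_clusters), key=lambda c: column[c])
        runner_up = max(column[c] for c in range(n_clusters) if c != top)
        if column[top] > 0 and column[top] >= ratio * runner_up:
            active.add(top)
    return active


# Toy event with three clusters and three time bins.
toy_rates = [
    [10.0, 2.0, 1.0],  # cluster 0 dominates the first bin
    [3.0, 9.0, 1.0],   # cluster 1 dominates the second bin
    [1.0, 4.0, 1.0],   # cluster 2 never dominates
]
toy_active = active_clusters(toy_rates)
```

      Because only the single highest-rate cluster in a bin can satisfy the criterion, two clusters can never be marked active in the same bin, which matches the statement that this definition permits no overlap of activated clusters.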

      Author response image 3.

      The choice of such a definition is supported by our observation that most spiking activity within an event is dominated by whichever cluster is most active at each point in time. In the left panel of the above figure we show the distribution of the average fraction of spikes within each event that came from the most active cluster at each point in time. The right panel shows the distribution of the average across time within each event of the ratio of the population activity rate of the most active cluster to the second most active cluster. The data for both panels comes from all events at the fiducial parameter set.

      Author response image 4.

      Rather than overlapping at a given moment in time, clusters might have overlap in their probability of being active at some point within an event. We do find that there is a small but significant correlation in cluster co-activation. For each network we calculated the activation correlation across events for each pair of clusters (example network shown in the left panel). We compared the distribution of resulting absolute correlations against the values that result after shuffling the correlations between cluster activations (right panel, all correlations for all networks from the fiducial parameter point).

      Figures 4e/f are referred to as 4c/d in the text (pg 14).

      We have fixed this error.

      Lines 400-412: “4c” -> “4e” and “4d” -> “4f”

    2. eLife assessment

      This study presents an important finding on the spontaneous emergence of structured activity in artificial neural networks endowed with specific connectivity profiles. The evidence supporting the claims of the authors is convincing, providing direct comparison between the properties of the model and neural data although investigating more naturalistic inputs to the network would have strengthened the main claims. The work will be of interest to systems and computational neuroscientists studying the hippocampus and memory processes.

    3. Reviewer #1 (Public review):

      Summary:

      An investigation of the dynamics of a neural network model characterized by sparsely connected clusters of neuronal ensembles. The authors found that such a network could intrinsically generate sequence preplay and place maps, with properties like those observed in the real-world data.

      Strengths:

      Computational model and data analysis supporting the hippocampal network mechanisms underlying sequence preplay of future experiences and place maps. The revised version of the manuscript addressed all my comments and as a result is significantly improved.

      Weaknesses:

      None noted

    4. Reviewer #2 (Public review):

      Summary:

      The authors show that a spiking network model with clustered connectivity produces intrinsic spike sequences when driven with a ramping input, which are recapitulated in the absence of input. This behavior is only seen for some network parameters (neuron cluster participation and number of clusters in the network), which correspond to those that produce a small world network. By changing the strength of ramping input to each network cluster, the network can show different sequences.

      Strengths:

      A strength of the paper is the direct comparison between the properties of the model and neural data.

      Weaknesses:

      My main critique of the paper relates to the form of the input to the network. Specifically, it's unclear how much the results depend on the choice of a one-dimensional environment with ramping input. While this is an elegant idealization that allows the authors to explore the representation and replay properties of their model, it is a strong and highly non-physiological constraint. In order to address this concern, the authors would need to test the spatial tuning of their network in 2-dimensional environments, and with different kinds of input from a population of neurons that have a range of degrees of spatial tuning and physiological plausibility. A method for systematically producing input with varying degrees of spatial tuning in both 1D and 2D environments has been previously used in (Fang et al 2023, eLife, see Figures 4 and 5), which could be readily adapted for the current study; and behaviorally plausible trajectories in 2D can be produced using the RatInABox package (George et al 2022, bioRxiv), which can also generate e.g. grid cell-like activity that could be used as physiologically plausible input to the network.

    5. Reviewer #3 (Public review):

      This work offers a novel perspective on the question of how hippocampal networks can adaptively generate different spatial maps and replays of the corresponding place cells, without any such maps pre-existing in the network architecture or its inputs. And how can these place cells preplay their sequences even before the environment is experienced? Previous models required pre-existing spatial representations to be artificially introduced, limiting their adaptability to new environments. Others depended on synaptic plasticity rules which made remapping slower than what is seen in recordings. In contrast, this modeling study proposes that quickly-adaptive intrinsic spiking sequences (preplays) and spatially tuned spiking (place cells) can be generated in a network through randomly clustered recurrent connectivity. By simulating spatial exploration through border-cell-like synaptic inputs, the model generates place cells for different "environments" without the need to reconfigure its synaptic connectivity or introduce plasticity. By simulating sleep-like random synaptic inputs, the model generates sequential activations of cells, mimicking preplays. These "preplays" require small-world connectivity, so that cell clusters are activated in sequence. Using a set of electrophysiological recordings from CA1, the authors confirm that the modeled place cells and replays share many features with recorded ones.

      Many features of the model are thoroughly examined, and conclusions are overall convincing (within the simple architecture of the model). Even though the modeled connectivity applies more closely to CA3, it remains unclear whether CA3 recapitulates the proposed small world architecture.

      In any case, the proposal that a small-world-structured, clustered network can generate flexible place cells and replays without the need for pre-configured maps is novel and of potential interest to a wide computational and experimental community.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment: I find that the eLife assessment mentions “statistical analyses are yet to be carried out to support statements of statistical significance” while the reviewers mention that the data are compelling and results are technically solid. Besides, all observations in the manuscript are presented with robust and established norms of statistical analysis.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths:

      The use of data from before COVID-19 is both a strength and a weakness. Because COVID had effects on vascular health and had higher death rates for groups with the comorbidities of interest here, it has likely shifted the demographics in ways that would shift the results in unpredictable ways if the analysis were repeated with current data. This can be a strength in providing a reference point for studying those changes as well as allowing researchers to study differences between regions without the complication of different public health responses adding extra variation to the data. On the other hand, it limits the usefulness of the data in research concerned with the current status of the various populations.

      We completely agree with the observation, but were restricted as the purpose was to use the most robust and technically qualified data from GBD. The post COVID19 GBD data has not yet been released, but I am sure the observations made in the study can help in guiding the issues in the post COVID era too, because genetics is not going to change in these population groups.

      However, we did highlight this aspect of COVID19 even in our original version and also in the revised version.

      Reviewer #2 (Public Review):

      Weaknesses:

      The presentation is not focused. It is important to include p-values for all comparisons and focus the presentation on the main effects from the dataset analysis.

      The significant p-values were restricted to public health data only to identify and distinguish differences in incidence, prevalence and mortality and how they differ across world populations. These differences have often been interpreted from socio-economic point of view, while our manuscript presents the reasons for differences for main condition (Stroke) and its comorbid condition among different ethnicities from a genetic perspective. This genetic perspective was further explored to identify unique ethnic specific variants and their patterns of linkage disequilibrium in distinguishing the phenotypic variations. Considering the quantum and diversity of data, both for public health and GWAS data, there can be several directions but for presentation we focused only on the most distinguishing and established phenotypic differences. I am sure this will open up avenues for several future investigations including COVID, as has been highlighted by the reviewers too. All observations were presented with robust and established norms of statistical analysis.


      The following is the authors’ response to the original reviews.

      Thanks for the constructive observations on the strengths and weaknesses of our manuscript. Interestingly, some of the weaknesses mentioned here also turn out to be strengths of the article. For example, COVID-19 has been mentioned by the reviewer as a driver of increased mortality in some comorbid conditions and stroke. Firstly, I must clarify that our data are from the pre-COVID era, and we indeed mention that in the COVID era, COVID-19 might differentially impact the risk of stroke. This differential influence on the comorbidities of stroke is likely to be shaped by the underlying genetics of stroke and its comorbidities.

      I have tried to address the concerns raised by the reviewers, which ideally do not affect the original manuscript. The statistical limitation noted regarding P-values has been clarified here. However, certain minor concerns, such as abbreviations, have been resolved in the revised manuscript. My responses to the weaknesses and the reviewers’ comments are given below.

      Reviewer #1 (Public Review):

      Strengths:

      The data provided here will provide a foundation for a lot of future research into the causes of the observed correlations as well as whether the observed differences in comorbidities across regions have clinically relevant effects on risk management.

      Weaknesses:

      • As with any cross-national analysis of rates, the data is vulnerable to differences in classification and reporting across jurisdictions.

      GBD data is the most robust and most comprehensive data resource which has been used and accepted globally in predicting the health metrics statistics.

      GBD data indeed include normalisations for classification and reporting.

      To the best of our knowledge this is the best available resource to consider all health metrics analysis.

      • Furthermore, given the increased death rate from COVID-19 associated with many of these comorbid conditions and the long-term effects of COVID-19 infection on vascular health, it is expected that many of the correlations observed in this dataset will shift along with the shifting health of the underlying populations.

      I must clarify that we have used data prior to COVID-19.

      But yes, the patterns after COVID-19 will shift due to the impact of COVID. This makes the study even more relevant, as the comorbid conditions of stroke are also risk drivers for COVID-19 and mortality. This shift has been reported by some authors, as covered in the Discussion.

      Therefore, understanding the genetic factors underlying stroke and its comorbid conditions might help in resolving how COVID-19 might differentially impact health outcomes.

      We did highlight this aspect of COVID-19 even in our original version.

      Introduction 1st para:

      “It is the accumulated risk of comorbid conditions that enhances the risk of stroke further. Are these comorbid conditions differentially impacted by socio-economic factors and ethnogeographic factors. This was clearly evident in COVID era, when COVID-19 differentially impacted the risk of stroke, possibly due to its differential influence on the comorbidities of stroke.”

      Discussion 3rd para:

      “Studies reported reduction in life expectancy in 31 of 37 high-income countries, deduced to be due to COVID-19 1. However, it would be unfair to ignore the comorbid conditions which could also be the critical determinants for reduced life expectancy in these countries.”

      Recommendations For The Authors:

      On page 5, the authors make a note about Africa and the Middle East having the highest ASMR for high SBP and comment about the relative populations of these regions. The populations of the regions are irrelevant to the rate.

      Since the study concerns the comorbid factors of stroke and their impact on mortality, the relative burden seems critical. This has been further elaborated here to justify the comment, which indeed is an integral part of the original manuscript.

      Paragraph referred – Results section 2:

      “Ethno-regional differences in mortality and prevalence of stroke and its major comorbid conditions

      We observed interesting patterns of ASMRs of stroke, its subtypes and its major comorbidities across different regions over the years as shown in figure 1a, table 1 and supplementary files S2 & S3. When assessed in terms of ranks, high SBP is the most fatal condition followed by IHD in all regions, except Oceania (OCE) where IHD and high SBP swap ranks. Africa (AFR; 206.2/100000, 95%UI 177.4-234.2) and Middle East (MDE; 198.6/100000, 95%UI 162.8-234.4) have the highest ASMR for high SBP, even though they rank as only the third and sixth most populous continents (fig. S2), respectively.”

      On page 17, the authors are alarmed by a large ratio between prevalence rates and mortality rates for certain conditions. This is confusing since this indicates that these conditions are not as dangerous as the other conditions.

      This has been further elaborated here to justify the comment, which indeed is an integral part of the original manuscript.

      Paragraph referred – Discussion para 1:

      “While the global stroke prevalence is nearly 15 times its mortality rate, prevalence of comorbid conditions such as high SBP, high BMI, CKD, T2D are alarmingly 150- to 500-fold higher than their mortality rates. These comorbid conditions can drastically affect the outcome of stroke.”

      In Figure 4, the colors are not defined.

      In the STRUCTURE plot, colours are assigned per K; they do not directly refer to any population. However, the plot distinguishes the stratification of populations as per K. Ramasamy, R.K., Ramasamy, S., Bindroo, B.B. et al. STRUCTURE PLOT: a program for drawing elegant STRUCTURE bar plots in user friendly interface. SpringerPlus 3, 431 (2014). https://doi.org/10.1186/2193-1801-3-431

      Reviewer #2 (Public Review):

      Strengths:

      The idea is interesting and the data are compelling. The results are technically solid.

      The authors identify specific genetic loci that increase the risk of a stroke and how they differ by region.

      Weaknesses:

      The presentation is not focused. It would be better to include p-values and focus presentation on the main effects of the dataset analysis.

      I presume the comment is made with reference to results with significant p-values.

      P-values are mentioned in the main text when referring to a significant decrease or increase with respect to global rates and time. For example, P-values for comparisons within 2019 are based on regional rates versus the global rates of 2019 (Supplementary tables S2a (mortality) and S3a (prevalence)); P-values for comparisons between years are based on 2019 rates versus 2009 rates (Supplementary tables S2b (mortality) and S3b (prevalence)); and P-values for proportional mortality and proportional prevalence in Supplementary tables S4 and S5 are also based on global rates.

      Recommendations For The Authors:

      It would be better to minimize the use of acronyms. Often one has to go back to decipher what the acronym stands for. It is fine to have acronyms in figure legends, if necessary. However, at least for regions, please do not use acronyms.

      In the revised version we have tried to minimise the acronyms.

      We removed the acronyms for regions and elsewhere wherever possible; however, the disease acronyms have been maintained as per the GBD terms.

      Please focus the presentation on the main results. Currently, the presentation wanders and repeats itself a lot.

      Since the manuscript tries to address the global and regional rates of prevalence and mortality and their relationship to genetic correlates, it is difficult not to repeat some points when stressing the significant observations coming out of different analysis methods. This might result in some repetitiveness, but the intention was to stress the significant observations.

      I would also recommend acknowledging and discussing socioeconomic factors earlier in the manuscript.

      The current mention is in the 3rd paragraph of the Discussion:

      “The changing dynamics of stroke or its comorbid conditions can be attributed to multitude of factors. Often global burden of stroke has been discussed from the point of view of socio-economic parameters. Studies indicate that half of the stroke-related deaths are attributable to poor management of modifiable risk factors 8,9. However, we observe that different socio-economic regions are driven by different risk factors.”

    1. eLife assessment

      This study presents a fundamental finding on how m6A levels are controlled, invoking a consolidated model where degradation of modified RNAs in the cytoplasm plays a primary role in shaping m6A patterns and dynamics, rather than needing active regulation by m6A erasers and other related processes. The evidence is compelling and uses transcriptome-wide data and mechanistic modeling. However, it is possible that m6A erasers will have roles in specific developmental contexts or conditions, so this model may not apply universally.

    2. Reviewer #1 (Public review):

      Summary:

      Here, the authors propose that changes in m6A levels may be predictable via a simple model that is based exclusively on mRNA metabolic events. Under this model, m6A mRNAs are "passive" victims of RNA metabolic events with no "active" regulatory events needed to modulate their levels by m6A writers, readers, or erasers; looking at changes in RNA transcription, RNA export, and RNA degradation dynamics is enough to explain how m6A levels change over time.

      The relevance of this study is extremely high at this stage of the epitranscriptome field. This compelling paper is in line with a growing number of recent studies showing how m6A is a constitutive mark reflecting overall RNA redistribution events. At the same time, it reminds every reader to carefully evaluate changes in m6A levels if observed in their experimental setup. It highlights the importance of performing extensive evaluations of how much RNA metabolic events could explain an observed m6A change.

      Weaknesses:

      It is essential to notice that m6ADyn does not exactly recapitulate the observed m6A changes. First, this can be due to m6ADyn's limitations. The authors do a great job in the Discussion highlighting these limitations. Indeed, they mention how m6ADyn cannot interpret m6A's implications on nuclear degradation or splicing and cannot model more complex scenario predictions (i.e., a scenario in which m6A both impacts export and degradation) or the contribution of single sites within a gene.

      Secondly, since predictions do not exactly recapitulate the observed m6A changes, "active" regulatory events may still play a partial role in regulating m6A changes. The authors themselves highlight situations in which data do not support m6ADyn predictions. Active mechanisms to control m6A degradation levels or mRNA export levels could exist and may still play an essential role.

      (1) "We next sought to assess whether alternative models could readily predict the positive correlation between m6A and nuclear localization and the negative correlations between m6A and mRNA stability. We assessed how nuclear decay might impact these associations by introducing nuclear decay as an additional rate, δ. We found that both associations were robust to this additional rate (Supplementary Figure 2a-c)."

      Based on the data, I would say that model 2 (m6A-dep + nuclear degradation) is better than model 1. The discussion of these findings in the Discussion could help clarify how to interpret this prediction. Is nuclear degradation playing a significant role, more than expected by previous studies?

      (2) The authors classify m6A levels as "low" or "high," and it is unclear how "low" differs from unmethylated mRNAs.

      (3) The authors explore whether m6A changes could be linked with differences in mRNA subcellular localization. They tested this hypothesis by looking at mRNA changes during heat stress, a complex scenario to predict with m6ADyn. According to the collected data, heat shock is not associated with dramatic changes in m6A levels. However, the authors observe a redistribution of m6A mRNAs during the treatment and recovery time, with highly methylated mRNAs being retained in the nucleus, being associated with a shorter half-life, and being transcriptionally induced by HSF1. Based on this observation, the authors use m6ADyn to predict the contribution of RNA export, RNA degradation, and RNA transcription to the observed m6A changes. However:

      (a) Do the authors have a comparison of m6ADyn predictions based on the assumption that RNA export and RNA transcription may change at the same time?

      (b) They arbitrarily set the global reduction of export to 10%, but I'm not sure we can completely rule out whether m6A mRNAs have an export rate during heat shock similar to the non-methylated mRNAs. What happens if the authors simulate that the block in export could be preferential for m6A mRNAs only?

      (c) The dramatic increase in the nuclear:cytoplasmic ratio of mRNA upon heat stress may not reflect the overall m6A mRNA distribution upon heat stress. It would be interesting to repeat the same experiment in METTL3 KO cells. Of note, m6A mRNA granules have been observed within 30 minutes of heat shock. Thus, some m6A mRNAs may still be preferentially enriched in these granules for storage rather than being directly degraded. Overall, it would be interesting to understand the authors' position relative to previous studies of m6A during heat stress.

      (d) Gene Ontology analysis based on the top 1000 PC1 genes shows an enrichment of GOs involved in post-translational protein modification more than GOs involved in cellular response to stress, which is highlighted by the authors and used as justification to study RNA transcriptional events upon heat shock. How do the authors think that GOs involved in post-translational protein modification may contribute to the observed data?

      (e) Additionally, the authors first mention that there is no dramatic change in m6A levels upon heat shock, "subtle quantitative differences were apparent," but then mention a "systematic increase in m6A levels observed in heat stress". It is unclear which systematic increase they are referring to. Are the authors referring to previous studies? The field remains unclear about what exactly happens after heat stress. For instance, in some papers, a preferential increase of 5'UTR m6A has been proposed rather than a systematic and general increase.

    3. Reviewer #2 (Public review):

      Dierks et al. investigate the impact of m6A RNA modifications on the mRNA life cycle, exploring the links between transcription, cytoplasmic RNA degradation, and subcellular RNA localization. Using transcriptome-wide data and mechanistic modelling of RNA metabolism, the authors demonstrate that a simplified model of m6A primarily affecting cytoplasmic RNA stability is sufficient to explain the nuclear-cytoplasmic distribution of methylated RNAs and the dynamic changes in m6A levels upon perturbation. Based on multiple lines of evidence, they propose that passive mechanisms based on the restricted decay of methylated transcripts in the cytoplasm play a primary role in shaping condition-specific m6A patterns and m6A dynamics. The authors support their hypothesis with multiple large-scale datasets and targeted perturbation experiments. Overall, the authors present compelling evidence for their model which has the potential to explain and consolidate previous observations on different m6A functions, including m6A-mediated RNA export.
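      The intuition behind such a passive model can be illustrated with a toy calculation. The sketch below is not the authors' m6ADyn implementation; it is a minimal two-compartment steady-state example in which the rate names (alpha for production, beta for export, gamma for cytoplasmic decay) and all numerical values are assumptions chosen purely for illustration. It assumes methylated and unmethylated transcripts are produced and exported at the same rates and differ only in cytoplasmic decay.

```python
# Toy two-compartment model (hypothetical; NOT the authors' m6ADyn code).
# Assumed rates: alpha = production, beta = nuclear export,
# gamma = cytoplasmic decay. All values below are illustrative only.

def steady_state(alpha, beta, gamma):
    """Return (nuclear, cytoplasmic) abundance at steady state.

    dN/dt = alpha - beta * N      = 0  ->  N = alpha / beta
    dC/dt = beta * N - gamma * C  = 0  ->  C = beta * N / gamma
    """
    nuclear = alpha / beta
    cytoplasmic = beta * nuclear / gamma
    return nuclear, cytoplasmic


def methylated_fraction(methylated, unmethylated):
    """Fraction of transcripts in a compartment that carry m6A."""
    return methylated / (methylated + unmethylated)


# Same production and export for both pools; only the cytoplasmic
# decay rate differs (methylated transcripts decay faster).
n_meth, c_meth = steady_state(alpha=1.0, beta=0.5, gamma=0.4)
n_unmeth, c_unmeth = steady_state(alpha=1.0, beta=0.5, gamma=0.1)

nuclear_fraction = methylated_fraction(n_meth, n_unmeth)
cytoplasmic_fraction = methylated_fraction(c_meth, c_unmeth)

print(f"methylated fraction, nucleus:   {nuclear_fraction:.2f}")
print(f"methylated fraction, cytoplasm: {cytoplasmic_fraction:.2f}")
```

      Under these assumed rates the methylated fraction is higher in the nucleus than in the cytoplasm even though no eraser or active regulation is modelled, which is the qualitative behaviour the reviewers describe.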

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript works with a hypothesis where the overall m6A methylation levels in cells are influenced by mRNA metabolism (sub-cellular localization and decay). The basic assumption is that m6A causes mRNA decay and this happens in the cytoplasm. They go on to experimentally test their model to confirm its predictions. This is confirmed by sub-cellular fractionation experiments which show high m6A levels in the nuclear RNA. Nuclear localized RNAs have higher methylation. Using a heat shock model, they demonstrate that RNAs with increased nuclear localization or transcription, are methylated at higher levels. Their overall argument is that changes in m6A levels are rather determined by passive processes that are influenced by RNA processing/metabolism. However, it should be considered that erasers have their roles under specific environments (early embryos or germline) and are not modelled by the cell culture systems used here.

      Strengths:

      This is a thought-provoking series of experiments that challenge the idea that active mechanisms of recruitment or erasure are major determinants for m6A distribution and levels.

    1. eLife assessment

      The authors made a useful finding that Zizyphi spinosi semen, a traditional Chinese medicine, has demonstrated excellent biological activity and potential therapeutic effects against Alzheimer's disease (AD). The researchers presented the effects, but the research evidence for the mechanism was incomplete. The main claims were only partially supported.

    2. Reviewer #1 (Public review):

      Summary:

      The study shows that Zizyphi spinosi semen (ZSS), particularly its non-extracted simple crush powder, has significant therapeutic effects on neurodegenerative diseases. It removes Aβ, tau, and α-synuclein oligomers, restores synaptophysin levels, enhances BDNF expression and neurogenesis, and improves cognitive and motor functions in mouse AD, FTD, DLB, and PD models. Additionally, ZSS powder reduces DNA oxidation and cellular senescence in normal-aged mice, increases synaptophysin, BDNF, and neurogenesis, and enhances cognition to levels comparable to young mice.

      Weaknesses:

      (1) While the study demonstrates that ZSS has protective effects across a wide range of animal models, including AD, FTD, DLB, PD, and both young and aged mice, it is broad and lacks a detailed investigation into the underlying mechanisms. This is the most significant concern.

      (2) The authors highlight that the non-extracted simple crush powder of ZSS shows more substantial effects than its hot water extract and extraction residue. However, the manuscript provides very limited data comparing the effects of these three extracts.

      (3) The authors have not provided a rationale for the dosing concentrations used, nor have they tested the effects of the treatment in normal mice to verify its impact under physiological conditions.

      (4) Regarding the assessment of cognitive function in mice, the authors only utilized the Morris Water Maze (MWM) test, which includes a five-day spatial learning training phase followed by a probe trial. The authors focused solely on the learning phase. However, it is relevant to note that data from the learning phase primarily reflects the learning ability of the mice, while the probe trial is more indicative of memory. Therefore, it is essential that probe trial data be included for a more comprehensive analysis. A justification should be included to explain why the latency of 1st is about 50s not 60s.

      (5) The BDNF immunohistochemical staining in the manuscript appears to be non-specific.

      (6) The central pathological regions in PD are the substantia nigra and striatum. Please replace the staining results from the cortex and hippocampus with those from these regions in the PD model.

    3. Reviewer #2 (Public review):

      Summary:

      The authors studied the effects of hot water extract, extraction residue, and non-extracted simple crush powder of ZSS in diseased or aged mice. It was found that ZSS played an anti-neurodegenerative role by removing toxic proteins, repairing damaged neurons, and inhibiting cell senescence.

      Strengths:

      The authors studied the effects of ZSS in different transgenic mice and analyzed the different states of ZSS and the effects of different components.

      Weaknesses:

      The authors' study lacked an in-depth exploration of mechanisms, including changes in intracellular signal transduction, drug targets, and drug toxicity detection.

    4. Reviewer #3 (Public review):

      ZSS has been widely used in Traditional Chinese Medicine as a sleep-promoting herb. This study tests the effects of ZSS powder and extracts on AD, PD, and aging, and broad protective effects were revealed in mice.

      However, this work did not include a mechanistic study or target data on ZSS, and PK data were also not included. Studies of mechanisms or targets and a PK study are suggested. A human PK study is preferred over one in mice or rats, e.g. to identify the main active ingredients and their concentrations in plasma, in order to study the pharmacological mechanisms of ZSS.

    1. eLife assessment

      Utilizing transgenic lineage tracing techniques, tissue clearing-based advanced imaging, and three-dimensional reconstruction of slices, the authors comprehensively mapped the distribution atlas of NFATc1+ and PDGFR-α+ cells in dental and periodontal mesenchyme and tracked their in vivo fate trajectories. This important work extends our understanding of NFATc1+ and PDGFR-α+ cells in dental and periodontal mesenchyme homeostasis, and should have an impact on clinical application and investigation. The strength of this work is compelling: the authors employed CRISPR/Cas9-mediated gene editing to generate two dual recombination systems and mapped NFATc1+ and PDGFR-α+ cells residing in dental and periodontal mesenchyme, their capacity for progeny cell generation, and their inclusive, exclusive and hierarchical relations in homeostasis, generating a spatiotemporal atlas of these skeletal stem cell populations.

    2. Reviewer #1 (Public Review):

      In this study, Yang et al. investigated the locations and hierarchies of NFATc1+ and PDGFRα+ cells in dental and periodontal mesenchyme. By combining intersectional and exclusive reporters, they attempted to distinguish among NFATc1+PDGFRα+, NFATc1+PDGFRα-, and NFATc1-PDGFRα+ cells. Using tissue clearing and serial section-based 3D reconstruction, they mapped the distribution atlas of these cell populations. Through DTA-induced ablation of PDGFRα+ cells, they demonstrated the crucial role of PDGFRα+ cells in the formation of the odontoblast cell layer and periodontal components.

      Main issues:

      (1) The authors did not quantify the contribution of PDGFRα+ cells or NFATc1+ cells to dental and periodontal lineages in PDGFRαCreER; Nfatc1DreER;LGRT mice. Zsgreen+ cells represented PDGFRα+ cells and their lineages. Tomato+ cells represented NFATc1+ cells and their lineages. Tomato+Zsgreen+ cells represented NFATc1+PDGFRα+ cells and their lineages. Conducting immunostaining experiments with lineage markers is essential to determine the physiological contributions of these cells to dental and periodontal homeostasis.

      (2) The authors attempted to use PDGFRαCreER; Nfatc1DreER;IR1 mice to illustrate the hierarchies of NFATc1+ and PDGFRα+ cells. According to the principle of the IR1 reporter, it requires sequential induction of PDGFRα-CreER and Nfatc1-DreER to investigate their genetic relationship. Upon induction by tamoxifen, NFATc1+PDGFRα- cells and NFATc1-PDGFRα+ cells were labeled by Tomato and Zsgreen, respectively. However, the reporter expression of NFATc1+PDGFRα+ cells was uncertain, most likely random. Therefore, the hierarchical relationship of NFATc1+ and PDGFRα+ cells cannot be reliably determined from PDGFRαCreER; Nfatc1DreER; IR1 mice.

    3. Reviewer #2 (Public Review):

      Summary:

      Yang et al. present an article investigating the spatiotemporal atlas of NFATc1+ and PDGFR-α+ cells within the dental and periodontal mesenchyme. The study explores their capacity for progeny cell generation and their relationships - both inclusive and hierarchical - under homeostatic conditions. Utilizing the Cre/loxP-Dre/Rox system to construct tool mice, combined with tissue transparency and continuous tissue slicing for 3D reconstruction, the researchers effectively mapped the distribution of NFATc1+ and PDGFR-α+ cells. Additionally, in conjunction with DTA mice, the study provides preliminary validation of the impact of PDGFR-α+ cells on dental pulp and periodontal tissues. Primarily, this study offers an in-situ distribution atlas for NFATc1+ and PDGFR-α+ cells but provides limited information regarding their origin, fate differentiation, and functionality.

      Strengths:

      (1) Tissue transparency techniques and continuous tissue slicing for 3D reconstruction, combined with transgenic mice, provide high-quality images and rich, reliable data.

      (2) The Cre/loxP and Dre/Rox systems used by the researchers are powerful and innovative.

      (3) The IR1 lineage tracing model is significantly important for investigating cellular differentiation pathways.

      (4) This study provides effective spatial distribution information of NFATc1+/PDGFR-α+ cell populations in the dental and periodontal tissues of adult mice.

      Weaknesses:

      (1) In the functional experiment section, the investigation into the role of NFATc1+/PDGFR-α+ cell populations is somewhat lacking.

      (2) The author mentions that 3D reconstruction of consecutive tissue slices can provide more detailed information on cell distribution, so what is the significance of using tissue-clearing techniques in this article?

      (3) After reading the entire article, it remains unclear whether the purpose of the article is to explore the distribution and function of NFATc1+/PDGFR-α+ cells in teeth and periodontal tissues, or to compare tissue-clearing techniques with 3D reconstruction of continuous histological slices using NFATc1+/PDGFR-α+ cells.

      (4) The researchers did not provide a clear definition of the cell types of NFATc1+/PDGFR-α+ cells in teeth and periodontal tissues.

      (5) In studies related to long bones, the author defines the NFATc1+/PDGFR-α+ cell population as SSCs, which as a stem cell group should play an important role in tooth development or injury repair. However, the distribution patterns and functions of the NFATc1+/PDGFR-α+ cell population in these two conditions have not been discussed in this study.

    4. Reviewer #3 (Public Review):

      Summary:

      This groundbreaking study applied the most advanced transgenic lineage tracing and imaging techniques to decipher dental/periodontal mesenchyme cells. In this study, the authors utilized CRISPR/Cas9-mediated transgenic lineage tracing techniques to concurrently demonstrate the inclusive, exclusive, and hierarchical distributions of NFATc1+ and PDGFR-α+ cells and their lineage commitment in dental and periodontal mesenchyme.

      Strengths:

      In combination with tissue clearing-based advanced imaging and three-dimensional reconstruction of slices, the distribution and hierarchical relationship of NFATc1+ and PDGFR-α+ cells and their progeny cells emerged plainly, which undoubtedly broadens our understanding of their in vivo fate trajectories in craniomaxillofacial tissue. Also, the experimental design is comprehensive and well-executed, and the results are convincing and compelling.

      Weaknesses:

      Minor modifications could be made to the paper, including more details on the advantages of the methodology used by the authors in this study, compared to other studies.

    1. eLife assessment

      In this fundamental study, the authors describe a new data processing pipeline that can be used to discover causal interactions from time-lapse imaging data. The utility of this pipeline was convincingly illustrated using tumor-on-chip ecosystem data. The newly developed pipeline could be used to better understand cell-cell interactions and could also be applied to perform temporal causal discovery in other areas of science, meaning this work could potentially have a wide range of applications.

    2. Reviewer #1 (Public review):

      Summary:

      This paper presents a data processing pipeline to discover causal interactions from time-lapse imaging data, and convincingly illustrates it on a challenging application for the analysis of tumor-on-chip ecosystem data.

      The core of the discovery module is the authors' original tMIIC method, which is shown in the supplementary material to compare favourably with two state-of-the-art methods on synthetic temporal data from a 15-node network.

      Strengths:

      This paper tackles the problem of learning causal interactions from temporal data, which is an open problem in the presence of latent variables.

      The core of the method tMIIC of the authors is nicely presented in connection to Granger-Schreiber causality and to the novel graphical conditions used to infer latent variables and based on a theorem about transfer entropy.

      tMIIC compares favourably to PC and PCMCI+ methods using different kernels on synthetic datasets generated from a network of 15 nodes.

      The paper includes a full application to tumor-on-chip cellular ecosystem data, including cancer cells, immune cells, cancer-associated fibroblasts, endothelial cells and anti-cancer drugs, with convincing inference results with respect to both known and novel effects between those components and their contact.

      The code and dataset are available online for the reproducibility of the results.

      Weaknesses:

      The references to "state-of-the-art methods" concerning the inference of causal networks should be more precise by giving citations in the main text, and better discussed in general terms, both in the first section and in the section of presentation of CausalXtract. It is only in the legend of the figures of the supplementary material that we get information.

      Of course, a comparison on the authors' own synthetic datasets can always be criticized, but this is rather due to the absence of a common benchmark, and I would recommend that the authors explicitly propose their datasets as a benchmark to the community.

    3. Reviewer #2 (Public review):

      Summary:

      The authors propose a methodology to perform causal (temporal) discovery. The approach appears to be robust and is tested in two different scenarios: one related to live-cell imaging data, and another using synthetic (mathematically defined) time series data. They compare the performance of their method against other well-known methods using metrics such as F-score, precision and recall.
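      For readers less familiar with these metrics, the following minimal sketch shows how precision, recall and F-score are commonly computed when an inferred set of directed edges is compared with a known ground-truth network. The function name, the edge representation and the example graphs are hypothetical and are not taken from the manuscript; the authors' evaluation may differ, for example in how edge orientation or time lags are scored.

```python
# Illustrative only: scoring an inferred directed graph against a known one.
# The edge sets below are made up and do not come from the manuscript.

def edge_scores(true_edges, predicted_edges):
    """Return (precision, recall, f_score) for two collections of edges.

    An edge is a (source, target) tuple; a predicted edge counts as a
    true positive only if the same directed edge is in the ground truth.
    """
    true_set, pred_set = set(true_edges), set(predicted_edges)
    true_positives = len(true_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_score


truth = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")}
inferred = {("A", "B"), ("B", "C"), ("D", "C"), ("B", "D")}

precision, recall, f_score = edge_scores(truth, inferred)
print(precision, recall, f_score)
```

      In this toy case the reversed edge ("D", "C") is counted as both a false positive and a missed true edge, which is one of several possible conventions for scoring orientation errors.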

      Strengths:

      The method shows good performance and robustness, the text is clear and concise, and the authors provide the code for review.

      Weaknesses:

      One concern could be the applicability of the method in other areas, such as climate and economics. For those areas, public data are available, and it might be interesting to test how the method performs with this kind of data.

    1. Reviewer #1 (Public Review):

      Summary:

      Both flies and mammals have D1-like and D2-like dopamine receptors, yet the role of D2-like receptors in Drosophila learning and memory remains underexplored. The paper by Qi et al. investigates the role of the D2-like dopamine receptor D2R in single pairs of dopaminergic neurons (DANs) during single-odor aversive learning in the Drosophila larva. First, they use confocal imaging to screen driver strains with expression in only single pairs of dopaminergic neurons. Next, they use thermogenetic manipulations of one pair of DANs (DAN-c1) to implicate DAN-c1 activity during larval aversive learning. They then use confocal imaging to demonstrate expression of D2R in the DANs and mushroom body of the larval brain. Finally, they show that optogenetic activation during training phenocopies D2R knockdown in these neurons: aversive learning is impaired when DAN-c1 is targeted, while appetitive and aversive learning are impaired when the mushroom body is manipulated. Qi et al. thus propose a model in which D2R limits excessive dopamine release to facilitate successful olfactory learning.

      Strengths:

      The paper reproduces prior findings by Qi and Lee (2014), which demonstrated that D2R knockdown in DL1 DANs or the mushroom body impairs aversive olfactory learning in Drosophila larvae. The authors extended this previous work by screening 57 GAL4 drivers to identify tools that drive expression in individual DANs and used one of the tools, the R76F02-AD; R55C10-DBD driver, to manipulate DAN-c1 neurons with greater specificity. They also show that GFP-tagged D2R is expressed in most DANs and the mushroom body. Although the authors only train larvae with a single odor, they demonstrate that driving D2R knockdown in DAN-c1 neurons impairs aversive learning, as do other loss-of-function manipulations of DAN-c1 neurons.

      Weaknesses:

      The authors claim to have identified drivers that label single DANs in Figure 1, but their confocal images in Figure S1 suggest that many of those drivers label additional neurons in the larval brain. It is also not clear why only some of the 57 drivers are displayed in Figure S1.

      Critically, R76F02-AD; R55C10-DBD labels more than one neuron per hemisphere in Figure S1c, and the authors cite Xie et al. (2018) to note that this driver labels two DANs in adult brains. Therefore, the authors cannot argue that the experiments throughout their paper using this driver exclusively target DAN-c1.

      Missing from the screen of 57 drivers is the driver MB320C, which typically labels only PPL1-γ1pedc in the adult and should label DAN-c1 in the larva. If MB320C labels DAN-c1 exclusively in the larva, then the authors should repeat their key experiments with MB320C to provide more evidence for DAN-c1 involvement specifically.

      The authors claim that the SS02160 driver used by Eschbach et al. (2020) labels other neurons in addition to DAN-c1. Could the authors use confocal imaging to show how many other neurons SS02160 labels? Given that both Eschbach et al. and Weber et al. (2023) found no evidence that DAN-c1 plays a role in larval aversive learning, it would be informative to see how SS02160 expression compares with the driver the authors use to label DAN-c1.

      The claim that DAN-c1 is both necessary and sufficient in larval aversive learning should be reworded. Such a claim would logically exclude any other neuron or even the training stimuli from being involved in aversive learning (see Yoshihara and Yoshihara (2018) for a detailed discussion of the logic), which is presumably not what the authors intended because they describe the possible roles of other DANs during aversive learning in the discussion.

      Moreover, if DAN-c1 artificial activation conveyed an aversive teaching signal irrespective of the gustatory stimulus, then it should not impair aversive learning after quinine training (Figure 2k). While the authors interpret Figure 2k (and Figure 5) to indicate that artificial activation causes excessive DAN-c1 dopamine release, an alternative explanation is that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine.

      The authors should not necessarily expect that D2R enhancer driver strains would reflect D2R endogenous expression, since it is known that TH-GAL4 does not label p(PAM) dopaminergic neurons. Their observations of GFP-tagged D2R expression could be strengthened with an anti-D2R antibody such as that used by Lam et al. (1999) or Love et al. (2023).

      Finally, the authors could consider the possibility that other DANs may also mediate aversive learning via D2R. Knockdown of D2R in DAN-g1 appears to cause a defect in aversive quinine learning compared with its genetic control (Figure S4e). It is unclear why the same genetic control has unexpectedly poor aversive quinine learning after training with propionic acid (Figure S5a). The authors could comment on why RNAi knockdown of D2R in DAN-g1 does not similarly impair aversive quinine learning (Figure S5b).

    1. eLife assessment

      This valuable study characterizes the variability in spacing and direction of entorhinal grid cells and shows how this variability can be used to disambiguate locations within an environment. These claims are supported by solid evidence, yet some aspects of the methodology should be clarified. This study will be of interest to neuroscientists working on spatial navigation and, more generally, on neural coding.

    2. Reviewer #1 (Public review):

      Summary:

      The present paper by Redman et al. investigated the variability of grid cell properties in the MEC by analyzing publicly available large-scale neural recording data. Although previous studies have proposed that grid spacing and orientation are homogeneous within the same grid module, the authors found a small but robust variability in grid spacing and orientation across grid cells in the same module. The authors also showed, through model simulations, that such variability is useful for decoding spatial position.

      Strengths:

      The results of this study provide novel and intriguing insights into how grid cells compose the cognitive map in the axis of the entorhinal cortex and hippocampus. This study analyzes large data sets in an appropriate manner and the results are solid.

      Weaknesses:

      A weakness of this paper is that the scope of the study may be somewhat narrow, as this study focused only on the variability of spacing and orientation across grid cells. I would suggest some additional analysis or discussion that might increase the value of the paper.

      (1) Is the variability in grid spacing and orientation that the authors found intrinsically organized or is it shaped by experience? Previous research has shown that grid representations can be modified through experience (e.g., Boccara et al., Science 2019). To understand the dynamics of the network, it would be important to investigate whether robust variability exists from the beginning of the task period (recording period) or whether variability emerges in an experience-dependent manner within a session.

      (2) It is important to consider the optimal size of the variability. The larger the variability, the better it is for decoding. On the other hand, as the authors state in the Discussion, variability is assumed not to exist in the continuous attractor model. Although the authors note that this study does not address how such variability fits with attractor theory, it would be better if they provided more detailed ideas and suggestions about the direction future work could take to clarify the optimal size of variability.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents an interesting and useful analysis of grid cell heterogeneity, showing that the experimentally observed heterogeneity of spacing and orientation within a grid cell module can allow more accurate decoding of location from a single module.

      Strengths:

      I found the statistical analysis of the grid cell variability to be very systematic and convincing. I also found the evidence for enhanced decoding of location based on between-cell variability within a module to be convincing and important, supporting their conclusions.

      Weaknesses:

      (1) Even though theoreticians might have gotten the mistaken impression that grid cells are highly regular, this might be due to an overemphasis on regularity in a subset of papers. Most experimentalists working with grid cells know that many if not most grid cells show high variability of firing fields within a single neuron, though this analysis focuses on variability between neurons. In response to this comment, the authors should tone down and modify their statements about the current assumptions of the field (and if possible provide a short supplemental section with direct quotes from various papers that have made these assumptions).

      (2) The authors state that "no characterization of the degree and robustness of variability in grid properties within individual modules has been performed." It is always dangerous to speak in absolute terms about what has been done in scientific studies. It is true that few studies have had the number of grid cells necessary to make comparisons within and between modules, but many studies have clearly shown the distribution of spacing in neuronal data (e.g. Hafting et al., 2005; Barry et al., 2007; Stensola et al., 2012; Hardcastle et al., 2015) so the variability has been visible in the data presentations. Also, most researchers in the field are well aware that highly consistent grid cells are much rarer than messy grid cells that have unevenly spaced firing fields. This doesn't hurt the importance of the paper, but they need to tone down their statements about the lack of previous awareness of variability (specific locations are noted in the specific comments).

      (3) The methods section needs to have a separate subheading entitled "How grid cells were assigned to modules" that clearly describes how the grid cells were assigned to a module (i.e., was this done by Gardner et al., or as part of this paper's post-processing?).

    4. Reviewer #3 (Public review):

      Summary:

      Redman and colleagues analyze grid cell data obtained from public databases. They show that there is significant variability in spacing and orientation within a module. They show that the difference in spacing and orientation for a pair of cells is larger than the one obtained for two independent maps of the same cell. They speculate that this variability could be useful to disambiguate the rat position if only information from a single module is used by a decoder.

      Strengths:

      The strengths of this work lie in its conciseness, clarity, and the potential significance of its findings for the grid cell community, which has largely overlooked this issue for the past two decades. Their hypothesis is well stated and the analyses are solid.

      Weaknesses:

      On the side of weaknesses, we identified two aspects of concern. First, alternative explanations for the main result exist that should be explored and ruled out. Second, the authors' speculation about the benefits of variability in angle and spacing for spatial coding is not particularly convincing, although this issue does not diminish the importance or impact of the results.

      Major comments:

      (1) One possible explanation of the dispersion in lambda (not in theta) could be variability in the typical width of the field. For a fixed spacing, wider fields might push the six fields around the center of the autocorrelogram toward the outside, depending on the details of how exactly the position of these fields is calculated. We recommend that the authors show that lambda does not correlate with field width, or at least that the variability explained by field width is smaller than the overall lambda variability.

      (2) An alternative explanation could be related to what happens at the borders. The authors tackle this issue in Figure S2 but introduce a different way of measuring lambda based on three fields, which in our view is not optimal. We recommend showing that the dispersions in lambda and theta remain invariant as one removes the border-most part of the maps but estimating lambda through the autocorrelogram of the remaining part of the map. Of course, there is a limit to how much can be removed before measures of lambda and theta become very noisy.

      (3) A third possibility is slightly more tricky. Some works (for example Kropff et al., 2015) have shown that fields anticipate the rat position, so every time the rat traverses them they appear slightly displaced opposite to the direction of movement. The amount of displacement depends on the velocity. Maps that we construct out of a whole session should be deformed in a perfectly symmetric way if rats traverse fields in all directions and speeds. However, if the cell is conjunctive, we would expect a deformation mainly along the cell's preferred head direction. Since conjunctive cells have all possible preferred directions, and many grid cells are not conjunctive at all, this phenomenon could create variability in theta and lambda that is not a legitimate one but rather associated with the way we pool data to construct maps. To rule out this possibility, we recommend the authors study the variability in theta and lambda of conjunctive vs non-conjunctive grid cells. If the authors suspect that this phenomenon could explain part of their results, they should also take into account the findings of Gerlei and colleagues (2020) from the Nolan lab, which add complexity to this issue.

      (4) The results in Figure 6 are correct, but we are not convinced by the argument. The fact that grid cells fire in the same way in different parts of the environment and in different environments is what gives them their appeal as a platform for path integration since displacement can be calculated independently of the location of the animal. Losing this universal platform is, in our view, too much of a price to pay when the only gain is the possibility of decoding position from a single module (or non-adjacent modules) which, as the authors discuss, is probably never the case. Besides, similar disambiguation of positions within the environment would come for free by adding to the decoding algorithm spatial cells (non-hexagonal but spatially stable), which are ubiquitous across the entorhinal cortex. Thus, it seems to us that - at least along this line of argumentation - with variability the network is losing a lot but not gaining much.

      (5) In Figure 4 one axis has markedly lower variability. Is this always the same axis? Can the authors comment more on this finding?

      (6) The paper would gain in depth if maps coming out of different computational models could be analyzed in the same way.

      (7) Similarly, it would be very interesting to expand the study with some other data to understand if between-cell delta_theta and delta_lambda are invariant across environments. In a related matter, is there a correlation between delta_theta (delta_lambda) for the first vs for the second half of the session? We expect there should be a significant correlation; it would be nice to show it.

    5. Author response:

      We thank the reviewers for their time and thoughtful comments. We are encouraged that all reviewers found our work novel and clear. We will submit a full revision to address all the points the reviewers made. Below, we briefly highlight a few clarifications and planned analyses to address major concerns; all other concerns raised by the reviewers will also be addressed in the revision.

      Reviewers #1 and #3 asked whether the variability in grid properties emerged with experience/time in the environment. We agree that this is an interesting question, and we will re-analyze the data to explore whether between-cell variability increases with time within a session. However, we note that since the rats were already familiarized with the environment for 10-20 sessions prior to the recordings, there may be limited additional changes in between-cell variability between recording sessions. In one case, two sessions from the same rat were recorded on consecutive days (R11/R12 and R21/R22) - these sessions did not show any difference in variability.

      Reviewer #2 noted that the variability in grid properties is known to experimentalists. We will tone down our discussion on the current assumptions in the field to accurately reflect this awareness in the community. However, we would like to emphasize that the lack of work carefully examining the robustness of this variability has prevented a firm understanding of whether this is an inherent property of grid cells or due to noise. The impact of this can be seen in theoretical neuroscience work where a considerable number of articles (including recent publications) start with the assumption that all grid cells within a module have identical properties, with the exception of phase shift and noise. In addition, since grid cells are assumed to be identical in the computational neuroscience community, there has been little work on quantifying how much variability a given model produces. This makes it challenging to understand how consistent different models are with our observations. We believe that making these limitations of previous work clear is important to properly conveying our work’s contribution. 

      Reviewer #3 asked whether the variability in grid properties could be driven by cells that were conjunctively tuned with head direction. We agree that this is an interesting hypothesis and will explore this by performing new analysis. We note that, as reported by Gardner et al. (2022), only 19 of the 168 cells in recording session R12 are conjunctive. Even if these cells are included in the same proportion as pure grid cells by our inclusion criteria (which appears unlikely, given that conjunctive cells may be less reliable across splits of the data), then approximately 9 out of the 82 cells we analyzed would be conjunctive. Therefore, we expect it to be unlikely that they are the main source of the variability we find. However, we will test this in our revised manuscript.
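
      The estimate of roughly 9 conjunctive cells can be reproduced with a short calculation. This is only a sketch of the proportional reasoning described above: it assumes, as the response does for the sake of argument, that conjunctive cells pass the inclusion criteria at the same rate as pure grid cells. The variable names are illustrative.

```python
# Counts quoted in the response (session R12, Gardner et al., 2022).
conjunctive_in_session = 19
cells_in_session = 168
cells_analyzed = 82

# Fraction of conjunctive cells in the full session.
conjunctive_fraction = conjunctive_in_session / cells_in_session

# Expected number among the analyzed cells under equal inclusion rates.
expected_conjunctive = conjunctive_fraction * cells_analyzed

print(f"fraction conjunctive: {conjunctive_fraction:.3f}")
print(f"expected among analyzed cells: {expected_conjunctive:.2f}")
```

      Under this assumption the expected count is a little over 9 of the 82 analyzed cells, or about 11%, consistent with the figure quoted above.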

      Reviewer #3 asked whether the “price” paid in having grid property variability was too high for the modest gain in ability to encode local space. We agree that losing the continuous attractor network (CAN) structure, and the ability to path integrate, would be a very large loss. However, we do not believe that the variability we observe necessarily destroys either CAN or path integration. We argue this for two reasons. First, the data we analyzed [from Gardner et al. (2022)] is exactly the data set that was found to have toroidal topology and therefore viewed to be in line with a major prediction of CANs. Thus, the amount of variability in grid properties does not rule out the underlying presence of a continuous attractor. Second, path integration may still be possible with grid cells that have variable properties. To illustrate this, and to address another comment from Reviewer #3, we have begun to analyze the distribution of grid properties in a recurrent neural network (RNN) model trained to perform path integration (Sorscher et al., 2019). This RNN model, in addition to others (Banino et al., 2018; Cueva and Wei, 2018), has been found to develop grid cells and there is evidence that it develops CANs as the underlying circuit mechanism (Sorscher et al., 2023). We find that the grid cells that emerge from this model exhibit variability in their grid spacings and orientations. This illustrates that path integration (the very task the RNN was trained to perform) is possible using grid cells with variable properties.

    1. eLife assessment

      In this useful study, the authors show that N-acetylation of synuclein increases clustering of synaptic vesicles in vitro and that this effect is mediated by enhanced interaction with lysophosphatidylcholine. While the evidence for enhanced clustering is largely solid, the biological significance remains unclear.

    2. Reviewer #1 (Public review):

      ⍺-synuclein (syn) is a critical protein involved in many aspects of human health and disease. Previous studies have demonstrated that post-translational modifications (PTMs) play an important role in regulating the structural dynamics of syn. However, how post-translational modifications regulate syn function remains unclear. In this manuscript, Wang et al. reported an exciting discovery that N-acetylation of syn enhances the clustering of synaptic vesicles (SVs) through its interaction with lysophosphatidylcholine (LPC). Using an array of biochemical reconstitution, single vesicle imaging, and structural approaches, the authors uncovered that N-acetylation caused distinct oligomerization of syn in the presence of LPC, which is directly related to the level of SV clustering. This work provides novel insights into the regulation of synaptic transmission by syn and might also shed light on new ways to control neurological disorders caused by syn mutations.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors provide evidence that posttranslational modification of synuclein by N-acetylation increases clustering of synaptic vesicles in vitro. When using liposomes the authors found that while clustering is enhanced by the presence of either lysophosphatidylcholine (LPC) or phosphatidylcholine in the membrane, N-acetylation enhanced clustering only in the presence of LPC. Enhancement of binding was also observed when LPC micelles were used, which was corroborated by increased intra/intermolecular cross-linking of N-acetylated synuclein in the presence of LPC.

      Strengths:

      It has been known for many years that synuclein binds to synaptic vesicles, but the physiological role of this interaction is still debated. The strength of this manuscript is clearly in the structural characterization of the interaction of synuclein and lipids (involving NMR-spectroscopy) showing that the N-terminal 100 residues of synuclein are involved in LPC-interaction, and the demonstration that N-acetylation enhances the interaction between synuclein and LPC.

      Weaknesses:

      Lysophosphatides form detergent-like micelles that destabilize membranes, with their steady-state concentrations in native membranes generally being a lot lower than in the experiments reported here. Since no difference in binding between the N-acetylated and unmodified forms was observed when the acidic phospholipid phosphatidylserine was included, it remains unclear to what extent binding to LPC is physiologically relevant, particularly in the light of recent reports from other laboratories showing that synuclein may interact with liquid-liquid phases of synapsin I, or associate with the unfolded regions of VAMP, both of which were reported to cause vesicle clustering.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      ⍺-synuclein (syn) is a critical protein involved in many aspects of human health and disease. Previous studies have demonstrated that post-translational modifications (PTMs) play an important role in regulating the structural dynamics of syn. However, how post-translational modifications regulate syn function remains unclear. In this manuscript, Wang et al. reported an exciting discovery that N-acetylation of syn enhances the clustering of synaptic vesicles (SVs) through its interaction with lysophosphatidylcholine (LPC). Using an array of biochemical reconstitution, single vesicle imaging, and structural approaches, the authors uncovered that N-acetylation caused distinct oligomerization of syn in the presence of LPC, which is directly related to the level of SV clustering. This work provides novel insights into the regulation of synaptic transmission by syn and might also shed light on new ways to control neurological disorders caused by syn mutations.

      We thank the reviewer for appreciating the importance of our work and his/her positive comments.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors employed DLS to quantify the percentage of SV clustering in Fig. 1c and d. As DLS usually measures particle size distribution, I am not sure how the data was plotted in Fig. 1c and d. It would be great to show a representative raw dataset here.

      We thank the reviewer for the comment. To address this, we have provided four representative DLS datasets of different α-Syn variants mediating SV clustering (Author response image 1). In addition to reporting the particle size distribution based on light scattering intensity, DLS can convert the intensity into a particle size distribution based on particle number counts. In our analysis, particle diameters around 50 nm are considered to represent single SV species, whereas diameters larger than 120 nm indicate SV clusters. Specifically, as shown in Author response image 1, adding Ac-α-syn to a homogeneous SV sample altered the distribution from one single SV particle species (Author response image 1d) to three distinct species (Author response image 1a); this resulted in 68.5% of the particles being single SVs and 31.5% being SV clusters.

      Author response image 1.

      Representative raw dataset of α-Syn-mediated synaptic vesicle (SV) clustering monitored by dynamic light scattering (DLS). The grey-colored rows represent small particles (< 5 nm) that contributed zero to the particle number count.
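
      The classification described in the response above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `cluster_fraction`, the treatment of every retained particle at or below 120 nm as a single SV, and the example bins are assumptions, and the numbers are made up rather than taken from Author response image 1.

```python
SMALL_PARTICLE_NM = 5.0       # particles below this size are ignored (assumed cutoff)
CLUSTER_THRESHOLD_NM = 120.0  # diameters above this are counted as SV clusters


def cluster_fraction(bins):
    """Fraction of counted particles classified as SV clusters.

    `bins` is a list of (diameter_nm, number_count) pairs from a
    number-weighted DLS size distribution.
    """
    kept = [(d, n) for d, n in bins if d >= SMALL_PARTICLE_NM]
    total = sum(n for _, n in kept)
    if total == 0:
        return 0.0
    clustered = sum(n for d, n in kept if d > CLUSTER_THRESHOLD_NM)
    return clustered / total


# Hypothetical number-weighted distribution: (diameter in nm, count).
example_bins = [(1.2, 0.0), (48.0, 60.0), (55.0, 15.0), (150.0, 20.0), (400.0, 5.0)]

fraction = cluster_fraction(example_bins)
print(f"SV clusters: {fraction:.1%}, single SVs: {1 - fraction:.1%}")
```

      With these made-up bins, a quarter of the counted particles fall above the 120 nm threshold; the response reports the corresponding split for the real Ac-α-syn sample as 31.5% clusters and 68.5% single SVs.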

      (2) Syn-lipid interactions are known to be altered by mutations involved in neurodegenerative diseases. I am wondering how those mutations will affect SV clustering mediated by the interaction of LPC with N-acetylated syn.

      We thank the reviewer for the insightful comment. Our data indicate that N-acetylation enhances the binding of the N-terminal region of α-syn to LPC, thereby facilitating SV clustering. This enhancement benefits from the fact that N-acetylation effectively neutralizes the positive charge of α-syn’s N-terminal region, promoting its insertion into LPC-rich membranes through hydrophobic interactions. Therefore, we envision that any mutation that weakens membrane binding capability of the N-terminal unmodified α-Syn may decrease SV clustering mediated by the interaction between the Ac-α-syn and LPC.

      In separate work (doi: 10.1093/nsr/nwae182, Fig. S8), we compared the binding affinity of LPC with wild-type N-terminal unmodified α-syn and six Parkinson’s disease (PD) familial mutants (A30P, E46K, H50Q, G51D, A53E, and A53T). Among these, only the A30P mutation showed a significant decrease in binding with LPC. Furthermore, using the same single vesicle assay setup, in another paper (doi: 10.1073/pnas.2310174120, Fig. 4C), we demonstrated that the A30P-mutated α-Syn lost its ability to facilitate SV clustering. Therefore, among the six PD mutations, the A30P mutation may significantly impact the SV clustering mediated by the Ac-α-syn-LPC interaction.

      (3) The crosslinking data in Fig. 4 was obtained using LPC or PS liposomes. I am wondering if these results truly mimic physiological conditions. Could the authors use SVs for these experiments?

      We thank the reviewer for the suggestion. To elucidate the mechanistic differences between N-terminal unmodified α-syn and N-acetylated α-syn, we utilized pure LPC and PS liposomes for clarity. Using natural-source SVs, which contain many synaptic proteins, could complicate or obscure the interaction patterns of Ac-α-syn due to potential crosstalk with other SV proteins. Additionally, the complex lipid environment of SV membranes would not help us decipher the specific molecular mechanism by which Ac-α-Syn facilitates SV clustering through LPC.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors provide evidence that posttranslational modification of synuclein by N-acetylation increases clustering of synaptic vesicles in vitro. When using liposomes the authors found that while clustering is enhanced by the presence of either lysophosphatidylcholine (LPC) or phosphatidylcholine in the membrane, N-acetylation enhanced clustering only in the presence of LPC. Enhancement of binding was also observed when LPC micelles were used, which was corroborated by increased intra/intermolecular cross-linking of N-acetylated synuclein in the presence of LPC.

      Strengths:

      It has been known for many years that synuclein binds to synaptic vesicles, but the physiological role of this interaction is still debated. The strength of this manuscript is clearly in the structural characterization of the interaction of synuclein and lipids (involving NMR-spectroscopy) showing that the N-terminal 100 residues of synuclein are involved in LPC-interaction, and the demonstration that N-acetylation enhances the interaction between synuclein and LPC.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      Lysophosphatides form detergent-like micelles that destabilize membranes, with their steady-state concentrations in native membranes being low, questioning the significance of the findings. Oddly, no difference in binding between the N-acetylated and unmodified form was observed when the acidic phospholipid phosphatidylserine was included. It remains unclear to which extent binding to LPC is physiologically relevant, particularly in the light of recent reports from other laboratories showing that synuclein may interact with liquid-liquid phases of synapsin I that were reported to cause vesicle clustering.

      We appreciate the reviewer’s insightful comments. Indeed, in another paper (doi: 10.1093/nsr/nwae182), employing a conventional α-Syn pull-down assay and an LC-MS lipidomics method, we found that α-Syn has a preference for binding to lysophospholipids across in vivo and in vitro systems. Additionally, by comparing the lipid compositions of mouse brains, SVs and SV lipid-raft membranes, we found LPC levels to be twice as high in SVs compared to brain homogenates, and twice as high in lipid-raft membranes compared to non-lipid-raft membranes. Altogether, these findings emphasize the physiological relevance of understanding the mechanism by which Ac-α-syn mediates SV clustering through LPC.

      Liquid-liquid phase separation has been implicated in the assembly and maintenance of SV clusters, and we believe that the SV cluster liquid phase is interconnected by highly abundant proteins with multivalent low-affinity interactions. Besides the previously discovered protein-protein interactions between α-Syn and synapsin (doi: 10.1016/j.jmb.2021.166961) or VAMP2 (doi: 10.1038/s41556-024-01456-1) that contribute to SV condensates, protein-lipid interactions between α-Syn and acidic phospholipids or LPC may also play a role. Furthermore, post-translational modifications, such as N-acetylation of α-Syn, may also contribute to SV condensates.

      Reviewer #2 (Recommendations For The Authors):

      In Fig. 2, the authors indicate that for the binding assay both vesicle populations, the immobilized "acceptor" and the superfused "donor" population were labeled with different fluorescent dyes whereas in the text it is stated that the immobilized acceptor liposomes were unlabeled. Please clarify. Moreover, a control is missing showing that binding indeed depends on the immobilised liposome fraction and does not occur in their absence. This control is important because due to the long incubation times non-specific adsorption may occur which may be enhanced by adding destabilizing LPC or charged PS to the membrane.

      We thank the reviewer for pointing out this inconsistency. To avoid signal leakage from a high concentration of DiD vesicles upon green laser irradiation, we immobilized unlabeled vesicles. We have revised Figure 2a as well as the figure caption.

      Regarding the control mentioned by the reviewer, we agree that non-specific binding could occur with the long incubation. In fact, the layer of highly dense liposomes (100 μM) immobilized on the imaging surface also serves to reduce non-specific interactions. In the absence of this layer of immobilized liposomes, we did see a high level of non-specific binding that significantly impacted our experiments. Therefore, we need to perform clustering experiments in the presence of immobilized liposomes.

    1. eLife assessment

      The manuscript introduces an important and innovative non-AI computational method for segmenting noisy grayscale images, with a particular focus on identifying immunostained potassium ion channel clusters. This method significantly enhances accuracy over basic threshold-based techniques while remaining user-friendly and accessible, even for researchers with limited computational backgrounds. The evidence supporting the method's efficacy is convincing. Its practical application and ease of use make it a tool that will benefit a wide range of laboratories.

    2. Reviewer #1 (Public review):

      The manuscript introduces a valuable and innovative non-AI computational method for segmenting noisy grayscale images, with a particular focus on identifying immunostained potassium ion channel clusters.

      Strengths:

      (1) Applicability and Usability: The method is exceptionally accessible to biologists and researchers without advanced computational expertise. It offers a highly practical alternative to AI-based methods, which often require significant training data and computational resources, making it an excellent choice for a broader range of laboratories.

      (2) Proof-of-Concept: The manuscript provides compelling evidence through multiple experiments, showcasing the method's superior performance over traditional threshold-based techniques, particularly in noisy environments. The dual immuno-electron microscopy experiments further reinforce the robustness and effectiveness of this approach.

      (3) Clarity and Methodology: The manuscript is exceptionally well-written, with clear and concise descriptions that effectively highlight the method's advantages. The detailed figures and comprehensive references greatly enhance the manuscript's credibility and strongly support the claims made.

      Weaknesses:

      The manuscript does not include comparisons with more advanced segmentation techniques, particularly those based on artificial intelligence. While the authors have provided a rationale for this decision, including such comparisons could have enriched the discussion and offered additional insights. Additionally, there are some concerns about the computational demands of the method, especially when applied to large-scale or 3D image analysis. Although the authors have shared some computational data, further optimization or practical recommendations would enhance the method's utility. Initially, the manuscript lacked a data and code availability statement, which could have limited the method's accessibility. However, this issue has since been resolved, with the code now being made available to the community. Lastly, while the findings related to Kv4.2 in the thalamus are noteworthy, they might achieve even greater impact if presented in a separate paper. Nevertheless, the authors have chosen to retain these results within the current manuscript to strengthen the overall narrative and relevance.

      We appreciate that the authors have provided thorough explanations for their original choices. These justifications offer a clearer understanding of their approach and the reasons behind the presentation of the data.

      Conclusion:

      The revised manuscript successfully addresses the majority of the reviewers' concerns, presenting a strong case for the proposed segmentation method. The method's ease of use for non-experts in AI, combined with its proven effectiveness in proof-of-concept experiments, positions it as a valuable addition to the field. While the manuscript could benefit from incorporating comparisons with more advanced segmentation methods and offering a more detailed discussion of computational requirements, it remains a robust contribution. The decision to include the Kv4.2 findings within the paper is well-justified by the authors, though these results could potentially have an even greater impact if published separately.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by David et al. describes a novel image segmentation method, implementing Local Moran's method, which determines whether the value of a datapoint or a pixel is randomly distributed among all values, in differentiating pixel clusters from the background noise. The study includes several proof-of-concept analyses to validate the power of the new approach, revealing that implementation of Local Moran's method in image segmentation is superior to threshold-based segmentation methods commonly used in analyzing confocal images in neuroanatomical studies.
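
      For orientation, the textbook form of the Local Moran's I statistic can be written as below. This is the standard definition rather than a formula taken from the manuscript; the symbols (pixel intensity x_i, image mean, spatial weights w_ij over a neighbourhood, second moment m_2) are assumptions made for illustration, and the authors' choice of weights and normalization may differ.

```latex
I_i = \frac{x_i - \bar{x}}{m_2} \sum_{j \neq i} w_{ij}\,\bigl(x_j - \bar{x}\bigr),
\qquad
m_2 = \frac{1}{n} \sum_{k=1}^{n} \bigl(x_k - \bar{x}\bigr)^2
```

      A positive I_i indicates that a pixel and its neighbours deviate from the mean in the same direction (a candidate cluster or a uniform background patch), whereas a negative I_i indicates a pixel that differs from its surroundings, as isolated noise typically does.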

      Strengths:

      Several proof-of-concept experiments are performed to confirm the sensitivity and validity of the proposed method. Using composed images with varying levels of background noise and analyzing them in parallel with the Local Moran's or a Threshold-Based Method (TBM), the study is able to compare these approaches directly and reveal their relative power in isolating clustered pixels.

      Similarly, dual immuno-electron microscopy was used to test the biological relevance of a colocalization that was revealed by Local Moran's segmentation approach on dual-fluorescent labeled tissue using immuno-markers of the axon terminal and a membrane-protein (Figure 5). The EM revealed that the two markers were present in terminals and their post-synaptic partners, respectively. This is a strong approach to verify the validity of the new approach for determining object-based colocalization in fluorescent microscopy.

      The methods section is clear in explaining the rationale and the steps of the new method (however, see the weaknesses section). Figures are appropriate and effective in illustrating the methods and the results of the study. The writing is clear; the references are appropriate and useful.

      Weaknesses:

      While the steps of the mathematical calculations to implement Local Moran's principles for analyzing high-resolution images are clearly written, the manuscript currently does not provide a computation tool that could facilitate easy implementation of the method by other researchers. Without a user-friendly tool, such as an ImageJ plugin or a code, the use of the method developed by David et al by other investigators may remain limited.

      This weakness is eliminated in the revision, which now provides the approach as a Matlab tool.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study describes a new computational method for unsupervised (i.e., non-artificial intelligence) segmentation of objects in grayscale images that contain substantial noise, to differentiate object, no object, and noise. Such problems are important in biology because they are commonly encountered in the analysis of microscope images of biological samples, and they have recently been addressed with artificial intelligence, especially deep neural networks. However, training artificial intelligence for specific sample images is a difficult task and not every biological laboratory can handle it. Therefore, the proposed method is particularly appealing to laboratories with little computational background. The method was shown to achieve better performance than a threshold-based method for artificial and natural test images. To demonstrate the usability, the authors applied the method to high-power confocal images of the thalamus for the identification and quantification of immunostained potassium ion channel clusters formed in the proximity of large axons in the thalamic neuropil and verified the results in comparison to electron micrographs.

      Strengths:

      The authors claim that the proposed method has higher pixel-wise accuracy than the threshold-based method when applied to grayscale images with substantial noise.

      Since the method does not use artificial intelligence, training and testing are not necessary, which would be appealing to biologists who are not familiar with machine learning technology.

      The method does not require extensive tuning of adjustable parameters (trying different values of "Moran's order") given that the size of the object in question can be estimated in advance.

      We appreciate the positive assessment of our approach.

      Weaknesses:

      It is understood that the strength of the method is that it does not depend on artificial intelligence and therefore the authors wanted to compare the performance with another non-AI method (i.e. the threshold-based method; TBM). However, the TBM used in this work seems too naive to be fairly compared to the expensive computation of "Moran's I" used for the proposed method. To provide convincing evidence that the proposed method advances object segmentation technology and can be used practically in various fields, it should be compared to other advanced methods, including AI-based ones, as well.

      Protein localization studies revealed that protein distributions are frequently inhomogeneous in a cell. This is very common in neurons, which are highly polarized cells with distinct axo-somato-dendritic functions. Moreover, due to the nature of the cell-to-cell interactions among neurons (e.g. electrical and chemical synapses), the cell membrane includes highly variable microdomains with unique protein assemblies (i.e. clusters). Protein clusters are defined as membrane segments with higher protein densities compared to neighboring membrane regions. However, protein density can continuously change between “clusters” and “non-clusters”. As a consequence, differentiating proteins involved vs not involved in clusters is a challenging task. Indeed, our analysis showed that the boundaries of protein clusters varied remarkably when 23 human experts delineated them.

      Despite the fact that protein clusters can only be vaguely defined, numerous studies have demonstrated the functional relevance of inhomogeneous protein distribution. Thus, there is a high relevance and need for an observer-independent, “operative” segmentation method that can be applied and compared among different conditions and specimens. The strength of the Moran’s I analysis we propose here, as pointed out by our reviewers and editors, is that it can extract the relevant signals from images generated in different, often noisy conditions using a simple algorithm that allows quantitative characterization and identification of changes in many biological and non-biological samples.

      In AI-based analysis the ground truth is known by an observer, and using a large training set the AI learns to extract the relevant information for image segmentation. As outlined above, however, the “ground truth” cannot be unequivocally defined for protein clusters. There is no doubt that, with sufficient resource investment, an AI-based analysis of the same problem would be possible. In our view, however, generating a training set from hundreds of images examined by many experts may not be feasible in an average laboratory setting. Moreover, generalization from one training set to another set of clusters, and robustness to noise or to different levels of background, could not be guaranteed.

      This method was claimed to be better than the TBM when the noise level was high. Related to the above, TBMs can be used in association with various denoising methods as a preprocess. It is questionable whether the claim is still valid when compared to the methods with adequate complexity used together with denoising. Consider for example, Weigert et al. (2018) https://doi.org/10.1038/s41592-018-0216-7; or Lehtinen et al (2018) https://doi.org/10.48550/arXiv.1803.04189.

      In Weigert et al. AI was trained with high-quality images of the same object obtained with extreme photon exposure in confocal microscope. As delineated above without training AI systems cannot be used for such purposes. The Lehtinen paper is unfortunately no longer available at this doi.

      We must emphasize that in our work we did not intend to compare the image segmentation method based on local Moran’s I with all other available segmentation techniques. Rather, we wanted to demonstrate a straightforward method of grouping pixels that have similar intensities and lie in spatial proximity, which does not require a priori knowledge of the objects. We used TBM to benchmark the method. We agree that with more advanced TBM methods the difference between Moran’s and TBM might have been smaller. The critical point here is, however, that even with the most advanced TBM an artificial threshold needs to be defined. The optimal threshold may change from sample to sample depending on the experimental conditions, which makes quantification questionable. Moran’s method overcomes this problem and allows more objective segmentation of images even if the exact conditions (background labeling, noise, intensity, etc.) are not identical among the samples.
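
      The idea of grouping pixels by intensity and spatial proximity without a user-defined intensity threshold can be illustrated with a minimal Python sketch. It is not the authors' MATLAB script. The function names, the square window of half-width `order` (standing in for "Moran's order"), the equal row-normalized weights, and the simple "positive I and above-mean intensity" rule are all assumptions made for illustration; the published method may use different weights and a different decision rule.

```python
from statistics import fmean


def local_morans_i(image, order=1):
    """Return a per-pixel Local Moran's I map for a 2D list of intensities.

    Assumption: the neighbours of a pixel are all other pixels inside a
    square window of half-width `order`, weighted equally (weights sum to 1).
    """
    rows, cols = len(image), len(image[0])
    pixels = [v for row in image for v in row]
    mean = fmean(pixels)
    # Second moment of the deviations from the mean (population variance).
    m2 = fmean([(v - mean) ** 2 for v in pixels])
    if m2 == 0:
        # A perfectly flat image has no spatial structure to detect.
        return [[0.0] * cols for _ in range(rows)]

    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Deviations of the neighbours, clipped at the image border.
            neighbours = [
                image[rr][cc] - mean
                for rr in range(max(0, r - order), min(rows, r + order + 1))
                for cc in range(max(0, c - order), min(cols, c + order + 1))
                if (rr, cc) != (r, c)
            ]
            lag = fmean(neighbours) if neighbours else 0.0
            result[r][c] = (image[r][c] - mean) / m2 * lag
    return result


def high_value_cluster_mask(image, order=1):
    """Mark pixels that are bright AND surrounded by bright neighbours.

    Assumption: a pixel is kept when its Local Moran's I is positive and its
    intensity exceeds the image mean (a "high-high" pixel). This is a
    hypothetical rule chosen for the sketch, not the manuscript's criterion.
    """
    mean = fmean([v for row in image for v in row])
    moran = local_morans_i(image, order)
    return [
        [moran[r][c] > 0 and image[r][c] > mean for c in range(len(image[0]))]
        for r in range(len(image))
    ]
```

      In this sketch an isolated bright pixel gets a negative I because its neighbours fall below the image mean, so it is rejected without any manually tuned intensity cut-off, whereas a compact bright patch is retained.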

      The computational complexity of the method, determined by the convolution matrix size (Moran's order), linearly increases as the object size increases (Fig. S2b). Given that the convolution must be run separately for each pixel, the computation seems quite demanding for scale-up, e.g. when the method is applied for 3D image volumes. It will be helpful if the requirement for computer resources and time is provided.

      Here we provide the required data concerning the hardware and the computational time:

      Hardware used for performing the analysis:

      Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz, 2594 MHz, 4-core CPU, 64GB RAM, NVIDIA GeForce GTX 1080 graphics card.

      MATLAB R2021b software was used for implementation.

      Author response table.

      Computation times:
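
      Separately from the authors' measured times, the scaling concern raised by the reviewer can be explored with a back-of-the-envelope count. The sketch below is hypothetical: it assumes that every pixel or voxel reads a full dense window of half-width `order` and ignores border effects, which need not match the authors' implementation or the trend they report in Fig. S2b.

```python
def neighbourhood_ops(shape, order):
    """Count neighbour reads for one pass over an image or volume.

    Assumption: each element visits every other element of a dense window
    with (2*order + 1) samples per axis; image borders are ignored.
    """
    n_elements = 1
    for extent in shape:
        n_elements *= extent
    # Window size minus the centre element itself.
    window = (2 * order + 1) ** len(shape) - 1
    return n_elements * window


# A 512 x 512 image versus a 512 x 512 x 64 stack, both with order 3.
ops_2d = neighbourhood_ops((512, 512), 3)
ops_3d = neighbourhood_ops((512, 512, 64), 3)
```

      Under these assumptions, moving from a 2D image to a 3D stack multiplies the work both by the number of slices and by the extra window dimension, which is why the reviewer's request for resource and timing figures matters for scale-up.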

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by David et al. describes a novel image segmentation method, implementing Local Moran's method, which determines whether the value of a datapoint or a pixel is randomly distributed among all values, in differentiating pixel clusters from the background noise. The study includes several proof-of-concept analyses to validate the power of the new approach, revealing that implementation of Local Moran's method in image segmentation is superior to threshold-based segmentation methods commonly used in analyzing confocal images in neuroanatomical studies.

      Strengths:

      Several proof-of-concept experiments are performed to confirm the sensitivity and validity of the proposed method. Using composed images with varying levels of background noise and analyzing them in parallel with the Local Moran's or a Threshold-Based Method (TBM), the study is able to compare these approaches directly and reveal their relative power in isolating clustered pixels.     

      Similarly, dual immuno-electron microscopy was used to test the biological relevance of a colocalization that was revealed by Local Moran's segmentation approach on dual-fluorescent labeled tissue using immuno-markers of the axon terminal and a membrane-protein (Figure 5). The EM revealed that the two markers were present in terminals and their post-synaptic partners, respectively. This is a strong approach to verify the validity of the new approach for determining object-based colocalization in fluorescent microscopy. 

      The methods section is clear in explaining the rationale and the steps of the new method (however, see the weaknesses section). Figures are appropriate and effective in illustrating the methods and the results of the study. The writing is clear; the references are appropriate and useful.

      We are grateful for the constructive assessment of our results.

      Weaknesses:

      While the steps of the mathematical calculations to implement Local Moran's principles for analyzing high-resolution images are clearly written, the manuscript currently does not provide a computation tool that could facilitate easy implementation of the method by other researchers. Without a user-friendly tool, such as an ImageJ plugin or a code, the use of the method developed by David et al by other investigators may remain limited.

      The code for the analysis is now available online as a user-friendly MATLAB script at: https://github.com/dcsabaCD225/Moran_Matlab/blob/main/moran_local.m

      Recommendations for the authors:

      Summary of reviews:

      Both reviewers acknowledge the potential significance and practicality of the newly proposed image segmentation method. This method uses Local Moran's principles, offering an advantage over traditional intensity thresholding approaches by providing more sensitivity, particularly in reducing background noise and preserving biologically relevant pixels.

      Strengths Highlighted:

      • The proposed method can provide more accurate results, especially for grayscale images with significant noise.

      • The method is not dependent on artificial intelligence, making it appealing for researchers with minimal computational background.    

      • The approach can operate without the need for extensive tuning, given that the size of the object is known.

      • Several proof-of-concept experiments were carried out, revealing the effectiveness of the method in comparison with the threshold-based segmentation methods.

      • The manuscript is clear in terms of methodology, and the results are supported by effective illustrations and references.

      Weaknesses Noted:

      • The study lacked a comparative analysis with advanced segmentation methods, especially those that employ artificial intelligence.

      See our response above to the same question of Reviewer 1.

      • There are concerns about computational complexity, especially when dealing with larger data sets or 3D image volumes.

      See our response about the calculations of computation times above to the similar question of Reviewer 1.

      • Both reviewers noted the absence of a data/code availability statement in the manuscript, which might restrict the method's adoption by other researchers.

      The code availability is provided now.

      • Reviewer 2 suggested that some results, particularly related to Kv4.2 in the thalamus, might be better presented in a separate study due to their significance.

      We thank our reviewers for this suggestion. We carefully evaluated the pros and cons of publishing the Kv4.2 data separately. We finally decided to keep the segmentation and experimental data together for the following reason: we believe that the ultrastructural localization provides strong experimental proof for the relevance of our novel segmentation method. To make the potassium channel data more visible, we added a subtitle to the title. In this manner, we think scientists interested in the imaging method as well as those interested in the neurobiology will both find and cite the paper. The new title now reads:

      “An image segmentation method based on the spatial correlation coefficient of Local Moran’s I - identification of A-type potassium channel clusters in the thalamus.”

      Reviewer Recommendations:

      (1) Provide details about the data and program code availability.

      See our response above

      (2) Offer practical recommendations and provide clarity on software packages and coding for the proposed method to enhance its adoption.

      Done.

      (3) Consider presenting the findings about Kv4.2 in the thalamus separately as they hold significant importance on their own.

      See our response above

      Given the reviews, the proposed image segmentation method presents a promising advancement in the domain of image analysis. The technique offers tangible benefits, especially for researchers dealing with biological microscopy data. However, for this method to see a broader application, it's imperative to provide clearer practical guidance and make data or code easily accessible. Additionally, while the findings regarding Kv4.2 in the thalamus are intriguing, they might achieve more impact if detailed in a dedicated paper.

      Reviewer #1 (Recommendations For The Authors):

      The availability of data or program code was not stated in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) While the principles of the method are explained clearly in a step-by-step fashion in the Methods section, the practical aspects of running sequential computations over a large matrix of pixel values are not well described. It would be very useful if the authors could provide recommendations on how to set the data structure and clarify which software and programming package for Local Moran's analysis they used. In addition, providing the code for the sequential implementation described in the Methods section would facilitate the adoption of the method by other researchers, and thus, the impact of the study. Currently, there is no data or code availability statement included in the manuscript.

      See our response above.

      (2) Figure 4 illustrates an experiment in which transmission electron microscopy and freeze-fracture replica labeling approaches were used to demonstrate that a potassium channel marker, Kv4.2 was selective to synapses forming on larger caliber dendrites in the thalamus. As impressive as the EM approaches utilized in this figure are, the results of this experiment have a somewhat tangential bearing on the segmentation method that is the focus of this study. In fact, the experiments illustrated in Figure 5, dual immuno-EM, are more than sufficient to confirm what the dual-confocal imaging coupled with Local Moran's segmentation analysis reveals. Furthermore, the author's findings about the localization and selectivity of Kv4.2 in the thalamus are too important and exciting to bury in a paper focusing on the methodology. Those results may have a wider impact if they are presented and discussed in a separate experimental paper.

      See our response above

    1. eLife assessment

      This useful experiment seeks to better understand how memory interacts with incoming visual information to effectively guide human behavior. Using several methods, the authors identify two distinct pathways relating visual processing to the default mode network: one that emphasizes semantic cognition, and the other, spatial cognition. The evidence presented is solid and will be of interest to cognitive and systems neuroscientists.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gonzalez Alam et al. sought to understand how memory interacts with incoming visual information to effectively guide human behavior by using a task that combines spatial contexts (houses) with objects of one or more other semantic categories. Three additional datasets (all from separate participants) were also employed: one that functionally localized regions of interest (ROIs) based on subtractions of different visually presented category types (in this case, scenes, objects, and scrambled objects); another consisting of resting-state functional connectivity scans, and a section of the Human Connectome Project that employed DTI data for structural connectivity analysis. Across multiple analyses, the authors identify dissociations between regions preferentially activated during scene or other object judgments, between the functional connectivity of regions demonstrating such preferences, and in the anatomical connectivity of these same regions. The authors conclude that the processing streams that take in visual information and support semantic or spatial processing are largely parallel and distinct.

      Strengths:

      (1) Recent work has reconceptualized the classic default mode network as parallel and interdigitated systems (e.g., Braga & Buckner, 2017; DiNicola et al., 2021). The current manuscript is timely in that it attempts to describe how information is differentially processed by two streams that appear to begin in visual cortex and connect to different default subnetworks. Even at a group level where neuroanatomy is necessarily blurred across individuals, these results provide clear evidence of stimulus-based processing dissociation.

      (2) The manuscript analyzes data from multiple independent datasets. It is therefore unlikely that a single experimenter choice in any given analysis would spuriously produce the general convergence of the results reported in this manuscript.

      Weaknesses:

      (1) The manuscript makes strong distinctions between spatial processing and other forms of semantic processing. However, it is not clear if scenes are uniquely different from other stimulus categories, such as faces or tools. As is noted by the authors in their revised discussion section, the design of the experiment does not allow for a category-level generalization beyond scenes. The dichotomization of semantic and spatial information invoked throughout the manuscript should be read with this limitation in mind.

      (2) Although the term "objects" is used by the authors to refer to the stimuli placed in scenes, it is a mixture of other stimulus categories, including various types of animals, tools, and other manmade objects. Different regions along the ventral stream are thought to process these different types of stimuli (e.g., Martin, 2007, Ann Rev Psychol), but as they are not being modeled separately, the responses associated with "object" processing in this manuscript are necessarily blurring across known distinctions in functional neuroanatomy.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Gonzalez Alam et al. report a series of functional MRI results about the neural processing from the visual cortex to high-order regions in the default-mode network (DMN), compiling evidence from task-based functional MRI, resting-state connectivity, and diffusion-weighted imaging. Their participants were first trained to learn the association between objects and rooms/buildings in a virtual reality experiment; after the training was completed, in the task-based MRI experiment, participants viewed the objects from the earlier training session and judged if the objects were in the semantic category (semantic task) or if they were previously shown in the same spatial context (spatial context task). Based on the task data, the authors utilised resting-state data from their previous studies, visual localiser data also from previous studies, as well as structural connectivity data from the Human Connectome Project, to perform various seed-based connectivity analyses. They found that the semantic task causes more activation of various regions involved in object perception while the spatial context task causes more activation in various regions for place perception, respectively. They further showed that those object perception regions are more connected with the frontotemporal subnetwork of the DMN while those place perception regions are more connected with the medial-temporal subnetwork of the DMN. Based on these results, the authors argue that there are two main pathways connecting the visual system to high-level regions in the DMN, one linking object perception regions (e.g., LOC) leading to semantic regions (e.g., IFG, pMTG), the other linking place perception regions (e.g., parahippocampal gyri) to the entorhinal cortex and hippocampus.

      Below I provide my takes on (1) the significance of the findings and the strength of evidence, (2) my guidance for readers regarding how to interpret the data, as well as several caveats that apply to their results, and finally (3) my suggestions for the authors.

      (1) Significance of the results and strength of the evidence

      I would like to praise the authors for, first of all, trying to associate visual processing with high-order regions in the DMN. While many vision scientists focus specifically on the macroscale organisation of the visual cortex, relatively few efforts are made to unravel how neural processing in the visual system goes on to engage representations in regions higher up in the hierarchy (a nice precedent study that looks at this issue is by Konkle and Caramazza, 2017). We all know that visual processing goes beyond the visual cortex, potentially further into the DMN, but there's no direct evidence. So, in this regard, the authors made a nice try to look at this issue.

      We thank the reviewer for their positive feedback and for their very thoughtful and thorough comments, which have helped us to improve the quality of the paper.

      Having said this, the authors' characterisation of the organisation of the visual cortex (object perception/semantics vs. place perception/spatial contexts) does not go beyond what has been known for many decades by vision neuroscience. Specifically, over the past two decades, numerous proposals have been put forward to explain the macroscale organisation of the visual system, particularly the ventrolateral occipitotemporal cortex. A lateral-medial division has been reliably found in numerous studies. For example, some researchers found that the visual cortex is organised along the separation of foveal vision (lateral) vs. peripheral vision (medial), while others found that it is structured according to faces (lateral) vs. places (medial). Such a bipartite division is also found in animate (lateral) vs. inanimate (medial), small objects (lateral) vs. big objects (medial), as well as various cytoarchitectonic and connectomic differences between the medial side and the lateral side of the visual cortex. Some more recent studies even demonstrate a tripartite division (small objects, animals, big objects; see Konkle and Caramazza, 2013). So, in terms of their characterisation of the visual cortex, I think Gonzalez Alam et al. do not add any novel evidence to what the community of neuroscience has already known.

      The aim of our study was not to provide novel evidence about visual organisation, but rather to understand how these well-known visual subdivisions are related to functional divisions in memory-related systems, like the DMN. We agree that our study confirms the pattern observed by numerous other studies in visual neuroscience.  

      However, the authors' effort to link visual processing with various regions of the DMN is certainly novel, and their attempt to gather converging evidence with different methodologies is commendable. The authors are able to show that, in an independent sample of resting-state data, object-related regions are more connected with semantic regions in the DMN while place-related regions are more connected with navigation-related regions in the DMN, respectively. Such patterns reveal a consistent spatial overlap with their Kanwisher-type face/house localiser data and also concur with the HCP white-matter tractography data. Overall, I think the two pathways explanation that the authors seek to argue is backed by converging evidence. The lack of travelling wave type of analysis to show the spatiotemporal dynamics across the cortex from the visual cortex to high-level regions is disappointing though because I was expecting this type of analysis would provide the most convincing evidence of a 'pathway' going from one point to another. Dynamic causal modelling or Granger causality may also buttress the authors' claim of pathway because many readers, like me, would feel that there is not enough evidence to convincingly prove the existence of a 'pathway'.

      By ‘pathway’ we are referring to a pattern of differential connectivity between subregions of visual cortex and subregions of DMN, suggesting there are at least two distinct routes between visual and heteromodal regions. However, these routes don’t have to reflect a continuous sequence of cortical areas that extend from visual cortex to DMN – and given our findings of structural connectivity differences that relate to the functional subdivisions we observe, this is unlikely to be the sole mechanism underpinning our findings. We have now clarified this in the discussion section of the manuscript. We agree it would be interesting to characterise the spatiotemporal dynamics of neural propagation along our pathways, and we have incorporated this proposal into the future directions section.

      “One important caveat is that we have not investigated the spatiotemporal dynamics of neural propagation along the pathways we identified between visual cortex and DMN. The dissociations we found in task responses, intrinsic functional connectivity and white matter connections all support the view that there are at least two distinct routes between visual and heteromodal DMN regions, yet this does not necessarily imply that there is a continuous sequence of cortical areas that extend from visual cortex to DMN – and given our findings of structural connectivity differences that relate to the functional subdivisions we observe, this is unlikely to be the sole mechanism underpinning our findings. It would be interesting in future work to characterise the spatiotemporal dynamics of neural propagation along visual-DMN pathways using methods optimised for studying the dynamics of information transmission, like Granger causality or travelling wave analysis.”

      We have also edited the wording of sentences in the introduction and discussion that we thought might imply directionality or transmission of information along these pathways, or to clarify the nature of the pathways (please see a couple of examples below):

      In the Introduction:

      “We identified dissociable pathways of connectivity between different parts of visual cortex and DMN subsystems”

      In the Discussion:

      “…pathways from visual cortex to DMN -> …pathways between visual cortex and DMN“.

      (2) Guidance to the readers about interpretation of the data

      The organisation of the visual cortex and the organisation of the DMN historically have been studied in parallel with little crosstalk between different communities of researchers. Thus, the work by Gonzalez Alam et al. has made a nice attempt to look at how visual processing goes beyond the realm of the visual cortex and continues into different subregions of the DMN.

      While the authors of this study have utilised multiple methods to obtain converging evidence, there are several important caveats in the interpretation of their results:

      (1) While the authors choose to use the term 'pathway' to call the inter-dependence between a set of visual regions and default-mode regions, their results have not convincingly demonstrated a definitive route of neural processing or travelling. Instead, the findings reveal a set of DMN regions are functionally more connected with object-related regions compared to place-related regions. The results are very much dependent on masking and thresholding, and the patterns can change drastically if different masks or thresholds are used.

      We would like to clarify that our findings do not merely reveal a set of “DMN regions that are functionally more connected with object-related regions compared to place-related regions”. Instead, we show a double dissociation based on our functional task responses: DMN regions that were more responsive to semantic decisions about objects are more functionally and structurally connected to visual regions more activated by perceiving objects, while DMN regions that were more responsive to spatial decisions are more connected to visual regions activated by the contrast of scene over object perception.

      We do not believe that the thresholding or masking involved in generating seeds strongly affected our results. We are reassured of this by two facts:

      (1) We re-analysed the resting-state data using a stricter clustering threshold and this did not change the pattern of results (see response below).

      (2) In response to a point by reviewer #2, we re-analysed the data eroding the masks of the MT-DMN, and this also didn’t change the pattern of results (please see response to reviewer 2).

      In this way, our results are robust to variations in mask shape/size and thresholding.

      (2) Ideally, if the authors could demonstrate the dynamics between the visual cortex and DMN in the primary task data, it would be very convincing evidence for characterising the journey from the visual cortex to DMN. Instead, the current connectivity results are derived from a separate set of resting state data. While the advantage of the authors' approach is that they are able to verify certain visual regions are more connected with certain DMN regions even under a task-free situation, it falls short of explaining how these regions dynamically interact to convert vision into semantic/spatial decision.

      We agree that a valuable future direction would be to collect evidence of spatiotemporal dynamics of propagation of information along these pathways. This could be the focus of future studies designed to this aim, and we have suggested this in the manuscript based on the reviewer’s suggestion. Furthermore, as stated above, we have now qualified our use of the term ‘pathway’ in the manuscript to avoid confusion.

      “These pathways refer to regions that are coupled, functionally or structurally, together, providing the potential for communication between them.”

      (3) There are several results that are difficult to interpret, such as their psychophysiological interactions (PPI), representational similarity analysis, and gradient analysis. For example, typically for PPI analysis, researchers interrogate the whole brain to look for PPI connectivity. Their use of targeted ROI is unusual, and their use of spatially extensive clusters that encompass fairly large cortical zones in both occipital and temporal lobes as the PPI seeds is also an unusual approach. As for the gradient analysis, the argument that the semantic task is higher on Gradient 1 than the spatial task based on the statistics of p-value = 0.027 is not a very convincing claim (unhelpfully, the figure on the top just shows quite a few blue 'spatial dots' on the hetero-modal end which can make readers wonder if the spatial context task is really closer to the unimodal end or it is simply the authors' statistical luck that they get a p-value under 0.05). While it is statistically significant, it is weak evidence (and it is not pertinent to the main points the authors try to make).

      To streamline the manuscript, we have moved the PPI and RSA results to the Supplementary Materials. However, we believe the gradient analysis is highly pertinent to understanding the functional separation of these pathways. In the revision, we show that not only was the contrast between the Semantic and Spatial tasks significant, but the majority of participants also exhibited a pattern consistent with the result we report. To show the results more clearly, we have added a supplementary figure (Figure S8) focussed on comparisons at the participant level.

      This figure shows the position in the gradient for each peak per participant per task. The peaks for each participant across tasks are linked with a line. Cases where the pattern was reversed are highlighted with dashed lines (7/27 participants in each gradient). This allows the reader and reviewer to verify in how many cases, at the individual level, the pattern of results reported in the text held (see “Supplementary Analysis: Individual Location of pathways in whole-brain gradients”).  
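
      As a rough illustration of the participant-level count above, and not an analysis reported in the manuscript, the proportion of participants showing the expected pattern and an exact two-sided sign test against chance can be sketched in Python. The function name `sign_test_two_sided` and the choice of a sign test are assumptions made for this sketch; only the counts (7 reversed cases out of 27 participants) come from the response.

```python
from math import comb


def sign_test_two_sided(consistent, total):
    """Exact two-sided sign test: probability of a split at least this
    uneven if each participant were equally likely to go either way."""
    k = max(consistent, total - consistent)
    upper_tail = sum(comb(total, i) for i in range(k, total + 1)) / 2 ** total
    return min(1.0, 2 * upper_tail)


# Counts taken from the response: 7 of 27 participants showed the reversed pattern.
total = 27
reversed_cases = 7
consistent = total - reversed_cases

proportion = consistent / total                    # share of participants following the group pattern
p_value = sign_test_two_sided(consistent, total)   # illustrative only, not a reported statistic

print(f"{consistent}/{total} consistent ({proportion:.0%}), sign-test p = {p_value:.4f}")
```

      A count-based check of this kind only asks how many participants follow the group pattern; it ignores the size of each participant's difference, which the reported contrast takes into account.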

      (3) My suggestion for the authors

      There are several conceptual-level suggestions that I would like to offer to the authors:

      (1)  If the pathway explanation is the key argument that you wish to convey to the readers, an effective connectivity type of analysis, such as Granger causality or dynamic causal modelling, would be helpful in revealing there is a starting point and end point in the pathway as well as revealing the directionality of neural processing. While both of these methods have their issues (e.g., Granger causality is not suitable for haemodynamic data, DCM's selection of seeds is susceptible to bias, etc), they can help you get started to test if the path during task performance does exist. Alternatively, travelling wave type of analysis (such as the results by Raut et al. 2021 published in Science Advances) can also be useful to support your claims of the pathway.

      As we have stated above, we agree with the reviewer that, given the pattern of results obtained in our work, analyses that characterise the spatiotemporal dynamics of transmission of information along the pathways would be of interest. However, we are concerned that our data is not well-optimised for these analyses.

      (2)  I think the thresholding for resting state data needs to be explained - by the look of Figure 2E and 3E, it looks like whole-brain un-thresholded results, and then you went on to compute the conjunction between these un-thresholded maps with network templates of the visual system and DMN. This does not seem statistically acceptable, and I wonder if the conjunction that you found would disappear and reappear if you used different thresholds. Thus, for example, if the left IFG cluster (which you have shown to be connected with the visual object regions) would disappear when you apply a conventional threshold, this means that you need to seriously consider the robustness of the pathway that you seek to claim... it may be just a wild goose that you are chasing.

      We believe the reviewer might be confused regarding the procedure we followed to generate the ROIs used in the pathways connectivity analysis. As stated in the last paragraph of the “Probe phase” and “Decision phase” results subsections, the maps the reviewer is referring to (Fig. 3E, for example) were generated by seeding the intersection of our thresholded univariate analysis (Fig. 3A) with network templates. In the case of Fig 3E, these are the Semantic>Spatial decision results after thresholding, intersected with Yeo DMN (MT, FT and Core, combined). These seeds were then entered into a whole-brain seed-based spatial correlation analysis, which was thresholded and cluster-corrected using the defaults of CONN. The same is true for Fig. 2E, but using the thresholded Probe phase Semantic>Context regions. Thus, we do not believe the objections to statistical rigour the reviewer is raising apply to our results.
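
      The seed-definition step described above (threshold the task map, then intersect it with a network template) can be sketched as follows. This is a minimal illustration on a flat list of voxels, not the authors' pipeline: the helper names, the example values and the cutoff of 3.1 are all hypothetical.

```python
def threshold_mask(stat_map, cutoff):
    """Binary mask of voxels whose statistic reaches the cutoff."""
    return [value >= cutoff for value in stat_map]


def intersect(mask_a, mask_b):
    """Voxels that belong to both masks."""
    return [a and b for a, b in zip(mask_a, mask_b)]


# Hypothetical values: one task statistic per voxel and a binary DMN template.
task_stat = [0.4, 3.5, 2.9, 4.1, 1.0, 3.3]
dmn_template = [False, True, False, True, True, False]

# Seed = task-activated voxels that also fall inside the network template.
seed = intersect(threshold_mask(task_stat, cutoff=3.1), dmn_template)
print(seed)
```

      Because the seed is built from an already thresholded task map, the later whole-brain connectivity map is thresholded separately, which is the distinction the response draws.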

      The thresholding of the resting-state data itself was explained in the Methods (Spatial Maps and Seed-to-ROI Analysis). As stated above, we thresholded using the default of the CONN software package we used (cluster-forming threshold of p=.05, equivalent to T=1.65). For increased rigour, we reproduced the thresholded maps from Figs 2E and 3E further increasing the threshold from p=.05, equivalent to T=1.65, to p=.001, equivalent to T=3.1. The resulting maps were very similar, showing minimal change with a spatial correlation of r > .99 between the strict and lax threshold versions of the maps for both the probe and decision seeds. This can be seen in Figure 2E and Figure 33E, which depict the maps produced with stricter thresholding. These maps can also be downloaded from the Neurovault collection, and the re-analysis is now reported in the Supplementary Materials (see section “Supplementary Analysis: Resting-state maps with stricter thresholding”).

      [Figure: Probe phase map with stricter thresholding (compare with Fig. 2E)]
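
      The correspondence between the p and T thresholds quoted above, and the notion of a spatial correlation between two versions of a map, can be sketched in Python. The sketch assumes a one-sided test and uses the normal distribution as a large-sample stand-in for the t distribution; both are assumptions made here for illustration, not statements about CONN's internals, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_critical_value(p):
    """Critical z for a one-sided p-value; a t threshold approaches this
    value when the degrees of freedom are large."""
    return NormalDist().inv_cdf(1.0 - p)


def spatial_correlation(map_a, map_b):
    """Pearson correlation across voxels of two equally sized maps."""
    n = len(map_a)
    mean_a, mean_b = sum(map_a) / n, sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    return cov / sqrt(var_a * var_b)


lax = one_sided_critical_value(0.05)      # close to the T=1.65 quoted in the response
strict = one_sided_critical_value(0.001)  # close to the T=3.1 quoted in the response
print(f"lax threshold = {lax:.3f}, strict threshold = {strict:.3f}")
```

      Under these assumptions the two cutoffs agree with the quoted values to rounding; `spatial_correlation` would then be applied to the voxel values of the lax and strict versions of a map.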

      (3) There are several analyses that are hard to interpret and you can consider only reporting them in the supplementary materials, such as the PPI results and representational similarity analysis, as none of these are convincing. These analyses do not seem to add much value to make your argument more convincing and may elicit more methodological critiques, such as statistical issues, the set-up of your representational theory matrix, and so on.

      We have moved the PPI and RSA results to the supplementary materials. We agree this will help us streamline the manuscript.  

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Alam et al. sought to understand how memory interacts with incoming visual information to effectively guide human behavior by using a task that combines spatial contexts (houses) with objects of one or multiple semantic categories. Three additional datasets (all from separate participants) were also employed: one that functionally localized regions of interest (ROIs) based on subtractions of different visually presented category types (in this case, scenes, objects, and scrambled objects); another consisting of resting-state functional connectivity scans, and a section of the Human Connectome Project that employed DTI data for structural connectivity analysis. Across multiple analyses, the authors identify dissociations between regions preferentially activated during scene or object judgments, between the functional connectivity of regions demonstrating such preferences, and in the anatomical connectivity of these same regions. The authors conclude that the processing streams that take in visual information and support semantic or spatial processing are largely parallel and distinct.

      Strengths:

      (1) Recent work has reconceptualized the classic default mode network as two parallel and interdigitated systems (e.g., Braga & Buckner, 2017; DiNicola et al., 2021). The current manuscript is timely in that it attempts to describe how information is differentially processed by two streams that appear to begin in visual cortex and connect to different default subnetworks. Even at a group level where neuroanatomy is necessarily blurred across individuals, these results provide clear evidence of stimulus-based dissociation.

      (2) The manuscript contains a large number of analyses across multiple independent datasets. It is therefore unlikely that a single experimenter choice in any given analysis would spuriously produce the overall pattern of results reported in this work.

      We thank the reviewer for their remarks on the strengths of our manuscript.

      Weaknesses:

      (1) Throughout the manuscript, a strong distinction is drawn between semantic and spatial processing. However, given that only objects and spatial contexts were employed in the primary experiment, it is not clear that a broader conceptual distinction is warranted between "semantic" and "spatial" cognition. There are multiple grounds for concern regarding this basic premise of the manuscript.

      a. One can have conceptual knowledge of different types of scenes or spatial contexts. A city street will consistently differ from a beach in predictable ways, and a kitchen context provides different expectations than a living room. Such distinctions reflect semantic knowledge of scene-related concepts, but in the present work spatial and "all other" semantic information are considered and discussed as distinct and separate.

      The “building” contexts we created were arbitrary, containing beds, desks and an assortment of furniture that did not reflect usual room distributions (e.g., a kitchen next to a dining room). We have made this aspect of our stimuli clearer in the Materials section of the task.

      “The learning phase employed videos showing a walk-through for twelve different buildings (one per video), shot from a first-person perspective. The videos and buildings were created using an interior design program (Sweet Home 3D). Each building consisted of two rooms: a bedroom and a living room/office, with an ajar door connecting the two rooms. The order of the rooms (1st and 2nd) was counterbalanced across participants. Each room was distinctive, with different wallpaper/wall colour and furniture arrangements. The building contexts created by these rooms were arbitrary, containing furniture that did not reflect usual room distributions (i.e., a kitchen next to a dining room), to avoid engaging further conceptual knowledge about frequently-encountered spatial contexts in the real world.”

      To help the reviewer and readers to verify this and come to their own conclusions, we have also added the videos watched by the participants to the OSF collection.

      “A full list of pictures of the object and location stimuli employed in this task, as well as the videos watched by the participants can be consulted in the OSF collection associated with this project under the components OSF>Tasks>Training. “

      We agree that scenes or spatial contexts have conceptual characteristics, and we actually manipulated conceptual information about the buildings in our task, in order to assess the neural underpinnings of this effect. In half of the buildings, the rooms/contexts were linked through the presence of items that shared a common semantic category (our “same category building” condition): this presented some conceptual scaffolding that enabled participants to link two rooms together. These buildings could then be contrasted with “mixed category buildings” where this conceptual link between rooms was not available. We found that right angular gyrus was important in the linking together of conceptual and spatial information, in the contrast of same versus mixed category buildings.

      b. As a related question, are scenes uniquely different from all other types of semantic/category information? If faces were used instead of scenes, could one expect to see different regions of the visual cortex coupling with task-defined face > object ROIs? The current data do not speak to this possibility, but as written the manuscript suggests that all (non-spatial) semantic knowledge should be processed by the FT-DMN.

      Thanks for raising this important point. Previous work suggests that the human visual system (and possibly the memory system, as suggested by Deen and Freiwald, 2021) is sensitive to perceptual categories important to human behaviour, including spatial, object and social information. Previous work (Silson et al., 2019; Steel et al., 2021) has shown domain-specific regions in visual regions (ventral temporal cortex; VTC) whose topological organisation is replicated in memory regions in medial parietal cortex (MPC) for faces and places. In these studies, adding objects to the analyses revealed regions sensitive to this category sandwiched between those responsive to people and places in VTC, but not in MPC. However, consistent with our work, the authors find regions sensitive to memory tasks for places and objects (as well as people) in the lateral surface of the brain. 

      Our study was not designed to probe every category in the human visual system, and therefore we cannot say what would happen if we contrasted social judgments about faces with semantic judgments about objects. We have added this point as a limitation and future direction for research:

      “Likewise, further research should be carried out on memory-visual interactions for alternative domains. Our study focused on spatial location and semantic object processing and therefore cannot address how other categories of stimuli, such as faces, are processed by the visual-to-memory pathways that we have identified. Previous work has suggested some overlap in the neurobiological mechanisms for semantic and social processing (Andrews-Hanna et al., 2014; Andrews-Hanna & Grilli, 2021; Chiou et al., 2020), suggesting that the FT-DMN pathway may be highlighted when contrasting both social faces and semantic objects with spatial scenes. On the other hand, some researchers have argued for a ‘third pathway’ for aspects of social visual cognition (Pitcher & Ungerleider, 2021; Pitcher, 2023). Future studies that probe other categories will be able to confirm the generality (or specificity) of the pathways we described.”

      c. Recent precision fMRI studies characterizing networks corresponding to the FT-DMN and MTL-DMN have associated the former with social cognition and the latter with scene construction/spatial processing (DiNicola et al., 2020; 2021; 2023). This is only briefly mentioned by the authors in the current manuscript (p. 28), and when discussed, the authors draw a distinction between semantic and social or emotional "codes" when noting that future work is necessary to support the generality of the current claims. However, if generality is a concern, then emphasizing the distinction between object-centric and spatial cognition, rather than semantic and spatial cognition, would represent a more conservative and better-supported theoretical point in the current manuscript.

      We appreciate this comment and we have spent quite a bit of time considering what the most appropriate terminology would be. The distinction between object and spatial cognition is largely appropriate to our probe phase, although we feel this label is still misleading for two reasons:

      First, we used a range of items from different semantic categories, not just “objects”, although we have used that term as a shorthand to refer to the picture stimuli we presented. The stimuli include both animals (land animals, marine animals and birds) and man-made objects (tools, musical instruments and sports equipment). This category information is now more prominent in the rationale (Introduction) and the Methods to avoid confusion.

      Interested readers can also review our “object” stimuli in the OSF collection associated with this manuscript:

      Introduction: “…participants learned about virtual environments (buildings) populated with objects belonging to different, heterogeneous, semantic categories, both man-made (tools, musical instruments, sports equipment) and natural (land animals, marine animals, birds).”

      Methods:

      “A full list of pictures of the object and location stimuli employed in this task can be consulted in the OSF collection associated with this project under the components OSF>Tasks>Training.”

      Secondly, we manipulated the task demands so that participants were making semantic judgments about whether two items were in the same category, or spatial judgments about whether two rooms had been presented in the same building. Our use of the terms “semantic” and “spatial” was largely guided by the tasks that participants were asked to perform.

      We have revised the terminology used in the discussion to reflect this more conservative term. However, since the task performed was semantic in nature (participants had to judge whether items belonged to semantic categories), we have modified the term proposed by the reviewer to “object-centric semantics”, which we hope will avoid confusion.  

      (2) Both the retrosplenial/parieto-occipital sulcus and parahippocampal regions are adjacent to the visual network as defined using the Yeo et al. atlas, and spatial smoothness of the data could be impacting connectivity metrics here in a way that qualitatively differs from the (non-adjacent) FT-DMN ROIs. Although this proximity is a basic property of network locations on the cortical surface, the authors have several tools at their disposal that could be employed to help rule out this possibility. They might, for instance, reduce the smoothing in their multi-echo data, as the current 5 mm kernel is larger than the kernel used in Experiment 2's single-echo resting-state data. Spatial smoothing is less necessary in multi-echo data, as thermal noise can be attenuated by averaging over time (echoes) instead of space (see Gonzalez-Castillo et al., 2016 for discussion). Some multi-echo users have eschewed explicit spatial smoothing entirely (e.g., Ramot et al., 2021), just as the authors of the current paper did for their RSA analysis. Less smoothing of E1 data, combined with a local erosion of either the MTL-DMN and VIS masks (or both) near their points of overlap in the RSFC data, would improve confidence that the current results are not driven, at least in part, by spatial mixing of otherwise distinct network signals.

      The proximity of visual peripheral and DMN-C networks is a property of these networks’ organisation (Silson et al., 2019; Steel et al., 2021), and we agree the potential for spatial mixing of the signal due to this adjacency is a valid concern. Altering the smoothing kernel of the multi-echo data would not address this issue though, since no connectivity analyses were performed in task data. The reviewer is right about the kernel size for task data (5mm), but not about the single-echo RS data, which were actually smoothed with a larger kernel (6mm).

      Since this objection is largely about the connectivity analysis, we re-analysed the RS data by shrinking the size of the visual probe and DMN decision ROIs for the context task using fslmaths. We eroded the masks until the smallest gap between them exceeded the size of our 6mm FWHM smoothing kernel, which eliminates the potential for spatial mixing of signals due to ROI adjacency. The eroded ROIs can be consulted in the OSF collection associated with this project (see component “ROI Analysis/Revision_ErodedMasks”). The results, presented in the supplementary materials as “Eroded masks replication analysis”, confirmed the pattern of findings reported in the manuscript (see SM analysis below). We did not erode the respective ROIs for the semantic task, given that adjacency is not an issue there.
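
      The erosion logic described above (shrink both masks until the smallest gap between them exceeds the smoothing kernel) can be sketched in plain Python. This is a toy 2D illustration under stated assumptions, not a reproduction of the fslmaths procedure the authors used: the 4-neighbour erosion rule, the 2 mm voxel size and all function names are hypothetical.

```python
from itertools import product
from math import hypot


def erode(mask):
    """Keep only voxels whose four in-plane neighbours are all inside the mask."""
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(x, y) for (x, y) in mask
            if all((x + dx, y + dy) in mask for dx, dy in neighbours)}


def min_gap_mm(mask_a, mask_b, voxel_mm):
    """Smallest centre-to-centre distance between the two masks, in mm."""
    return voxel_mm * min(hypot(ax - bx, ay - by)
                          for (ax, ay), (bx, by) in product(mask_a, mask_b))


def erode_until_separated(mask_a, mask_b, fwhm_mm, voxel_mm):
    """Erode both masks until their gap exceeds the smoothing kernel (or one is empty)."""
    while mask_a and mask_b and min_gap_mm(mask_a, mask_b, voxel_mm) <= fwhm_mm:
        mask_a, mask_b = erode(mask_a), erode(mask_b)
    return mask_a, mask_b


# Two adjacent 7x7 blocks of voxels standing in for neighbouring ROIs.
visual = {(x, y) for x in range(0, 7) for y in range(7)}
dmn = {(x, y) for x in range(7, 14) for y in range(7)}

visual_eroded, dmn_eroded = erode_until_separated(visual, dmn, fwhm_mm=6.0, voxel_mm=2.0)
print(len(visual_eroded), len(dmn_eroded), min_gap_mm(visual_eroded, dmn_eroded, 2.0))
```

      The loop stops as soon as the gap is larger than the kernel, so the masks lose no more voxels than the separation criterion requires; the guard on empty masks prevents the loop from running forever on very small ROIs.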

      “Eroded masks replication analysis:

      The Visual-to-DMN ANOVA showed main effects of seed (F(1,190)=22.82, p<.001), ROI (F(1,190)=9.48, p=.002) and a seed by ROI interaction (F(1,190)=67.02, p<.001). Post-hoc contrasts confirmed there was stronger connectivity between object probe regions and semantic versus spatial context decision regions (t(190)=3.38, p<.001), and between scene probe regions and spatial context versus semantic decision regions (t(190)=-7.66, p<.001).

      The DMN-to-Visual ANOVA confirmed this pattern: again, there was a main effect of ROI (F(1,190)=4.3, p=.039) and a seed by ROI interaction (F(1,190)=57.59, p<.001), with post-hoc contrasts confirming stronger intrinsic connectivity between DMN regions implicated in semantic decisions and object probe regions (t(190)=5.06, p<.001), and between DMN regions engaged by spatial context decisions and scene probe regions (t(190)=3.25, p=.001).”
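
      The seed by ROI interaction reported above is the statistical signature of a double dissociation. Assuming each participant contributes all four seed-to-ROI connectivity values (a fully within-participant 2×2 design, which is an assumption of this sketch), the interaction can be tested with a one-sample t-test on a per-participant difference of differences, and the squared t then equals the interaction F. The function name and the connectivity values below are hypothetical and purely illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_t(obj_sem, obj_spat, scene_sem, scene_spat):
    """One-sample t-test on the per-participant interaction contrast:
    (object seed: semantic - spatial ROI) - (scene seed: semantic - spatial ROI)."""
    contrast = [(a - b) - (c - d)
                for a, b, c, d in zip(obj_sem, obj_spat, scene_sem, scene_spat)]
    n = len(contrast)
    t = mean(contrast) / (stdev(contrast) / sqrt(n))
    return t, n - 1  # t statistic and its degrees of freedom


# Hypothetical connectivity values for four participants (not the study's data).
obj_sem = [0.5, 0.6, 0.4, 0.7]
obj_spat = [0.2, 0.3, 0.1, 0.2]
scene_sem = [0.1, 0.2, 0.2, 0.1]
scene_spat = [0.4, 0.6, 0.5, 0.6]

t_value, df = interaction_t(obj_sem, obj_spat, scene_sem, scene_spat)
print(f"t({df}) = {t_value:.2f}, equivalent F(1,{df}) = {t_value ** 2:.2f}")
```

      A positive contrast means the object seed favours the semantic ROI more than the scene seed does, which is the direction of the dissociation described in the quoted analysis.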

      (3) The authors identify a region of the right angular gyrus as demonstrating a "potential role in integrating the visual-to-DMN pathways." This would seem to imply that lesion damage to right AG should produce difficulties in integrating "semantic" and "spatial" knowledge. Are the authors aware of such a literature? If so, this would be an important point to make in the manuscript as it would tie in yet another independent source of information relevant to the framework being presented. The closest of which I am aware involves deficits in cued recall performance when associates consisted of auditory-visual pairings (Ben-Zvi et al., 2015), but that form of multi-modal pairing is distinct from the "spatial-semantic" integration forwarded in the current manuscript.

      This is a very interesting observation. There is a body of literature pointing to AG (more often left than right) as an integrator of multimodal information: It has been shown to integrate semantic and episodic memory, contextual information and cross-modality content.

      The Contextual Integration Model (Ramanan et al., 2017) proposes that AG plays a crucial role in multimodal integration to build context. Within this model, information that is essential for the representation of rich, detailed recollection and construction (like who, when, and, crucially for our findings, what and where) is processed elsewhere, but integrated and represented in the AG. In line with this view, Bonnici et al (2016) found AG engagement during retrieval of multimodal episodic memories, and that multivariate classifiers could differentiate multimodal memories in AG, while unimodal memories were represented in their respective sensory areas only. Recent work examining semantic processing in temporally-extended narratives using multivariate approaches concurs with a key role of left AG in context integration (Branzi et al., 2020).

      In addition to context integration, other lines of work suggest a role of AG as an integrator across modalities, more specifically. Recent perspectives suggest a role of AG as a dynamic buffer that allows combining distinct forms of information into multimodal representations (Humphreys et al., 2021), which is consistent with the result in our study of a region that brings together semantic and spatial representations in line with task demands. Others have proposed a role of the AG as a central connector hub that links three semantic subsystems, including multimodal experiential representation (Xu et al., 2017). Causal evidence of the role of AG in integrating multimodal features has been provided by Yazar et al (2017), who studied participants performing memory judgements of visual objects embedded in scenes, where the name of the object was presented auditorily. TMS to AG impaired participants’ ability to retrieve context features across multiple modalities. However, these studies do not single out specifically right AG.

      Some recent proposals suggest a causal role of right AG as a key region in the early definition of a context for the purpose of sensemaking, for which integrating semantic information with many other modalities, including vision, may be a crucial part (Seghier, 2023). TMS studies suggest a causal role for the right AG in visual attention across space (Olk et al. 2015, Petitet et al. 2015), including visual search and the binding of stimulus- and response-characteristics that can optimise it (Bocca et al. 2015). TMS over the right AG disrupts the ability to search for a target defined by a conjunction of features (Muggleton et al. 2008) and affects decision-making when visuospatial attention is required (Studer et al. 2014). This suggests that the AG might contribute to perceptual decision-making by guiding attention to relevant information in the visual environment (Studer et al. 2014). These, taken together, suggest a causal role of right AG in controlling attention across space and integrating content across modalities in order to search for relevant information.

      Most of this body of research points to left, rather than right, AG as a key region for integration, but we found regions of right AG to be important when semantic and spatial information could be integrated. We might have observed involvement of the right AG in our study, as opposed to the more-often reported left, given that people have to integrate semantic information with spatial context, which relies heavily on visuospatial processes predominantly located in right hemisphere regions (cf. Sormaz et al., 2017), which might be more strongly connected to right than left AG. 

      Lastly, we are not aware of a literature on right AG lesions impairing the integration of semantic and spatial information but, in the face of our findings, this might be a promising new direction. We have added as a recommendation that patients with damage to right AG should be examined with specific tasks aimed at probing this type of integration. We have added the following to the discussion:

      “We found a region of the right AG that was potentially important for integrating semantic and spatial context information. Previous research has established a key role of the AG in context integration (Ramanan et al., 2017; Bonnici et al., 2016; Branzi et al., 2020) and specifically, in guiding multimodal decisions and behaviour (Humphreys et al., 2021; Xu et al., 2017; Yazar et al., 2017). Although some recent proposals suggest a causal role of right AG in the early establishment of meaningful contexts, allowing semantic integration across modalities (Seghier, 2023; Olk et al., 2015, Petitet et al., 2015; Bocca et al., 2015; Muggleton et al. 2008), the majority of this research points to left, rather than right, AG as a key region for integration. However, we might have observed involvement of the right AG in our study given that people were integrating semantic information with spatial context, and right-lateralised visuospatial processes (cf. Sormaz et al., 2017) might be more strongly connected to right than left AG. We are not aware of a literature on right AG lesions impairing the integration of semantic and spatial information but, in the face of our findings, this might be a promising new direction. Patients with damage to right AG should be examined with specific tasks aimed at probing this type of integration.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) I mentioned the numerous converging analyses reported in this manuscript as a strength. However, in practice, it also results in numerous dense figures (routinely hitting 7-8 sub-panels) and results paragraphs which, as currently presented, are internally coherent but are not assembled into a "bigger picture" until the discussion. Readers may have an easier time with the paper if introductions to the different analyses ("probe phase", "decision phase", etc.) also include a bigger-picture summary of how the specific analysis is contributing to the larger argument that is being constructed throughout the manuscript. This may also help readers to understand why so many different analysis approaches and decisions were employed throughout the manuscript, why so many different masks were used, etc.

      Thank you for this suggestion. We agree that the range of analyses and their presentation can make digesting them difficult. To address this, we have outlined our analyses rationale at the beginning of the results as a sort of “big picture” summary that links all analyses together, and added introductory paragraphs to each analysis that needed them (namely, the probe, decision, and pathway connectivity analyses, as the gradient and integration analyses already had introductory paragraphs describing their rationale, and the PPI/RSA analyses were moved to supplementary materials), linking them to the summary, which we reproduce below:

      “To probe the organisation of streams of information between visual cortex and DMN, our neuroimaging analysis strategy consisted of a combination of task-based and connectivity approaches. We first delineated the regions in visual cortex that are engaged by the viewing of probes during our task (Figure 2), as well as the DMN regions that respond when making decisions about those probes (Figure 3): we characterised both by comparing the activation maps with well-established DMN and object/scene perception regions, analysed the pattern of activation within them, their functional connectivity and task associations. Having characterised the two ends of the stream, we proceeded to ask whether they are differentially linked: are the regions activated by object probe perception more strongly linked to DMN regions that are activated when making semantic decisions about object probes, relative to other DMN regions? Is the same true for the spatial context probe and decision regions? We answered this question through a series of connectivity analyses (Figure 4) that examined: 1) if the functional connectivity of visual to DMN regions (and DMN to visual regions) showed a dissociation, suggesting there are object semantic and spatial cognition processing ‘pathways’; 2) if this pattern was replicated in structural connectivity; 3) if it was present at the level of individual participants, and, 4) we characterised the spatial layout, network composition (using influential RS networks) and cognitive decoding of these pathways. Having found dissociable pathways for semantic (object) and spatial context (scene) processing, we then examined their position in a high-dimensional connectivity space (Figure 5) that allowed us to document that the semantic pathway is less reliant on unimodal regions (i.e., more abstract) while the spatial context pathway is more allied to the visual system. Finally, we used uni- and multivariate approaches to examine how integration between these pathways takes place when semantic and spatial information is aligned (Figure 6).”

      (2) At various points, figures are arranged out of sequence (e.g., panel d is referenced after panel g in Figure 2) or are missing descriptions of what certain colors mean (e.g., what yellow represents in Figure 6d). This is a minor issue, but one that's important and easy to address in future revisions.

      We thank the reviewer for bringing this issue to our attention. We have added descriptions for the yellow colour to the figure legends of Figures 6 and 7 (now in supplementary materials, Figure S9).

      We have also edited the text to follow a logical sequence with respect to referencing the panels in Figures 2 and 3, where panel d is now referenced after panel c. Lastly, we reorganised the layout of Figure 4 to follow the description of the results in the text.

    1. eLife assessment

      This important study shows a significant role for Musashi-2 (Msi2) in lung adenocarcinoma. The authors provided solid data that support the requirement for Msi2 in tumor growth and progression, although the study would have been strengthened by including more patient samples and additional evidence regarding Msi2+ cells being more responsive to transformation. These findings are of interest to both the lung cancer and the RNA binding protein fields.

    2. Reviewer #1 (Public Review):

      Summary:

      Here, the authors, Barber AG et al., developed a new mouse model and investigated the importance of Musashi-2 in lung cancer. Specifically, they found that Musashi-2 is important for lung cancer cells as it controls cancer cell growth, and also regulates several genes that also control cancer cell growth. Development of a new Musashi-2 mouse model is a plus, as it confirmed the importance of Musashi-2 for lung cancer survival and identified several genes controlled by Musashi-2 that are important for lung cancer growth. Additionally, they demonstrated that Musashi-2 overexpression, which is tracked by GFP, is preferred in lung adenocarcinoma cells. The data are rigorous and only minor revisions are requested.

      Strengths:

      The authors achieved their goals by developing a new Musashi-2 mouse model, confirming the importance of Musashi-2 for lung cancer survival, and finding several genes controlled by Musashi-2 that are important for lung cancer growth.

      Weaknesses:

      The finding that Musashi-2 controls mouse and human lung cancer growth is not that novel, as a prior publication in 2016 already showed this in both human and mouse models (Kudinov et al PNAS, PMID: 27274057). The authors also missed the point of that paper, which used both mouse and human models to show an impact on invasion and metastasis, both in vitro and in vivo. Additionally, another publication, currently under revision, recently generated a new Musashi-2 transgenic mouse model that also confirmed that Musashi-2 supports lung cancer growth (Bychkov I et al, PMID: 37398283; https://www.biorxiv.org/content/10.1101/2023.06.13.544756v1). Another weakness is that Musashi-2 cannot be effectively targeted, and the new genes the authors found that Musashi-2 regulates are likely to also be difficult therapeutic targets. Therefore, the impact of this new investigation on the field is relatively modest.

      Major suggestions:

      (1) Figure 3: it is unclear what is the efficiency of Msi2 deletion shRNA - could you demonstrate it by at least two independent methods? (QPCR, Western, or IHC?) please quantitate the data.

      (2) In Figure 4, similarly, it is unclear if Msi2 depletion was effective- and what is shRNA efficiency. Please test this by at least two independent methods (QPCR, Western, or IHC) and also please quantitate the data

      (3) the reason for impairment of cell growth demonstrated in Figs 3 and 4 is not clear: is it apoptosis? Necrosis? Cell cycle defects? Autophagy? Senescence? Please probe 2-3 possibilities and provide the data.

      (4) Since Musashi-1 is a Musashi-2 paralogue that could compensate for Musashi-2 loss, please test Msi1 expression levels in matching Fig 3 and Fig 4 sections (in cells/ tumors with Msi2 deletion and in KP cells with Msi2 shRNA). One method could suffice here.

      (5) It is not exactly clear why RNA-seq (as opposed to proteomics) was done to investigate downstream Msi2 targets, since Msi2 is in the first place a translational and not a transcriptional regulator. RNA effects in Fig 5J are quite modest, 2-fold or so. It would be useful (if antibodies are available) to test the four targets in Fig 5J by Western blot, to see any impact of Musashi-2 depletion on those target protein levels. Indeed, several papers - including Kudinov et al PNAS, PMID: 27274057, Makhov P et al PMID: 33723247 and PMID: 37173995 - used proteomics/ RIP approaches and found direct Musashi-2 targets in lung cancer, including EGFR, and others.

    3. Reviewer #2 (Public Review):

      Summary:

      Alison G. Barber et al. report the function of Msi2 in mouse models of non-small cell lung cancer. The expression of Msi2 in normal lung was evaluated using a knockin reporter allele. Msi2 expressing cells were found to be around 30-40% in normal lung epithelium without a strong bias in subsets of lung cells. Knocking out Msi2 in a KrasG12D and P53 KO model reduced lung cancer initiation. Knocking down Msi2 in established lung cancer cells reduced in vitro sphere formation and in vivo xenograft growth. Finally, the authors identified several genes whose expression was downregulated by Msi2 knockdown. Knocking down four of these genes, including Ptgds, Arl2bp, hRnf157, and Syt11, each with a single shRNA, reduced lung sphere formation in vitro, suggesting their involvement in lung cancer.

      Strengths:

      This manuscript represents an interesting advance on the role of Msi2 in lung cancer. While some of the data (for example the knockdown effect of Msi2 in established lung cancer cells) corroborated previous findings, the study of Msi2 expression in normal lung and the characterization of the KO phenotype in lung cancer initiation are new and interesting.

      Weaknesses:

      Two areas can be further strengthened. Several conclusions are not fully supported by the existing data. The stable/dynamic nature of Msi2 expressing cells in lung would benefit from more detailed investigations for proper data interpretation.

      (1) It will be interesting to determine whether Msi2+ cells are a relatively stable subset or rather the Msi2+ cells in lung is a dynamic concept that is transient or interconvertible. This is relevant to the interpretation of what Msi2 positivity really means.

      (2) Does Kras mutation and/or p53 loss upregulate Msi2? This point and the point above are related to whether Msi2+ cells are truly more susceptible to tumorigenesis, as the authors suggested.

      (3) The KO of Msi2 reducing tumor number and burden in the lung cancer initiation model is interesting. However, there are two alternative interpretations. First, it is possible that the Msi2 KO mice (without Kras activation and p53 loss) have reduced total lung cell numbers or an altered percentage of stem cells. There is currently only one sentence citing data not shown on line 125, commenting that there is no difference in BASC and AT2 cell populations. It will be helpful that such data are shown and the effect of KO on overall lung mass or cellularity is clarified. Second, the phenotype may also be due to a difference in the efficiencies of Cre on Kras and p53 in the Msi2 WT and KO mice.

      (4) All shRNA experiments (for both Msi2 KD and the KD of candidate genes) utilized a single shRNA. This approach cannot exclude off-target effects of the shRNA.

      (5) The technical details of the PDX experiment (Figure 4F) are not fully explained.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Barber and colleagues propose a dual role for the RNA-binding protein Musashi-2 (Msi2) in lung adenocarcinoma initial transformation and subsequent tumor propagation. First, the authors show that Msi2 is expressed in a subset of Club/BASC (37%) and AT2 (26%) cells in the normal lung and that these cells display a transcriptional profile distinct from that of cells not expressing Msi2. Furthermore, Msi2 is broadly expressed/activated in vivo in genetically induced lung adenocarcinoma tumors (Kras/p53 mouse model) and Msi2+ cells displayed a significantly higher ability to form tumor spheres in vitro. The authors demonstrated by in vivo and in vitro assays that Msi2 loss of function significantly impairs tumor growth and progression in lung adenocarcinoma. Data showed that Msi2 function is conserved in human adenocarcinoma tumor growth in patient-derived xenograft. Lastly, novel genes regulated by Msi2 and involved in lung adenocarcinoma tumor growth were identified.

      Strengths:

      The authors provided convincing data for a key role of Msi2 in lung adenocarcinoma tumor progression and growth. Multiple lines of evidence from the Msi2 knock-out genetic mouse model and shRNA knock-down in the tumor sphere formation assay are clearly presented. The conservation and importance of Msi2 were further shown in human patient-derived xenograft. Although specific cell types (Club/BASC, AT2) were not isolated, the authors further delved into the transcriptional differences between Msi2+ and Msi2- cells in the normal lung. Furthermore, novel genes and pathways regulated by Msi2 in lung adenocarcinoma were identified and tested for their ability to inhibit tumor growth in vitro. These 2 RNA-Seq datasets will be useful in the future and provide a basis to explore 1) potential propensity of a given cell to initiate oncogenic transformation, and 2) potential novel regulators of lung adenocarcinoma.

      Weaknesses:

      Although this work strongly demonstrated the importance of Msi2 in lung adenocarcinoma tumor progression and growth, the following points remain to be clarified or addressed.

      - In Figure 1, characterization of Msi2 expression in the normal mouse lung was carried out by using a Msi2-GFP Knock-in reporter and analyzed by flow cytometry followed by cytospins and immunostaining. Additional characterization of Msi2 expression by co-immunostaining with well-known markers of airway and alveolar cell types in intact lung tissue will strengthen the existing data and provide more specific information about Msi2 expression and abundancy in relevant cell types. It will be also interesting to know whether Msi2 is expressed or not in other abundant lung cell types such as ciliated and AT1 cells.

      - While this set of experiments provides strong evidence that Msi2 is required for tumor progression and growth in lung adenocarcinoma, it is unclear whether normal Msi2+ lung cells are more responsive to transformation or whether Msi2 is upregulated early during the process of tumorigenesis. Future lineage tracing experiments using Msi2-CreER and mouse models of chemically-induced lung carcinogenesis will provide additional data that will fully support this claim.

      - In Figure 4F, Patient-derived xenograft (PDX) assays were conducted in 2 patients only and the percentage of cells infected by shRNA-Msi2 is low in both PDX (30% and 10% for patient 1 and 2 respectively). It is surprising that Msi2 downregulation in a small percentage of tumor cells has such a dramatic effect on tumor growth and expansion. Confirmation of this finding with additional patient samples would suggest an important non-cell autonomous role for Msi2 in lung adenocarcinoma.

    5. Author response:

      Reviewer #1 (Public Review):

      (1) Figure 3: it is unclear what is the efficiency of Msi2 deletion shRNA - could you demonstrate it by at least two independent methods? (QPCR, Western, or IHC?) please quantitate the data.

      In Figure 3, we did not delete Msi2 via shRNA. Instead, we utilized a genetic model in which the Msi2 gene was disrupted via gene trap mutagenesis. We have also used this model in previous publications to define the impact of Msi2 loss in other systems (1).

      (2) In Figure 4, similarly, it is unclear if Msi2 depletion was effective- and what is shRNA efficiency. Please test this by at least two independent methods (QPCR, Western, or IHC) and also please quantitate the data

      We demonstrated that the efficiency of Msi2 depletion was ~83% (Figures 4A and 4C) via qPCR analysis for our in vitro and in vivo experiments, respectively, and verified the knockdown via bulk RNA-seq analysis. The shRNA hairpin used was previously validated and published by our lab (2).

      (3) the reason for impairment of cell growth demonstrated in Figs 3 and 4 is not clear: is it apoptosis? Necrosis? Cell cycle defects? Autophagy? Senescence? Please probe 2-3 possibilities and provide the data.

      The basis of the cell growth impairment after Msi2 deletion/knockdown in this paper is certainly an important question, and future experiments will be performed to better delineate this. In previous publications, loss of Msi2 in leukemia cells has been shown to inhibit growth via arrested cell cycle progression by increasing the expression of p21 (3). Further, loss of Msi2 was also shown to promote apoptosis in part by upregulating Bax (3). These data suggest that Msi2 can have an impact via multiple distinct mechanisms including by mediating cell cycle arrest and blocking apoptosis. While these specific genes were not detectably changed after loss of Msi2 in lung cancer cells, other genes in these and other pathways will be important to study in the future.

      (4) Since Musashi-1 is a Musashi-2 paralogue that could compensate for Musashi-2 loss, please test Msi1 expression levels in matching Fig 3 and Fig 4 sections (in cells/ tumors with Msi2 deletion and in KP cells with Msi2 shRNA). One method could suffice here.

      In our RNA-seq of cells following Msi2 knockdown, Msi1 expression was undetectable. The TPM values for Msi1 in control and knockdown cells were less than 0.01, suggesting that it did not compensate for the loss of Msi2.

      (5) It is not exactly clear why RNA-seq (as opposed to proteomics) was done to investigate downstream Msi2 targets, since Msi2 is in the first place a translational and not a transcriptional regulator. RNA effects in Fig 5J are quite modest, 2-fold or so. It would be useful (if antibodies are available) to test the four targets in Fig 5J by Western blot, to see any impact of Musashi-2 depletion on those target protein levels. Indeed, several papers - including Kudinov et al PNAS, PMID: 27274057, Makhov P et al PMID: 33723247 and PMID: 37173995 - used proteomics/ RIP approaches and found direct Musashi-2 targets in lung cancer, including EGFR, and others.

      Previously published work from the lab showed that expression of Msi2 in the context of myeloid leukemia (1) can not only repress NUMB protein (as has been previously demonstrated in the nervous system) but also Numb RNA. This indicated that as an RNA binding protein, Msi2 also can bind and destabilize direct binding targets such as Numb; this was the reason for pursuing transcriptomic analysis. However, as the reviewer suggests, proteomic studies are certainly very important to develop a complete picture of the impact of Musashi to determine which targets are controlled by Msi2 at the protein level.

      Reviewer #2 (Public Review):

      (1) It will be interesting to determine whether Msi2+ cells are a relatively stable subset or rather the Msi2+ cells in lung is a dynamic concept that is transient or interconvertible. This is relevant to the interpretation of what Msi2 positivity really means.

      In previous unpublished work from our lab, we have found that Msi2+ cells from a GFP reporter KPf/fC mouse are readily able to become GFP negative (Msi2-), but the inverse is not true. Specifically, when Msi2+ KPf/fC pancreatic cells were transplanted into the flanks of NSG mice, Msi2+ cells formed tumors in all recipients; these tumors contained both GFP+ and GFP- cells (over 80%), recapitulating the original heterogeneity and suggesting GFP+ cells can give rise to both GFP+ and GFP- cells (Lytle and Reya, unpublished observations). In contrast, only a small subset of GFP- transplanted mice formed tumors. One of the rare GFP- derived tumors was isolated and found to contain largely GFP- cells, with ~0.1% GFP+ cells. The small frequency of GFP expression could be from contaminating cells or may suggest that GFP- cells retain some ability to switch on Msi under selective pressure, and that although they pose a lower risk of driving tumorigenesis than Msi+ cells, they may nonetheless bear latent potential to become higher risk. These data may offer a possible model for projecting the potential of Msi2+ cells in the lung, but this needs to be further studied in this tissue.

      (2) Does Kras mutation and/or p53 loss upregulate Msi2? This point and the point above are related to whether Msi2+ cells are truly more susceptible to tumorigenesis, as the authors suggested.

      In unpublished work from our lab, we have found that Kras mutation upregulates Msi2 over baseline and subsequent p53 loss upregulates Msi2 further in the context of pancreatic cells (Lytle and Reya unpublished results), therefore it is possible that the same is true for the lung. Specifically, we have observed that Msi2 increased from normal acinar cells to Kras-mutated acinar (e.g. pancreatic intraepithelial neoplasia (PanIN)).

      To address whether Msi2+ cells are more susceptible to tumorigenesis, we have recently published data showing that the stabilization of the oncogenic MYC protein in lung Msi2+ cells drives the formation of small-cell lung cancer in a new inducible Msi2-CreERT2; CAG-LSL-MycT58A (Msi2-Myc) mouse model (4). More importantly, these data provide the first evidence that normal Msi2+ cells are primed and highly sensitive to MYC-driven transformation across many organs and not just the lung (4).

      (3) The KO of Msi2 reducing tumor number and burden in the lung cancer initiation model is interesting. However, there are two alternative interpretations. First, it is possible that the Msi2 KO mice (without Kras activation and p53 loss) have reduced total lung cell numbers or an altered percentage of stem cells. There is currently only one sentence citing data not shown on line 125, commenting that there is no difference in BASC and AT2 cell populations. It will be helpful that such data are shown and the effect of KO on overall lung mass or cellularity is clarified. Second, the phenotype may also be due to a difference in the efficiencies of Cre on Kras and p53 in the Msi2 WT and KO mice.

      We isolated the lungs of three Msi2 WT and three Msi2 KO mice and used immunofluorescence to stain for CC10 (BASC) and SPC (AT2) to determine if these cell populations were reduced after Msi2 loss alone. Below are representative images showing that the Msi2 KO mice did not have lower numbers of either BASC or AT2 cell populations.

      Author response image 1.

      (4) All shRNA experiments (for both Msi2 KD and the KD of candidate genes) utilized a single shRNA. This approach cannot exclude off-target effects of the shRNA.

      The shRNA hairpin used for Msi2 was previously validated and published by our lab (2). Additionally, in this work we did develop and use a Msi2 genetic knockout mouse model that validates our shRNA knockdown data showing the specific impact of Msi2 on lung tumor growth.

      (5) The technical details of the PDX experiment (Figure 4F) are not fully explained.

      Due to space considerations, we were unable to put the specifics in the legend, but the details are in the methods section (Flank Transplant Assays). In brief, 500,000 cells/well were plated in a 6-well plate coated with Matrigel and 83,000 cells/well were plated in a 24-well plate coated with Matrigel for subsequent determination of transduction efficiency via FACS. 24 hours after transduction, media from the cells was collected and placed on ice. 1 mL of 2 mg/mL collagenase/dispase was then added to the well and incubated for 45 minutes at 37ºC to dissociate the remaining cells from Matrigel, followed by subsequent washes. Cells were pelleted by centrifugation and an equivalent number of shControl and shMsi2 transduced cells were resuspended in full media, mixed at a 1:1 ratio with growth factor reduced Matrigel at a final volume of 100 μL, and transplanted subcutaneously into the flanks of NSG recipient mice.

      Reviewer #3 (Public Review):

      - In Figure 1, characterization of Msi2 expression in the normal mouse lung was carried out by using a Msi2-GFP Knock-in reporter and analyzed by flow cytometry followed by cytospins and immunostaining. Additional characterization of Msi2 expression by co-immunostaining with well-known markers of airway and alveolar cell types in intact lung tissue will strengthen the existing data and provide more specific information about Msi2 expression and abundancy in relevant cell types. It will be also interesting to know whether Msi2 is expressed or not in other abundant lung cell types such as ciliated and AT1 cells.

      We performed co-staining of Msi2 and CC10 as well as Msi2 and SPC in Figure 1C. In the future we can include additional markers as well as markers for airway and other alveolar cell types.

      - While this set of experiments provides strong evidence that Msi2 is required for tumor progression and growth in lung adenocarcinoma, it is unclear whether normal Msi2+ lung cells are more responsive to transformation or whether Msi2 is upregulated early during the process of tumorigenesis. Future lineage tracing experiments using Msi2-CreER and mouse models of chemically-induced lung carcinogenesis will provide additional data that will fully support this claim.

      Recently, we published data showing that Msi2 is expressed in Clara cells at the bronchoalveolar junction in the lung of our new Msi2-CreERT2 knock-in mouse model (4). Furthermore, stabilization of the oncogenic MYC protein in these specific cells to model Myc amplification was sufficient to drive the formation of small-cell lung cancer (4). These data excitingly demonstrate that Msi2+ cells are more responsive to transformation after Myc stabilization.

      - In Figure 4F, Patient-derived xenograft (PDX) assays were conducted in 2 patients only and the percentage of cells infected by shRNA-Msi2 is low in both PDX (30% and 10% for patient 1 and 2 respectively). It is surprising that Msi2 downregulation in a small percentage of tumor cells has such a dramatic effect on tumor growth and expansion. Confirmation of this finding with additional patient samples would suggest an important non-cell autonomous role for Msi2 in lung adenocarcinoma.

      In the future we hope to collect more patient samples to further validate the data presented with the first 2 patients shown here. We are not certain about the reason behind the large impact of Msi2 inhibition, but as cancer stem cells drive the formation of the rest of the tumor and also drive the stromal microenvironment, it is possible that when Msi2 is deleted, Msi2- cells no longer form tumors and the ability to build the stromal microenvironment is also impacted. This possibility needs to be further tested in future experiments.

      References

      (1) Ito, T., Kwon, H. Y., Zimdahl, B., Congdon, K. L., Blum, J., Lento, W. E., Zhao, C., Lagoo, A., Gerrard, G., Foroni, L., Goldman, J., Goh, H., Kim, S. H., Kim, D. W., Chuah, C., Oehler, V. G., Radich, J. P., Jordan, C. T., & Reya, T. Regulation of myeloid leukaemia by the cell-fate determinant Musashi. Nature 466, 765–768 (2010).

      (2) Fox, R. G., Lytle, N. K., Jaquish, D. V., Park, F. D., Ito, T., Bajaj, J., Koechlein, C. S., Zimdahl, B., Yano, M., Kopp, J. L., Kritzik, M., Sicklick, J. K., Sander, M., Grandgenett, P. M., Hollingsworth, M. A., Shibata, S., Pizzo, D., Valasek, M. A., Sasik, R., Scadeng, M., Okano, H., Kim, Y., MacLeod, A. R., Lowy, A. M., & Reya, T. Image-based detection and targeting of therapy resistance in pancreatic adenocarcinoma. Nature 534, 407–411 (2016).

      (3) Zhang, H., Tan, S., Wang, J., Chen, S., Quan, J., Xian, J., Zhang, Ss., He, J., & Zhang, L. Musashi2 modulates K562 leukemic cell proliferation and apoptosis involving the MAPK pathway. Exp Cell Res 320, 119–127 (2014).

      (4) Rajbhandari, N., Hamilton, M., Quintero, C. M., Ferguson, L. P., Fox, R., Schürch, C. M., Wang, J., Nakamura, M., Lytle, N. K., McDermott, M., Diaz, E., Pettit, H., Kritzik, M., Han, H., Cridebring, D., Wen, K. W., Tsai, S., Goggins, M. G., Lowy, A. M., Wechsler-Reya, R. J., Von Hoff, D. D., Newman, A. M., & Reya, T. Single-cell mapping identifies MSI+ cells as a common origin for diverse subtypes of pancreatic cancer. Cancer Cell 41, 1989–2005.e9 (2023).

    1. eLife assessment

      This fundamental study provides compelling evidence for dysgranular insular involvement in top-down and bottom-up interoceptive processing by building on previous evidence using state-of-the-art methods. Its translational application in ADE patients corroborates the assumption that the mid-insula may indeed be a locus of 'interoceptive disruption' in psychiatric disorders, which underscores the study's high relevance for both body-brain as well as clinical research.

    2. Reviewer #2 (Public Review):

      Summary:

      The authors have conducted an exceptionally informative series of studies investigating the neural basis of interoception in transdiagnostic psychiatric symptoms. By comparing differential and overlapping neural activation during 'top-down' and 'bottom-up' interoceptive tasks, they reveal convergent activation largely localised to the ventral dysgranular subregion ('mid-insula'), which differs in extent between patients and controls, replicating and extending previous suggestions of this region as a central locus of disruption in psychiatric disorders. Their work also reveals different extents of divergent activation in the anterior insula during anticipation of interoceptive disruption. This substantially advances our previous knowledge of the anatomy of interoception, and confirms theoretical predictions of the roles of different cytoarchitectural subregions of the insula in interoceptive dysfunction in mental health conditions.

      Strengths:

      The work is exceptional in terms of breadth and depth, making use of multiple imaging and analysis techniques which are non-standard and go well beyond what is known today. The study is statistically well-powered and the tasks are well-validated in the literature. To my knowledge, these functions of the insula in interoception and mental health have never been compared directly before, so the results are novel and informative for both basic science and psychiatry. The work is strongly theory-driven, building on and directly testing results from influential theories and previous studies. It is likely that the results will strengthen our theoretical models of interoception and advance psychiatric studies of the insula.

      Weaknesses:

      The study has three limitations. (1) The interpretation of the resting-state isoproterenol data could potentially represent fluctuations over time rather than following interoception specifically; future studies should investigate test-retest reliability of this measure. Note this does not preclude the strong conclusions which can be drawn from the authors' task-based data. (2) The transdiagnostic patient sample was almost entirely female, and many were currently taking psychotropic medications; future studies should replicate these effects in unmedicated, sex-balanced samples. (3) As the authors point out, there may have been task-specific preprocessing/analysis differences that influenced results, for example due to physiological correction in one but not both tasks; however, there are also merits to this analysis approach, such as comparability with previous studies.

    3. Reviewer #3 (Public Review):

      Summary:

      Adamic and colleagues present fMRI data from ADE patients and a healthy control group acquired during two interoceptive tasks (attention and perturbation) from the same session. They report convergent activity within the granular and dysgranular insular cortex during both tasks, with a patient group-specific lateralisation effect. Furthermore, insular functional connectivity was found to be linked to disease severity.

      Strengths:

      The study is well-designed and - despite some limitations noted by the authors - provides much-needed insight into the functional pathways of interoceptive processing in health and disease. The manuscript is clear, concise, and well-written.

      Weaknesses:

      None remain after the authors' revision.

    4. Reviewer #4 (Public Review):

      Summary:

      In the manuscript titled "Hemispheric Divergence of Interoceptive Processing Across Psychiatric Disorders", the authors analyzed a subset of data collected for a larger project investigating interoception in anorexia nervosa and generalized anxiety disorder (ClinicalTrials.gov Identifier: NCT02615119). This study utilized fMRI and various analyses with a special focus on the insula and its connectivity to map the neural commonalities and differences in both top-down and bottom-up interoceptive processing.

      The primary aim was to compare whether these neural activations were quantitatively and qualitatively different in a sample of healthy controls (HC) versus patients diagnosed with anxiety, depression, and/or eating disorders (ADE).

      The study initially recruited 70 patients with primary diagnoses of ADE and 57 HC. After applying exclusion criteria, the final sample consisted of 46 ADE patients and 46 matched HC. Participants underwent task-related and resting-state fMRI scan sessions.

      Specifically, participants performed 2 tasks in fMRI: i) a bottom-up interoceptive (ISO) task involving intravenous infusions of isoproterenol (a peripherally-acting beta-adrenergic receptor agonist) administered in a double-blind, placebo-controlled fashion to alter cardiovascular activity where participants were asked about their visceral awareness; and ii) a top-down interoceptive attention (VIA) task where participants were asked to focus on their visceral sensations triggered by words indicating specific body parts (e.g., STOMACH, HEART, LUNGS) or to pay attention to color changes of the word TARGET during an exteroceptive control task.

      Main results show overlapping patterns of neural activation within the dysgranular mid-insula during top-down and bottom-up interoceptive processing with hemispheric differences. The patterns of dysgranular activation distinguished individuals with ADE from HC. Also, differences in the activation of the anterior agranular insula during periods of interoceptive uncertainty differentiate ADE patients from HC.

      Strengths:

      This is a very nice study that aligns with modern Clinical Neuroscience approaches, as recommended by NIH policy (i.e. RDoC initiative), which puts emphasis on describing clinical conditions via transdiagnostic dimensions measured on psychological processes, behaviors, and neural processes rather than merely identifying a series of symptoms.

      I very much appreciated the different analyses that the authors performed to characterize qualitative and quantitative differences in insular activity and its connectivity during bottom-up and top-down interoceptive processes.

      These findings may open avenues for new studies that will explain the mechanisms underlying these phenomena and provide useful insights for developing novel interventions.

      Weaknesses:

      Weaknesses/Requests for additional clarifications

      (1) The sample

      (1.1) The authors describe the patient group as having a primary diagnosis of anxiety, depression, and/or eating disorders. However, Table 1 shows that the majority had anxiety disorders, some had major depression (it is not clear what percentage of patients had concurrent major depression at the time of the study; please clarify), and very few had a diagnosis of anorexia nervosa. The leftward activation asymmetry and distinct activation patterns in the left dysgranular mid-insula across both the ISO and VIA tasks found in the ADE group did not correlate with symptoms measured by the SCOFF questionnaire, but correlated with anxiety and depressive symptoms. It would be nice if the authors could comment on these results in relation to eating disorders.

      (1.2) Furthermore, the sample consisted of 5 males and 41 females in the HC group and 1 male and 45 females in the ADE group. In order to generalize these findings, the authors should acknowledge this gender imbalance and discuss whether they expect similar results in a predominantly male sample.

      (2) The procedure

      While the fixed order of tasks reflects the primary emphasis on acquiring data from the infusion (ISO) task, this could introduce confounding order effects. The authors should acknowledge this as a limitation of this study.

      (3) The rationale behind the study

      The authors recognized that there was a broader aim behind this data collection. It would be important to clarify a little bit more how the differences in insular areas mapping both (or specifically) bottom-up and top-down interoceptive processes and insular connectivity, recorded in ADE patients compared to healthy controls (HC), contribute to psychiatric diagnoses (hypothesis 3).

      For example, they should explain the psychopathological dimensions common to the three patient groups. Are disturbances in bottom-up and top-down interoceptive processing common traits in these patients, reflected in the asymmetric interhemispheric dysgranular mid-insular activation? The link between these disturbances and anatomical evidence of convergence/divergence of top-down vs. bottom-up interoceptive processes should be clearly stated.

      (4) Operationalization of Convergence / Divergence maps underlying top-down and bottom-up interoceptive processes in HC vs ADE patients

      The concept of Convergence / Divergence maps underlying top-down and bottom-up interoceptive processes is not clear to me. The authors want to compare, in HCs and ADE patients, the neural structures that are co-activated (convergence maps) vs those that are uniquely involved (divergence maps) in top-down and bottom-up interoceptive processes in the two groups. Thus, I would expect these two analyses to have been performed on similar portions of data; instead, different moments of the tasks (= different bottom-up / top-down interoceptive processes) have been analyzed.

      Specifically, the convergence maps have been identified by comparing active voxels recorded when participants were focusing on the heart and the lungs (compared to when they were focused on the exteroceptive features of the target) in the VIA task, and during infusions (Peak period) of 2mcg isoproterenol (compared to baseline) in the ISO task. The divergence maps have been identified by comparing voxels uniquely active during the anticipatory phases of both isoproterenol and saline infusions (compared to baseline) and during the peak period of saline dose of the ISO task with respect to when participants focused their attention on the heart and the lungs (compared to when they were focused on the exteroceptive features) in the VIA task.

      I understand the idea of mapping interoceptive uncertainty; however, I think that these two analyses do not only show commonalities and differences in the neural structures involved in bottom-up vs top-down processes (in ADE vs HC), but also neural correlates underlying different types of interoceptive processes involving or not involving top-down expectations.

      According to the authors, which is the most important neural marker that differentiates the ADE group: the difference in hemispheric activations within the left and right dysgranular insula or the less granular anterior insular activation during periods of interoceptive uncertainty? Also, do they reflect different transdiagnostic dimensions?

      (5) Collected physiological measures

      The authors speak about cardiorespiratory interoceptive processes, but they only included cardiac measures. Including respiratory changes could provide a more comprehensive comparison between bottom-up signals and top-down attentional processes. Also, I guess that the "STOMACH" trials of the VIA task were not analyzed in this study since those are used in the bigger study and since no gastric measures were collected? Please clarify this point.

      (6) ISO task instructions

      To better understand the task and participants' expectations, could the authors clarify the instructions given to participants regarding the isoproterenol and saline infusions? Did the participants have two types of expectations?

      (7) Title of the study

      I understand that the term "divergence" in the title refers to the different hemispheric activations characterizing ADE patients compared to HC. However, it also suggests an analysis based on convergence/divergence maps, which might be ambiguous. Could the authors make some small modifications to the title to make it clearer?

      (8) Caption of Figure 7

      The caption of Fig.7 notes that no difference in HR was found during the Saline infusion between the HC and ADE groups. However, it would be fair to mention the significant difference in dial ratings observed during the Saline infusion. How do the authors explain this difference?

      Typos

      In Figure 3, "Hemispheric divergence", I think, should be corrected to "Hemispheric convergence."

      I believe that by addressing these points, the manuscript will provide a clearer and more comprehensive understanding of the rationale, methods, and findings underlying this study.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      One concern is regarding the experimental task design. Currently, only subjective reports of interoceptive intensity are taken into account, the addition of objective behavioural measures would have given additional value to the study and its impact. 

      To address this comment, we calculated interoceptive accuracy during the cardiorespiratory perturbation (isoproterenol) task according to our previous methods (e.g., Khalsa et al 2009 Int J Psychophys, Khalsa et al, 2015 IJED, Khalsa et al 2020 Psychophys, Hassanpour et al, 2018 NPP, Teed et al 2022 JAMA Psych). Thus, we quantified interoceptive accuracy as the cross-correlation between heart rate and real-time cardiorespiratory perception; specifically, the zero-lag cross-correlation between the heart rate and dial rating time series, and the maximum cross-correlation between these time series while allowing for different temporal delays (or lags). As expected, we found a dose-related increase in interoceptive accuracy from the 0.5mcg moderate perturbation dose (for which neuroimaging maps were not included in the current study) to the 2.0mcg high perturbation dose: zero-lag cross-correlations of 0.25 and 0.61, maximum cross-correlations of 0.41 and 0.73, for 0.5mcg and 2.0mcg doses, respectively, when averaged across all participants in the current study. Looking more closely at just the 2.0mcg dose, there were no group differences in zero-lag cross-correlation (t(89) = -0.68, p=0.50) or maximum cross-correlation (t(87) = -1.0, p=0.32) (depicted below, panel A). Furthermore, there were no associations between either of these interoceptive accuracy measures and the magnitude of activation within bilateral dysgranular convergent regions (F(1) = 0.27 and 0.01, p=0.61 and 0.91, for the main effect of percent signal change on max and zero-lag cross-correlations, respectively; depicted below, panel B). When considering the significant correlation between the right insula signal intensity and subjective dial ratings, this lack of association with interoceptive accuracy suggests that the right dysgranular convergent insula was preferentially tracking the magnitude estimation rather than accuracy facet of interoceptive awareness during cardiorespiratory perturbation. Notably, during the saline placebo infusion, there were no systematic changes in heart rate and thus no systematic change in dial rating, precluding the calculation of the cross-correlation as a measure of interoceptive accuracy.
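
      For readers unfamiliar with the measure, the zero-lag and maximum cross-correlation described in this response can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the toy time series, and the lag window (`max_lag`) are all assumptions made for illustration, and the sketch assumes that heart rate and dial rating have already been resampled onto a common time grid.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either series is constant (the correlation is
    undefined there; treating it as 0.0 is a simplification of this sketch).
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def lagged_correlation(heart_rate, dial, lag):
    """Correlate heart_rate[t] with dial[t + lag].

    A positive lag means the dial rating trails the heart rate.
    """
    n = len(heart_rate)
    if lag >= 0:
        return pearson(heart_rate[:n - lag], dial[lag:])
    return pearson(heart_rate[-lag:], dial[:n + lag])


def interoceptive_accuracy(heart_rate, dial, max_lag):
    """Return (zero-lag r, maximum r, lag at which the maximum occurs)."""
    zero_r = lagged_correlation(heart_rate, dial, 0)
    by_lag = {
        lag: lagged_correlation(heart_rate, dial, lag)
        for lag in range(-max_lag, max_lag + 1)
    }
    best_lag = max(by_lag, key=by_lag.get)
    return zero_r, by_lag[best_lag], best_lag


# Hypothetical toy data: the dial rating follows heart rate with a
# delay of two samples (values are invented, not taken from the study).
heart_rate = [60, 60, 62, 70, 85, 95, 90, 80, 70, 62, 60, 60]
dial_rating = [0, 0, 0, 0, 1, 5, 12.5, 17.5, 15, 10, 5, 1]

zero_r, max_r, best_lag = interoceptive_accuracy(heart_rate, dial_rating, max_lag=4)
print(f"zero-lag r = {zero_r:.2f}, max r = {max_r:.2f} at lag {best_lag}")
```

      In this toy case the maximum correlation is found at a positive lag because the rating trails the physiological change, which is why the two measures are reported separately above.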

      In reviewing these findings, we did not feel that the results add meaningful information to our interpretation of convergence, and accordingly we have chosen not to include it in the manuscript.

      Author response image 1.

      (A) Interoceptive accuracy during 2.0mcg isoproterenol perturbation, as measured by the maximum (left panel) and zero-lag (right panel) cross-correlation between the time series of heart rate and perceptual dial rating. There were no differences between groups. (B) There were no associations between interoceptive accuracy ratings and signal intensity within the convergence dysgranular insula during the Peak period of 2.0mcg perturbation. 

      This brings me to my second concern. The authors mostly refer to their own previous work, without highlighting other methods used in the field. Some tasks measure interoceptive accuracy or other behavioural outcomes, instead of merely subjective intensity. Expanding the scientific context would aid the understanding and integration of this study with the rest of the field. 

      Given our focus on the neural basis of bottom-up perturbations of interoception, we found it relevant to reference previous studies from our lab, as we built directly upon these previous findings to inform the hypotheses and design of the current experiment, but we appreciate the value of providing a broader view of the literature. To expand the contextual frame, we have cited two fMRI meta-analyses of cardiac and gastrointestinal interoception (line 101). There are few studies that have used comparable perturbation approaches during neuroimaging in clinical populations, although we have referenced an exemplar study from the respiratory domain by Harrison et al (2021) in the discussion (line 612). In considering this comment more carefully, we felt that expanding the context further to other task-based methods or behavioral outcomes would shift the focus beyond our emphasis on the insular cortex and top-down/bottom-up convergence, though we have previously discussed and integrated such approaches (e.g., Khalsa & Lapidus, 2016 Front Psych, Khalsa et al, 2018 Biol Psychiatry CNNI, Khalsa et al 2022, Curr Psych Rep).

      Lastly, the suggestions for future research lack substance compared to the richness of the discussion. I recommend a slight revision of the introduction/discussion. There is text in the discussion (explanatory or illuminating) which is better suited to the introduction. 

      When discussing our study limitations (beginning line 732), we offer numerous areas for future research including different preprocessing pipelines, more sophisticated analysis techniques (such as multivariate pattern analysis) that would allow for individual-level inferences regarding convergent patterns of activation within the insula. However, we have revised the last sentence of our limitations paragraph (line 757), and have added more specificity regarding future approaches examining insular and whole-brain interoceptive signal flow.

      Reviewer 2:

      (1) The interpretation of the resting-state data is not quite as clear-cut as the task-based data - as presented currently, changes could potentially represent fluctuations over time rather than following interoception specifically. In contrast, much stronger conclusions can be drawn from the authors' task-based data. …I was also unsure about the interpretation of the resting state analysis (Figure 5), as there was no control condition without interoceptive tasks, meaning any change could represent a change over time that differed between groups and not necessarily a change from pre- to post-interoception. Relatedly I wondered if the authors had calculated the test-retest reliability of the resting state data (e.g. intraclass correlation coefficients for the whole-brain functional connectivity of convergent dysgranular insula subregions and left middle frontal gyrus before vs. after the tasks), as it would be generally useful for the field to know its stability. 

      We have acknowledged the lack of a control condition in the isoproterenol task (note that the VIA task contained an exteroceptive trial that was included in the brain image contrast analysis). We have also provided further justification for our approach in both the Methods (see the first paragraph “fMRI resting state analysis” subsection) and Results (see the last paragraph of the “Convergence analysis” subsection). We cannot estimate test-retest reliability from the current dataset, given that we do not have resting state scans separated by a similar time frame without the performance of the interoceptive tasks in between (this is now clarified in line 346).

      (2) The transdiagnostic sample could be better characterised in terms of diagnostic information, and was almost entirely female; it is also unclear what the effect of psychotropic medications may have been on the results given the effects of (e.g.) serotonergic medication on the BOLD signal. …Table 1 would be substantially improved by a fuller clinical characterisation of the specific sample included in the analysis - the diagnostic acronyms included in the table caption are not used in the table itself at present and would be an excellent addition, describing, for example, the demographics and symptom scores of patients meeting criteria for MDD, GAD, and AN (and perhaps those meeting criteria for more than 1). Similarly, additional information about the specific medications patients (or controls?) were taking in this study would be welcome (given the potential influences of common medications (e.g. antidepressants) on neurovascular coupling). 

      We have expanded Table 1 to include more specific diagnostic information for the transdiagnostic ADE group (GAD, MDD, and/or AN, as well as other psychiatric diagnoses). We have also included medication use.  

      Finally, Figures 7c and 7d would be greatly improved by showing individual data points if possible, and there may be a typo in the caption 'The cardiac group reported higher cardiac intensity ratings in the ADE group'.

      We have adjusted Figure 7c and 7d to include individual data points, as we agree that this provides greater transparency to the data itself. We have also fixed the typo in the figure caption.

      (3) As the authors point out, there may have been task-specific preprocessing/analysis differences that influenced results, for example, due to physiological correction in one but not both tasks. Although I note this is mentioned in the limitations, it was not clear to me why physiological noise was removed from the ISO task and whether it would be possible to do the same in the VIA task, which could be important for the most robust comparison of the two. 

      In this study, we intentionally chose different task-specific preprocessing pipelines so we could ensure that our results were not simply due to new ways of handling the data. This would allow us to evaluate evidence of replicating the previous group-level findings of insular activation that informed the current approach and hypotheses. We agree that a harmonized approach is also merited, and in a subsequent project using this dataset, we have matched preprocessing pipelines for a connectivity-based analysis, to best facilitate comparison across tasks. We look forward to sharing those results with the scientific community in due time.

      Reviewer 3:

      Maybe I missed it (and my apologies in case I did), but there were a few instances where it was not entirely clear whether differential effects (say between groups or conditions) were compared directly, as would be required. One example is l. 459 ff: The authors report the interesting lateralisation effect for the two interoception tasks and say it was absent in the exteroceptive VIA task. As a reader, it would be great to know whether that finding (effect in one condition but not in the other) is meaningful, i.e. whether the direct comparison becomes statistically significant. … The same applies to later comparisons, for example, the correlations reported in l. 465 ff (do these differ from one another?) as well as the FC patterns reported in l. 476 ff - again, there is a specific increase in the ADE group (but not in the HC), but is this between-group difference statistically meaningful? 

      Thank you for these questions. We have added greater detail in the Results section in order to increase clarity regarding which statistical comparisons support which conclusions. Generally, we limited our comparisons to the effect of group, as comparing ADE vs. HC individuals was of primary interest, and in some cases also the effect of hemisphere and epoch. However, we did not perform exhaustive comparisons for all measures, in the interest of keeping the focus of our multi-level multi-task analysis on the hypothesis-driven questions specifically related to convergence of top-down and bottom-up processing.

      Regarding the comment asking if we could compare the lateralization effect directly across task conditions (i.e., is there a greater difference between hemispheres in the ISO task compared to VIA?): unfortunately, directly comparing signal intensity across tasks is not possible because the isoproterenol infusion induces physiological changes that can cause some dose-related signal reduction (we have attempted to address this in the past, e.g., Hassanpour et al, 2018 HumBrMapp). Consequently, our conclusions about spatial localization of top-down and bottom-up convergence are limited to group-level comparisons based on binary activation.

      (2) A second 'major' relates to the intensity ratings (l. 530 ff). I found it very interesting that the ADE group reported higher cardiac, but lower exteroceptive intensity ratings during the VIA task. I understand the authors' approach to collapse within the ADE group, but it would be great to know which subgroup of patients drives this differential effect. It could be the case that the cardiac effect is predominantly present in the anxiety group, while the lower exteroceptive ratings are driven by the depression patients. Even if that were not the case, it would be highly instructive to understand the rating pattern within the anxiety group in greater detail. Do these patients 'just' selectively upregulate interoception, or is there even a perceived downregulation of exteroceptive signalling? 

      We have depicted these data below for reviewers’ reference, showing individual responses for each group (HC and ADE; panel A), as well as the ADE individuals separated by primary diagnosis (GAD = generalized anxiety disorder, n=24; AN = anorexia nervosa, n=16; MDD = major depressive disorder, n=6; panel B). When tested via linear regression, we found no differences in ratings across ADE subgroups (rating ~ subgroup * condition, F(3) = 1.71, p=0.16 for main effect of subgroup). However, several factors should be considered in interpreting this result: first, all subgroups are small, particularly the MDD sample. Second, while these diagnostic labels refer to the most prominent symptom expression of each patient, every clinical participant in the study had a co-morbid disorder. Therefore, it is not possible to isolate disorder-specific pathology from our multi-diagnostic sample, and for this reason we refrained from including the subgroup-specific data in the manuscript.

      Author response image 2.

      (A) Post-trial ratings during the Visceral Interoceptive attention task, for reference. This is also shown in Figure 7D. (B) The same post-trial ratings in (A), but with the ADE group separated by primary diagnoses. Importantly, although assigned to one diagnostic category on the basis of most prominent symptom expression, most patients had one or more comorbidities across disorders. GAD = Generalized Anxiety Disorder. MDD = major depressive disorder. AN = anorexia nervosa. HC = healthy comparison.

      l. 86: 'Conscious experience' of what, precisely? During the first round of reading, I was wondering about the extent to which consciousness as a general concept will play a role, which could be misleading. 

      We have changed it to “conscious experience of the inner body” in the text. The current study is limited in scope to the neurobiology of conscious perceptions of the inner body, not consciousness as a general phenomenon. We hope this distinction is now clear.

      l.115: Particularly given the focus on predictive processing, I was wondering whether the (slightly outdated) spotlight metaphor is really needed here. 

      While not perfect, we believe it is still valid to metaphorically reference goal-directed attention towards the body as an “attentional spotlight”. Given the concern, we have minimized the focus on this metaphor, and the sentence now reads as follows:

      “Extending beyond these model-based influences are goal-directed activities (also described previously as the ‘attentional spotlight’ effect (Brefczynski and DeYoe 1999)), whereby focusing voluntary attention towards certain environmental signals not only alters their conscious experience but selectively enhances neural activity in the responsive area of cortex.”

      l. 129 ff: The sentence has three instances of 'and' in it, most likely a typo. 

      We have fixed this in the text.

      l. 245: What do these ratings correspond to, i.e. what was the precise question/instruction? 

      The instructions for subjective ratings in each task are mentioned in the Methods (line 223 for ISO task, line 249 for the VIA task), and we have added more detail regarding the scale used to collect subjective intensity ratings.

      l. 322: Could you provide the equation of the LMEM in the main text? It would be interesting to know e.g. whether participants/patients were included as a random effect. 

      We have provided this equation in the Methods (line 326).

      l. 418 ff: I was confused about the statistical approach here. Why use separate t-tests instead of e.g. another LMEM which would adequately model task and condition factors? 

      We did not use t-tests, but instead used linear regression to look at differences in agranular PSC across groups, hemispheres, and epochs, as well as potential associations between PSC and trait measures. We have adjusted the wording in this Methods paragraph (line 418) to help clarity.

      l. 425: As a general comment, it would be great to provide the underlying scripts openly through GitHub, OSF, ... 

      We agree with this comment, and our main analysis scripts have been posted on our OSF as an addition to the original preregistration of this work (https://osf.io/6nxa3/).

      l. 443: For consistency, please report the degrees of freedom for the X² test.

      l. 454: ... and the F statistic would require two degrees of freedom (only the second is reported).

      l. 523: The t value is reported without degrees of freedom here (but has them in other instances).

      l. 540: Typo ('were showed').

      We have reported degrees of freedom for all statistics.

    1. eLife assessment

      This is a potentially valuable contribution, reporting a deletion analysis of the MSL1 gene to assess how different parts of the protein product interact with the MSL2 protein and roX RNA to affect the association of the MSL complex with the male X chromosome of Drosophila. However, the framework that the MSL complex mediates dosage compensation is outdated and has flaws, and the evidence is currently considered inadequate to support the claims. Because there are many ways to alter viability, sex-specific viability is insufficient to make claims regarding dosage compensation.

    2. Reviewer #2 (Public Review):

      Summary:

      A deletion analysis of the MSL1 gene to assess how different parts of the protein product interact with the MSL2 protein and roX RNA to affect the association of the MSL complex with the male X chromosome of Drosophila was performed.

      Strengths:

      The deletion analysis of the MSL1 protein and the tests of interaction with MSL2 are adequate.

      Weaknesses:

      This reviewer does not adhere to the basic premise of the authors that the MSL complex is the primary mediator of dosage compensation of the X chromosome of Drosophila. Several lines of evidence from various laboratories indicate that it is involved in sequestering the MOF histone acetyltransferase to the X chromosome but there is a constraint on its action there. When the MSL complex is disrupted, there is no overall loss of compensation but there is an increase in autosomal expression. Sun et al (2013, PNAS 110: E808-817) showed that ectopic expression of MSL2 does not increase expression of the X and indeed inhibits the effect of acetylation of H4Lys16 on gene expression. Aleman et al (2021, Cell Reports 35: 109236) showed that dosage compensation of the X chromosome can be robust in the absence of the MSL complex. Together, these results indicate that the MSL complex is not the primary mediator of X chromosome dosage compensation. The authors state that an inverse dosage effect results from a titration of the histone acetylase MOF between the NSL and MSL complexes. This is a misunderstanding of the inverse effect, which is an imbalance of regulatory molecules as described in the citation below. The inverse effect operates in triple X metafemales to produce dosage compensation of the three X chromosomes and a reduced expression of the autosomes (Sun et al 2013 PNAS 110: 7383-7388). There is no MSL complex in metafemales.

      A detailed explanation was provided by Birchler and Veitia (2021, One Hundred Years of Gene Balance: How stoichiometric issues affect gene expression, genome evolution, and quantitative traits. Cytogenetics and Genome Research 161: 529-550). The relevant portions of that article that pertain to Drosophila are quoted below. The cited references can be found in that publication.

      "In Drosophila, the sex chromosomes consist of an X and a Y. The Y in this species contains only a few genes required for male fertility (Zhang et al., 2020). The X consists of approximately 20% of the genome. Thus, females have two X chromosomes and males have one. Muller (1932) found that the expression of genes between the two sexes was similar but when individual genes on the X were varied in dosage they exhibited a proportional dosage effect. Each copy in a male was expressed at about twice the level as each copy in a female. Females with three X chromosomes are highly inviable but when they do survive to the adult stage, Stern (1960) found that they too exhibited dosage compensation in that the expression in the triple X genotype was similar to normal females and males. Studies in triploid flies found that dosage compensation also occurred among X; AAA, XX;AAA, and XXX; AAA genotypes via upregulation of the Xs, where X indicates the dosage of the X and A indicates the triploid nature of the autosomes (see Birchler, 2016 for further discussion). Diploid and triploid females have a similar per gene expression but the other five genotypes each must modulate gene expression by different amounts equivalent to an inverse relationship between the X versus autosomal dosage to achieve a balanced expression between the X and the A (Birchler, 1996).

      Some years ago, mutations were sought in Drosophila that were lethal to males but viable in females. A number of such mutations were found and termed Male Specific Lethal (MSL) loci (Belote and Lucchesi, 1980). Once the products of these genes were identified, they were found to be at high concentrations on the male X chromosome (Kuroda et al., 1991). One of these genes encodes a histone acetyl transferase that acetylates Lysine16 of Histone H4 (Bone et al., 1994; Hilfiker et al., 1997). The recognition of the MSL complex and its association with the male X was an important set of contributions to an understanding of sex chromosome evolution in Drosophila (Kuroda et al., 2016). Thus, the hypothesis arose that the MSL complex accumulated this chromatin modifier on the male X to activate the expression about two-fold to bring about dosage compensation. Other data that contributed to this hypothesis were that when autoradiography of nascent transcription on salivary gland polytene chromosomes was examined in the MSL maleless mutation, the ratio of the number of grains over the X versus an autosomal region was reduced compared to the normal ratio (Belote and Lucchesi, 1980).

      It has been pointed out (Hiebert and Birchler, 1994; Bhadra et al., 1999; Pal Bhadra et al., 2005; Sun et al., 2013a; Birchler, 2016), however, that the grain counts over the X and the autosomes when considered in absolute terms rather than as a ratio show that the X more or less retained dosage compensation and the autosomal numbers are about doubled, i.e. exhibit an inverse dosage effect. The same situation occurs with the msl3 mutation (Okuno et al., 1984), another MSL gene, in that the autoradiographic grain numbers as an absolute measure show retention of X dosage compensation and an autosomal increase. The data treatment to produce an X to A ratio seemed reasonable in the context of the time when all regulation in eukaryotes was considered positive. However, when studies were conducted in such a manner as to assay the absolute effect on gene expression in the maleless mutation, in adults (Hiebert and Birchler, 1994), larvae (Hiebert and Birchler, 1994; Bhadra et al., 1999; 2000; Pal Bhadra et al., 2005), and embryos (Pal Bhadra et al., 2005), the trend was for retention of dosage compensation of X linked genes and an increase in expression of autosomal genes.

      In global studies, if the X to autosomal expression does not change between mutant and normal, one can conclude that dosage compensation is operating. However, a lower X to A ratio could be a loss of compensation or an increased transcriptome size from the increase of the autosomes, as suggested by the absolute data of Belote and Lucchesi (1980) and Okuno et al (1984) and that was visualized directly in embryos (Pal Bhadra et al., 2005). The transcriptome size in aneuploids can change, which cannot be detected in RNA-seq analyses alone (Yang et al., 2021), so it is an important consideration for studies of dosage compensation. It was recently acknowledged that in MSL2 knockdowns the relative X expression is decreased and a moderate autosomal increase is found (Valsecchi et al., 2021b). A similar trend is evident in the microarray data on MSL2 knockdown in SL2 tissue culture cells (Hamada et al., 2005) and in the roX RNA (noncoding RNAs essential for MSL localization on the male X) mutants (Deng and Meller, 2006). This trend is in fact consistent with the absolute data that suggest an increase in the transcriptome size (Figure 7). A global change in transcriptome size can cause a generalized dosage compensation of a single chromosome to appear as a proportional dosage effect (loss of compensation) to some degree (Figure 7).

      Examination of expression in triple X metafemales, where there is no MSL complex, found that X-linked genes generally show dosage compensation but there is a generalized inverse effect on the autosomes, which could account for the detrimental effects of metafemales (Birchler et al., 1989; Sun et al., 2013b). An examination in metafemales of alleles of the white eye color gene that do or do not exhibit dosage compensation in males, showed the same response, namely, increased expression if there was no dosage compensation in males and no difference from normal females for the male dosage-compensated alleles (Birchler, 1992). This experiment demonstrated a relationship between the mechanism of dosage compensation in males and metafemales and implicated the inverse dosage effect in both. An involvement of the inverse effect in Drosophila dosage compensation provides an explanation for how the five levels of gene expression can be explained (Birchler, 1996), whereas an all-or-none presence of a complex on the X does not. The stoichiometric relationship of regulatory gene products provides a means to read the relative dosage at multiple doses to produce the appropriate inverse level.

      What then is the function of the MSL complex? It was discovered that the MSL complex will actually constrain the effect of H4 lysine16 acetylation to prevent it from causing an overexpression of genes (Bhadra et al., 1999; 2000; Pal Bhadra et al., 2005; Sun and Birchler 2009; Sun et al., 2013a). Indeed, in the chromatin remodeling Imitation Switch (ISWI) mutants, the male X chromosome was specifically overexpressed suggesting that its normal function is needed for the constraint to occur (Pal Bhadra et al., 2005). Independently, the Mtor nuclear pore component shows a similar specific male X upregulation when Mtor is knocked down and this effect was shown to operate on the transcriptional level (Aleman et al., 2021). Interestingly, the increased expression of the X in the Mtor knockdown is accompanied by an inverse modulation of a substantial subset of autosomal genes, illustrating why the constraining process evolved to counteract male X overexpression. The constraining effect might involve a number of gene products (Birchler, 2016) and is an interesting direction for further study.

      Furthermore, when the H4Lys16 acetylase was individually targeted to reporter genes, there was an increase in expression (Sun et al., 2013a). However, when other members of the MSL complex were present in normal males or ectopically expressed, this increase did not occur (Sun et al., 2013a). It thus appears that the function of the MSL complex is to sequester the acetylase from the autosomes and constrain it on the X (Bhadra et al., 1999; 2000; Pal Bhadra et al., 2005; Sun and Birchler, 2009; Sun et al., 2013a). Indeed, in the Mtor knockdowns, the X linked genes with the greatest upregulation were those with the greatest association with the acetylase and the H4K16ac histone mark (Aleman et al 2021), supporting the idea of a constraining activity that becomes released in the Mtor knockdown. When the MSL complex is disrupted, there is an inverse effect on the autosomes that occurs but in normal circumstances the sequestration mutes this effect. The MSL complex disruption releases the acetylase to be uniformly distributed across all chromosomes as determined cytologically (Bhadra et al., 1999) or via ChIPseq for H4Lys16ac (Valsecchi et al., 2021a). Indeed, the quantity of the H4Lys16ac mark only has a proportional effect on gene expression when the constraining activity is disrupted (Aleman et al., 2021) or when the MSL complex is not present (Sun et al., 2013a). Thus, in normal flies there is a more or less equalized expression of the X and autosomes despite the monosomy for 20% of the genome.

      The component of the complex that is expressed in males and thought to organize the complex to the male X, MSL2, was recently found to also be associated with autosomal dosage sensitive regulatory genes (Valsecchi et al., 2018). MSL2 was found to modulate these autosomal dosage sensitive genes in various directions, which illustrates that MSL2 has a role in dosage balance that goes beyond the X chromosome. This finding is consistent with the evolutionary scenario that the initial attraction of the complex to the X chromosome was to upregulate dosage sensitive genes in hemizygous regions as the progenitor Y became deleted for them, with the constraining activity evolving to prevent an overexpression as the amount of acetylase on the male X increased with time (Birchler, 2016).

      The MSL hypothesis takes an X-centric view that does not accommodate what is now known about dosage effects across the whole genome. The idea that dissolution of the MSL complex would cause reduction in expression of the male X linked genes without any consequences for the autosomes is not consistent with current knowledge of gene regulatory networks and their dosage sensitivity. Indeed, the finding of dosage compensation in large autosomal aneuploids that operates on the transcriptional level (Devlin et al., 1982; 1984; Birchler et al., 1990; Sun et al., 2013c) as well as a predominant inverse effect by the same (Devlin, et al., 1988; Birchler et al., 1990) argues that one must consider the inverse effect for an understanding of the evolution of dosage compensation in Drosophila (and other species). Further discussion of models of Drosophila compensation has been published (Birchler, 2016).

      What is likely to be the most critical issue with sex chromosome evolution is the consequences for dosage sensitive regulatory genes. This fact is nicely illustrated by the retention of these types of genes in different independent vertebrate sex chromosome evolutions (Bellott and Page, 2021). In Drosophila, by contrast, dosage compensation is more of a blanket effect on most but not all X linked genes despite the fact that many genes on the X are unlikely to have dosage detrimental effects, although dosage sensitive genes might have played a role as noted above. The particularly large size of the X in Drosophila compared to the whole genome is potentially a contributing factor because such large genomic imbalance is likely to modulate most genes across the genome. Also, there is no evidence of a WGD in Drosophila as there is in other species for which the inverse effect has been documented (maize, Arabidopsis, yeast, mice, human). These other species have various numbers of retained duplicate dosage sensitive regulatory genes from WGDs. Thus, the relative change of regulatory genes in aneuploids in these species will not be as great compared to some of their interactors in the remainder of the genome, which could result in lesser magnitudes of some trans-acting effects, similarly to how aneuploids in ascending ploidies have fewer effects as described above. The absence of duplicate regulatory genes in Drosophila would predict a stronger inverse effect in general and that could have been capitalized upon to produce dosage compensation of most genes on the X chromosome despite many of them not being dosage critical. While sex chromosome evolution must accommodate dosage sensitive genes for proper development and viability, it could also be capitalized upon to evolve sexual dimorphisms in expression (Sun et al., 2013c)."

      Comments on revised submission:

      The authors did make an effort to address the issue previously raised.

      The authors state that an inverse dosage effect results from a titration of the histone acetylase MOF between the NSL and MSL complexes (lines 87-89). This is a misunderstanding of the inverse effect, which is an imbalance of regulatory molecules. Single regulatory gene dosage series can produce this effect. The inverse effect operates in triple X metafemales to produce dosage compensation of the three X chromosomes and a reduced expression of the autosomes (Sun et al 2013 PNAS 110: 7383-7388). There is no MSL complex in metafemales.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Thank you for taking the time to review our manuscript. We are grateful to reviewer #1 for the positive evaluation of our work and for providing valuable comments that will significantly enhance the presentation of our results. We understand reviewer #2's negative assessment because we did not discuss an alternative model of dosage compensation in Drosophila. We will address this omission in the Introduction section of the revised manuscript and remove any controversial statements from other parts of the text. However, it is important to clarify that our study does not focus on the mechanisms of dosage compensation. The main goal of the manuscript was to investigate the assembly of the MSL complex and its specific binding to the Drosophila X chromosome. We utilized male survival data to demonstrate the efficacy of MSL complex binding to the X chromosome, a relationship that has been supported by numerous independent studies. We understand that Reviewer #2 agrees that disruption of the MSL complex binding results in male lethality. As far as we understand, Reviewer #2 suggests that the MSL complex does not activate transcription of X chromosome genes, but instead facilitates the recruitment of MOF protein and potentially other general transcription factors to the X chromosome. This could explain the decrease in autosomal gene expression due to a reduction in activating factors like MOF at autosomal promoters. In the upcoming revision, we aim to strike a balance between the two models that elucidate dosage compensation in Drosophila. We appreciate your feedback and look forward to enhancing the clarity and coherence of our manuscript based on your insightful comments.

      Reviewer #2 (Public Review):

      Summary:

      A deletion analysis of the MSL1 gene to assess how different parts of the protein product interact with the MSL2 protein and roX RNA to affect the association of the MSL complex with the male X chromosome of Drosophila was performed.

      Strengths:

      The deletion analysis of the MSL1 protein and the tests of interaction with MSL2 are adequate.

      We thank the reviewer for the positive assessment of the experimental work done.

      This reviewer does not adhere to the basic premise of the authors that the MSL complex is the primary mediator of dosage compensation of the X chromosome of Drosophila.

      We completely agree with this reviewer's claim. In the Introduction section we attempted to make clear that there are two models for the functional role of specific recruitment of the MSL complex to the X chromosome in males.

      Several lines of evidence from various laboratories indicate that it is involved in sequestering the MOF histone acetyltransferase to the X chromosome but there is a constraint on its action there. When the MSL complex is disrupted, there is no overall loss of compensation but there is an increase in autosomal expression. Sun et al (2013, PNAS 110: E808-817) showed that ectopic expression of MSL2 does not increase expression of the X and indeed inhibits the effect of acetylation of H4Lys16 on gene expression. Aleman et al (2021, Cell Reports 35: 109236) showed that dosage compensation of the X chromosome can be robust in the absence of the MSL complex. Together, these results indicate that the MSL complex is not the primary mediator of X chromosome dosage compensation. The authors use sex-specific lethality as a measure of disruption of dosage compensation, but other modulations of gene expression are the likely cause of these viability effects.

      Sun et al (2013, PNAS 110: E808-817) showed that recruitment of the MSL complex-specific subunit MSL2 or the MOF protein to the UAS promoter resulted in recruitment of the entire MSL complex in males but not transcriptional activation. This important result argues that the MSL complex does not activate transcription. However, it must be taken into account that the GAL4 DNA binding region used to recruit the chimeric MSL2 protein to the UAS promoter was directly fused to the MSL2 RING domain, which is critical for interaction of MSL2 with MSL1 and its ubiquitination activity (this activity could potentially be involved in transcription activation). It also remains poorly understood what happens to the MSL complex after recruitment to the promoters or HAS on the X chromosome. The MSL1/MSL3/MOF subcomplex can acetylate TF and H4K16 during RNA polymerase II elongation, resulting in increased transcription. Separate roles of MSL2 and MSL1 in the activation of transcription at gene promoters have also been shown. Sun et al. showed that in females, recruitment of MOF to the UAS promoter leads to a strong increase in transcription, which is associated with the inclusion of MOF in the non-specific lethal (NSL) complex, which is bound to promoters and is required for strong transcription activation. In males, MOF is preferentially recruited to the UAS promoter in the full MSL complex or perhaps in the MSL1/MSL3/MOF subcomplex, which stimulates transcription during RNA polymerase II elongation much less strongly than the NSL complex. The same result was obtained by Prestel et al. 2010 (Mol Cell 38:815-26). In this study the GAL4 binding sites were inserted upstream of the lacZ and mini-white genes. Activation of transcription after recruitment of GAL4-MOF to the GAL4 sites was studied in males and females. As in Sun et al. 2013, strong activation of the reporter was observed in females. A weak transcriptional activation of the reporter gene in males was shown, and the MOF protein was detected not only on the promoter, but also in the coding and 3’ regions of the reporter.

      We do not understand how the paper by Aleman et al (Cell Reports 35: 109236, 2021) is consistent with the hypothesis that the MSL complex is not involved in the transcriptional activation of X chromosomal genes. The main conclusions of this paper are: 1) inactivation of Mtor leads to selective activation of the male X chromosome; 2) Mtor-driven attenuation of male X occurs in broad domains linked by the MSL complex; 3) Mtor genetically interacts with MSL components and reduces male mortality; 4) Mtor restrains dose-compensated expression at the level of nascent transcription. Thus, the paper shows that the MSL complex has an activator activity that is partially inhibited by Mtor. Accordingly, inactivation of Mtor only partially restored the survival of males in which dosage compensation was not completely inactivated.

      A detailed explanation was provided by Birchler and Veitia (2021, One Hundred Years of Gene Balance: How stoichiometric issues affect gene expression, genome evolution, and quantitative traits. Cytogenetics and Genome Research 161: 529-550).

      We agree that an alternative model of the dosage compensation mechanism is reasonable. We can assume that both mechanisms can function jointly to provide effective dosage compensation in Drosophila males. At the suggestion of the reviewer to reconsider the entire context of the article, we will make many small changes throughout the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Overall, I found the text well written and the figures logically organized (especially Figure 5, which had the potential to confuse). The authors especially excelled in bringing together the decades of literature in the Discussion.

      I offer several suggestions to improve the readability:

      Consider presenting the coiled-coil domain homology in Figure 1A as a contrast for the N-terminal region, which the authors claim is poorly conserved.

      We added the coiled-coil domain homology to Figure 1A in the new version of the manuscript.

      It is difficult to visualize the red MSL2 in Figure 2; the green and red panels should be presented separately in the main text, as they are in the Supplemental Figure 2.

      We prepared Figure 2 with separate green and red panels.

      The ChIP-seq experiments for MSL proteins are well presented, but in my opinion, add little to the overall conclusions:

      Figure 6 mostly recapitulates what has already been published and utilized by several groups, most recently the authors themselves (Tikhonova 2019): that MSL expressed in females targets the X/HAS, similar to in males. While these are nice supporting data for the female transgenic system, I do not believe this figure should be prominently featured as if this is a novelty of the current study.

      We fully agree with the reviewer's comment about the limited scientific novelty of Figure 6. It plays a supporting role. Therefore, we moved this figure to the Supplementary material (as a supplement to Figure 5).

      The ChIP experiments in Figure 7 agree with the conclusions in Figures 2 and 3 (polytene chromosome immunostaining) when it comes to X/autosome localization. I believe it would help with the flow of the paper if these experiments were combined or at least placed closer together in the narrative, rather than falling at the end.

      We moved Figure 7 (Figure 5 in the new version) closer to the polytene chromosome immunostaining. We agree with the reviewer that this placement of the figure will make the article as a whole easier to follow.

      I find Figure 8 difficult to understand, especially since the "clusters" are not annotated in the figure, but are described in the text. I struggled to follow the authors' conclusions based on these data. The authors could clarify the figure with annotations, although to be honest I do not currently see the value of this analysis/figure.

      In the new version of the article, we changed this part: we removed the clusters for autosomes because they were difficult to understand and not important for this manuscript. We also tried to place clearer emphasis in the text on clusters 1 and 2, which characterize HAS.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This very interesting manuscript proposes a general mechanism for how activating signaling proteins respond to species-specific signals arising from a variety of stresses. In brief, the authors propose that the activating signal alters the structure by a universal allosteric mechanism.

      Strengths:

      The unitary mechanism proposed is appealing and testable. They propose that the allosteric module consists of crossed alpha-helical linkers with similar architecture and that their attached regulatory domains connect to phosphatases or other molecules through coiled-coil domains, such that the signal is transduced via rigidifying the alpha helices, permitting downstream enzymatic activity. The authors present genetic and structural prediction data in favor of the model for the system they are studying, and stronger structural data in other systems.

      Weaknesses:

      The evidence is indirect - targeted mutations, structural predictions, and biochemical data. Therefore, these important generalizable conclusions are not buttressed by impeccable data, which would require doing actual structures in B. subtilis, confirming experiments in other organisms, and possibly co-evolutionary coupling. In the absence of such data, it is not possible to rule out variant models.

      We thank the reviewer for their feedback. A challenge of studying flexible proteins is that it is often not possible to directly obtain high resolution structural data. For the case of B. subtilis RsbU, the independent experimental approaches we applied (including two unbiased genetic screens, targeted mutagenesis, SAXS, enzymology, and structure prediction, which includes evolutionary coupling) converged upon a model for activation, which we feel is well supported. Frustratingly, our attempts at determining high resolution experimental structures have been unsuccessful, which we think is due to the flexibility of the proteins revealed by our SAXS experiments. For example, we collected X-ray diffraction data from crystals of a fragment of B. subtilis RsbU containing the N-terminal domain and linker in which the linker was almost entirely disordered in the maps. We agree that doing experiments in other organisms would be valuable next steps to test the hypothesis that this coiled-coil based transduction mechanism is conserved across species, and will modify the text to differentiate this more speculative section of the manuscript. Based on this critique (and the critiques of the other reviewers), we plan to do energetic analysis of the predicted coiled coils from the enzymes we analyzed from other species and to incorporate this into the manuscript. Finally, in the manuscript, we have highlighted that this mechanism is not the only mechanism for activation of other proteins with effector domains connected to linkers, but rather one of many mechanisms (Fig 5G). The reviewer additionally made helpful suggestions about the text in detailed comments that we will incorporate as appropriate.

      Reviewer #2 (Public review):

      Summary:

      While bacteria have the ability to induce genes in response to specific stresses, they also use the General Stress Response (GSR) to deal with growth conditions that presumably include a larger range of stresses (for instance, stationary phase growth). The activation of GSR-specific sigma factors is frequently at the heart of the induction of a GSR. Given the range of stresses that can lead to GSR induction, the regulatory inputs are frequently complex. In B. subtilis, the stressosome, a multi-protein complex, contains a set of proteins that, upon appropriate stresses, initiate partner switching cascades that free the sigma B sigma factor from an anti-sigma. The focus here is on the mode of activation of RsbU, a serine/threonine phosphatase of the PPM family, leading to sigB activation. RsbT, a component of the degradosome interacts with RsbU upon stress, activating the phosphatase activity. Once active, RsbU dephosphorylates its target (RsbV, an anti-antisigma), which in turn binds the anti-sigma. The conclusion is that flexible linker domains upstream of the phosphatase domain are the target for activation, via binding of proteins to the N-terminal domain, resulting in a crossed-linker dimeric structure. The authors then use the information on RsbU to suggest that parallel approaches are used to activate PPM phosphatases for the GSR response in other bacteria. (Biology vs. Mechanism, evolution?)

      Strengths and Weaknesses:

      Many of these have to do with clarifying what was done and why. This includes the presentation and content of the figures.

      One issue relates to the background and context. A bit more information on the stresses that release RsbT would be useful here. The authors might also consider a figure showing the major conclusions and parallels for SpoIIE activation and possibly other partner switches that are discussed, introducing the switch change more clearly to set the stage for the work here (and the generalization). There are a lot of players to keep track of.

      We plan to carefully review the manuscript to improve the clarity of presentation and background. In particular, we thank the reviewer for pointing out the missing information about the release of RsbT from the stressosome. We will incorporate this information into the introduction and provide an additional figure. The reviewer additionally provided detailed helpful comments that we will incorporate in the text and figures.

      Reviewer #3 (Public review):

      Summary:

      The authors present a study building on their previous work on activation of the general stress response phosphatase, RsbU, from Bacillus subtilis. Using computed structural models of the RsbU dimer the authors map previously identified activating mutations onto the structure and suggest further protein variants to test the role of the predicted linker helix and the interaction with RsbT on the activation of the phosphatase activity.

      Using in vivo and in vitro activity assays, the authors demonstrate that linker variants can constitutively activate RsbU and increase the affinity of the protein for RsbT, thus showing a link between the structure of the linker region and RsbT binding.

      Small angle X-ray scattering experiments on RsbU variants alone, and in complex with RsbT show structural changes consistent with a decreased flexibility of the RsbU protein, which is hypothesised to indicate a disorder-order transition in the linker when RsbT binds. This interpretation of the data is consistent with the biochemical data presented by the authors.

      Further computed structure models are presented for other protein phosphatases from different bacterial species and the authors propose a model for phosphatase activation by partner binding. They compare this to the activation mechanisms proposed for histidine kinase two-component systems and GGDEF proteins and suggest the individual domains could be swapped to give a toolkit of modular parts for bacterial signalling.

      Strengths:

      The key mutagenesis data is presented with two lines of evidence to demonstrate RsbU activation - in vivo sigma-b activation assays utilising a beta-galactosidase reporter and in vitro activity assays against the RsbV protein, which is the downstream target of RsbU. These data support the hypothesis for RsbT binding to the RsbU linker region as well as the dimerisation domain to activate the RsbU activity.

      Weaknesses:

      Small angle scattering curves are difficult to unambiguously interpret, but the authors present reasonable interpretations that fit with the biochemical data presented. These interpretations should be considered as good models for future testing with other methods - hydrogen/deuterium exchange mass spectrometry, would be a good additional method to use, as exchange rates in the linker region would be affected significantly by the disorder/order transition on RsbT binding.

      We agree with the reviewer that the SAXS data have inherent ambiguity due to the nature of the measurement. However, SAXS is one of the best techniques to directly assess conformational flexibility. Our scattering data for RsbU have multiple signatures of flexibility supporting a high confidence conclusion. While the scattering data support a reduction in flexibility for the RsbT/RsbU complex, we agree that a high resolution structure would be valuable. However, the combination of the scattering data with our biochemical and genetic data supports the validity of the AlphaFold predicted model. We thank the reviewer for the suggestion of future hydrogen/deuterium exchange experiments that would be complementary, but which we feel are beyond the scope of this work.

      The interpretation of the computed structure models should be toned down with the addition of a few caveats related to the bias in the models returned by AlphaFold2. For the full-length models of RsbU and other phosphatase proteins, the relationship of the domains to each other is likely to be the least reliable part of the models - this is apparent from the PAE plots shown in Supplementary Figure 8. Furthermore, the authors should show models coloured by pLDDT scores in an additional supplementary figure to help the reader interpret the confidence level of the predicted structures.

      We thank the reviewer for suggestions on how to clarify the discussion of AlphaFold models. We will decrease the emphasis on the computed models in the text and will add figures with the models colored by the pLDDT scores to aid in the interpretation.

    2. eLife assessment

      This important study combines genetic analysis, biochemistry, and structural modeling to reveal new insights into how changes in protein-protein structure activate signal transduction as part of the bacterial general stress response. The data, collected using validated and standard methods, and the interpretations are solid, although additional experimental structural evidence would strengthen the proposed model and its potential application to other systems. This manuscript, which provides multiple avenues for follow-up studies, will be of broad interest to microbiologists, structural biologists, and cell biologists.

    3. Reviewer #1 (Public review):

      Summary:

      This very interesting manuscript proposes a general mechanism for how activating signaling proteins respond to species-specific signals arising from a variety of stresses. In brief, the authors propose that the activating signal alters the structure by a universal allosteric mechanism.

      Strengths:

      The unitary mechanism proposed is appealing and testable. They propose that the allosteric module consists of crossed alpha-helical linkers with similar architecture and that their attached regulatory domains connect to phosphatases or other molecules through coiled-coil domains, such that the signal is transduced via rigidifying the alpha helices, permitting downstream enzymatic activity. The authors present genetic and structural prediction data in favor of the model for the system they are studying, and stronger structural data in other systems.

      Weaknesses:

      The evidence is indirect - targeted mutations, structural predictions, and biochemical data. Therefore, these important generalizable conclusions are not buttressed by impeccable data, which would require doing actual structures in B. subtilis, confirming experiments in other organisms, and possibly co-evolutionary coupling. In the absence of such data, it is not possible to rule out variant models.

    4. Reviewer #2 (Public review):

      Summary:

      While bacteria have the ability to induce genes in response to specific stresses, they also use the General Stress Response (GSR) to deal with growth conditions that presumably include a larger range of stresses (for instance, stationary phase growth). The activation of GSR-specific sigma factors is frequently at the heart of the induction of a GSR. Given the range of stresses that can lead to GSR induction, the regulatory inputs are frequently complex. In B. subtilis, the stressosome, a multi-protein complex, contains a set of proteins that, upon appropriate stresses, initiate partner switching cascades that free the sigma B sigma factor from an anti-sigma. The focus here is on the mode of activation of RsbU, a serine/threonine phosphatase of the PPM family, leading to sigB activation. RsbT, a component of the degradosome interacts with RsbU upon stress, activating the phosphatase activity. Once active, RsbU dephosphorylates its target (RsbV, an anti-antisigma), which in turn binds the anti-sigma. The conclusion is that flexible linker domains upstream of the phosphatase domain are the target for activation, via binding of proteins to the N-terminal domain, resulting in a crossed-linker dimeric structure. The authors then use the information on RsbU to suggest that parallel approaches are used to activate PPM phosphatases for the GSR response in other bacteria. (Biology vs. Mechanism, evolution?)

      Strengths and Weaknesses:

      Many of these have to do with clarifying what was done and why. This includes the presentation and content of the figures.

      One issue relates to the background and context. A bit more information on the stresses that release RsbT would be useful here. The authors might also consider a figure showing the major conclusions and parallels for SpoIIE activation and possibly other partner switches that are discussed, introducing the switch change more clearly to set the stage for the work here (and the generalization). There are a lot of players to keep track of.

    5. Reviewer #3 (Public review):

      Summary:

      The authors present a study building on their previous work on activation of the general stress response phosphatase, RsbU, from Bacillus subtilis. Using computed structural models of the RsbU dimer, the authors map previously identified activating mutations onto the structure and suggest further protein variants to test the role of the predicted linker helix and the interaction with RsbT in the activation of the phosphatase activity.

      Using in vivo and in vitro activity assays, the authors demonstrate that linker variants can constitutively activate RsbU and increase the affinity of the protein for RsbT, thus showing a link between the structure of the linker region and RsbT binding.

      Small angle X-ray scattering experiments on RsbU variants alone, and in complex with RsbT show structural changes consistent with a decreased flexibility of the RsbU protein, which is hypothesised to indicate a disorder-order transition in the linker when RsbT binds. This interpretation of the data is consistent with the biochemical data presented by the authors.

      Further computed structure models are presented for other protein phosphatases from different bacterial species and the authors propose a model for phosphatase activation by partner binding. They compare this to the activation mechanisms proposed for histidine kinase two-component systems and GGDEF proteins and suggest the individual domains could be swapped to give a toolkit of modular parts for bacterial signalling.

      Strengths:

      The key mutagenesis data is presented with two lines of evidence to demonstrate RsbU activation - in vivo sigma-b activation assays utilising a beta-galactosidase reporter and in vitro activity assays against the RsbV protein, which is the downstream target of RsbU. These data support the hypothesis for RsbT binding to the RsbU linker region as well as the dimerisation domain to activate the RsbU activity.

      Weaknesses:

      Small angle scattering curves are difficult to unambiguously interpret, but the authors present reasonable interpretations that fit with the biochemical data presented. These interpretations should be considered as good models for future testing with other methods. Hydrogen/deuterium exchange mass spectrometry would be a good additional method to use, as exchange rates in the linker region would be affected significantly by the disorder/order transition on RsbT binding.

      The interpretation of the computed structure models should be toned down with the addition of a few caveats related to the bias in the models returned by AlphaFold2. For the full-length models of RsbU and other phosphatase proteins, the relationship of the domains to each other is likely to be the least reliable part of the models - this is apparent from the PAE plots shown in Supplementary Figure 8. Furthermore, the authors should show models coloured by pLDDT scores in an additional supplementary figure to help the reader interpret the confidence level of the predicted structures.

    1. eLife assessment

      This study provides potentially highly valuable new insight into the role of Fgf signalling in SUFU mutation-linked cerebellar tumors and indicates novel therapeutic interventions via inhibition of Fgf signalling. The evidence supporting the major claims, however, is currently incomplete. A more robust analysis of gene expression patterns and deeper mechanistic insight would significantly enhance this study, which could have wide-ranging implications for the treatment of specific cerebellar tumors.

    2. Reviewer #1 (Public Review):

      Summary:

      SUFU modulates Sonic hedgehog (SHH) signaling and is frequently mutated in the B-subtype of SHH-driven medulloblastoma. The B-subtype occurs mostly in infants, is often metastatic, and lacks specific treatment. Yabut et al. found that Fgf5 was highly expressed in the B-subtype of SHH-driven medulloblastoma by examining a published microarray expression dataset. They then investigated how Fgf5 functions in the cerebellum of mice that have embryonic Sufu loss of function. This loss was induced using the hGFAP-cre transgene, which is expressed in multiple cell types in the developing cerebellum, including granule neuron precursors (GNPs) derived from the rhombic lip. By measuring the area of Pax6+ cells in the external granule cell layer (EGL) of Sufu-cKO mice at postnatal day 0, they find Pax6+ cells occupy a larger area in the posterior lobe adjacent to the secondary fissure, which is poorly defined. They show that Fgf5 RNA and phosphoErk1/2 immunostaining are also higher in the same disrupted region. Some of the phosphoErk1/2+ cells are proliferative in the Sufu-cKO. Western blot analysis of Gli proteins that modulate SHH signaling found reduced expression and absence of Gli1 activity in the region of cerebellar dysgenesis in Sufu-cKO mice. This suggests the GNP expansion in this region is independent of SHH signaling. Amazingly, intraventricular injection of the FGFR1-2 antagonist AZD4547 from P0-4, followed by histological examination at P7, restored cytoarchitecture in the cerebella of Sufu-cKO mice. This is further supported by NeuN immunostaining in the internal granule cell layer, which labels mature, non-dividing neurons, and by KI67 immunostaining, which indicates dividing cells and was found primarily in the EGL. The mice were treated beginning at a timepoint when cerebellar cytoarchitecture was shown to be disrupted, and the cytoarchitecture was indistinguishable from control following treatment. Figure 3 presents the most convincing and exciting data in this manuscript.

      Sufu-cKO mice do not readily develop cerebellar tumors. The authors detected phosphorylated H2AX immunostaining, which labels double-strand breaks, in some cells in the EGL in regions of cerebellar dysgenesis in the Sufu-cKO, as well as cleaved Caspase 3, a marker of apoptosis. The p53 protein, which acts downstream of the double-strand break pathway, was reduced in the Sufu-cKO cerebellum. Genetically removing p53 from the Sufu-cKO cerebellum resulted in cerebellar tumors in 2-month-old mice. The Sufu;p53-dKO cerebella at P0 lacked clear foliation and a secondary fissure, even more so than the Sufu-cKO. Fgf5 RNA and signaling (pERK1/2) were also expressed ectopically.

      The conclusions of the paper are largely supported by the data, but some data analyses need to be clarified and extended.

      (1) The rationale for examining Fgf5 in medulloblastoma is not sufficiently convincing. The authors previously reported that Fgf15 was upregulated in neocortical progenitors of mice with conditional loss of Sufu (PMID: 32737167). In Figure 1, the authors report FGF5 expression is higher in SHH-type medulloblastoma, especially the beta and gamma subtypes mostly found in infants. These data were derived from a genome-wide dataset and are shown without correction for multiple testing, including other Fgfs. Showing the expression of other Fgfs with FDR correction would better substantiate their choice. Alternatively, moving this figure to later in the manuscript, as support for their mouse investigations, would be more convincing.

      (2) The Sufu-cKO cerebellum lacks a clear anchor point at the secondary fissure and foliation is disrupted in the central and posterior lobes. It would be helpful for the authors to review Sudarov & Joyner (PMID: 18053187) for nomenclature specific to the developing cerebellum.

      (3) The metrics used to quantify cerebellar perimeter and immunostaining are not sufficiently described. It is unclear whether the individual points in the bar graph represent a single section from independent mice, or multiple sections from the same mice. For example, in Figures 2B-D. This also applies to Figure 3C-D.

      (4) The data on Fgf5 RNA expression presented in Figure 2E are not sufficiently convincing. The perimeter and cytoarchitecture of the cerebellum are difficult to see and the higher magnification shown in 2F should be indicated in 2E.

      (5) The data presented in Figure 3 are not sufficiently convincing. The cells double positive for pErk and KI67 (Figure 3B) are difficult to see and appear to be few, suggesting the quantification may be unreliable.

      (6) The data presented in Figure 4F-J would be more convincing with quantification. The Sufu;p53-dKO appears to have a thickened EGL across the entire vermis perimeter, and very little foliation, relative to control and single cKO cerebella. This is a more widespread effect than the more localized foliation disruption in the Sufu-cKO.

      (7) Figure 5 does not convincingly summarize the results. Blue and purple cells in the sagittal cartoon are not defined. Which cells express Fgf5 (or other Fgfs) has not been determined. The yellow cells are not defined in relation to the initial cartoon on the left.

    3. Reviewer #2 (Public Review):

      Summary:

      Mutations in SUFU are implicated in SHH medulloblastoma (MB). SUFU modulates Shh signaling in a context-dependent manner, making its role in MB pathology complex and not fully understood. This study reports that elevated FGF5 levels are associated with a specific subtype of SHH MB, particularly in pediatric cases. The authors demonstrate that Sufu deletion in a mouse model leads to abnormal proliferation of granule cell precursors (GCPs) at the secondary fissure (region B), correlating with increased Fgf5 expression. Notably, pharmacological inhibition of FGFR restores normal cerebellar development in Sufu mutant mice.

      Strengths:

      The identification of increased FGF5 in subsets of MB is novel and a key strength of the paper.

      Weaknesses:

      The study appears incomplete despite the potential significance of these findings. The current paper does not fully establish the causal relationship between Fgf5 and abnormal cerebellar development, nor does it clarify its connection to SUFU-related MB. Some conclusions seem overstated, and the central question of whether FGFR inhibition can prevent tumor formation remains untested.

    4. Reviewer #3 (Public Review):

      Summary:

      The interaction between FGF signaling and SHH-mediated GNP expansion in MB, particularly in the context of Sufu LoF, has just begun to be understood. The manuscript by Yabut et al. establishes a connection between ectopic FGF5 expression and GNP over-expansion in a late-stage embryonic Sufu LoF model. The data provided link region-specific aberrant FGF5 signaling to the SHH subtype of medulloblastoma. New data from Yabut et al. suggest that ectopic FGF5 expression correlates with GNP expansion near the secondary fissure in Sufu LoF cerebella. Furthermore, pharmacological blockade of FGF signaling inhibits GNP proliferation. Interestingly, the data indicate that the timing of conditional Sufu deletion (E13.5 using the hGFAP-Cre line) results in different outcomes compared to later deletion (using Math1-cre line, Jiwani et al., 2020). This study provides significant insights into the molecular mechanisms driving GNP expansion in SHH subgroup MB, particularly in the context of Sufu LoF. It highlights the potential of targeting FGF5 signaling as a therapeutic strategy. Additionally, the research offers a model for better understanding MB subtypes and developing targeted treatments.

      Strengths:

      One notable strength of this study is the extraction and analysis of ectopic FGF5 expression from a subset of MB patient tumor samples. This translational aspect of the study enhances its relevance to human disease. By correlating findings from mouse models with patient data, the authors strengthen the validity of their conclusions and highlight the potential clinical implications of targeting FGF5 in MB therapy.

      The data convincingly show that FGFR signaling activation drives GNP proliferation in Sufu conditional knockout models. This finding is supported by robust experimental evidence, including pharmacological blockade of FGF signaling, which effectively inhibits GNP proliferation. The clear demonstration of a functional link between FGFR signaling and GNP expansion underscores the potential of FGFR as a therapeutic target in SHH subgroup medulloblastoma.

      Previous studies have demonstrated the inhibitory effect of FGF2 on tumor cell proliferation in certain MB types, such as the ptc mutant (Fogarty et al., 2006; Yaguchi et al., 2009). Findings in this manuscript provide additional support suggesting multiple roles for FGF signaling in cerebellar patterning and development.

      Weaknesses:

      In the GEO dataset analysis, where FGF5 expression is extracted, the reporting of the P-value lacks detail on the statistical methods used, such as whether an ANOVA or t-test was employed. Providing comprehensive statistical methodologies is crucial for assessing the rigor and reproducibility of the results. The absence of this information raises concerns about the robustness of the statistical analysis.

      Another concern is related to the controls used in the study. Cre recombinase induces double-strand DNA breaks within the loxP sites, and the control mice did not carry the Cre transgene (as stated in the Method section), while Sufu-cKO mice did. This discrepancy necessitates an additional control group to evaluate the effects of Cre-induced double-strand breaks on phosphorylated H2AX-DSB signaling. Including this control would strengthen the validity of the findings by ensuring that observed effects are not artifacts of Cre recombinase activity.

      Although the use of the hGFAP-Cre line allows genetic access to the late embryonic stage, this also targets multiple cell types, including both GNPs and cerebellar glial cells. However, the authors focus primarily on GNPs without fully addressing the potential contributions of neuron-glial interaction. This oversight could limit the understanding of the broader cellular context in which FGF signaling influences tumor development.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      SUFU modulates Sonic hedgehog (SHH) signaling and is frequently mutated in the B-subtype of SHH-driven medulloblastoma. The B-subtype occurs mostly in infants, is often metastatic, and lacks specific treatment. Yabut et al. found that Fgf5 was highly expressed in the B-subtype of SHH-driven medulloblastoma by examining a published microarray expression dataset. They then investigated how Fgf5 functions in the cerebellum of mice that have embryonic Sufu loss of function. This loss was induced using the hGFAP-cre transgene, which is expressed in multiple cell types in the developing cerebellum, including granule neuron precursors (GNPs) derived from the rhombic lip. By measuring the area of Pax6+ cells in the external granule cell layer (EGL) of Sufu-cKO mice at postnatal day 0, they find Pax6+ cells occupy a larger area in the posterior lobe adjacent to the secondary fissure, which is poorly defined. They show that Fgf5 RNA and phosphoErk1/2 immunostaining are also higher in the same disrupted region. Some of the phosphoErk1/2+ cells are proliferative in the Sufu-cKO. Western blot analysis of Gli proteins that modulate SHH signaling found reduced expression and absence of Gli1 activity in the region of cerebellar dysgenesis in Sufu-cKO mice. This suggests the GNP expansion in this region is independent of SHH signaling. Amazingly, intraventricular injection of the FGFR1-2 antagonist AZD4547 from P0-4, followed by histological examination at P7, restored cytoarchitecture in the cerebella of Sufu-cKO mice. This is further supported by NeuN immunostaining in the internal granule cell layer, which labels mature, non-dividing neurons, and by KI67 immunostaining, which indicates dividing cells and was found primarily in the EGL. The mice were treated beginning at a timepoint when cerebellar cytoarchitecture was shown to be disrupted, and the cytoarchitecture was indistinguishable from control following treatment. Figure 3 presents the most convincing and exciting data in this manuscript.

      Sufu-cKO mice do not readily develop cerebellar tumors. The authors detected phosphorylated H2AX immunostaining, which labels double-strand breaks, in some cells in the EGL in regions of cerebellar dysgenesis in the Sufu-cKO, as well as cleaved Caspase 3, a marker of apoptosis. The p53 protein, which acts downstream of the double-strand break pathway, was reduced in the Sufu-cKO cerebellum. Genetically removing p53 from the Sufu-cKO cerebellum resulted in cerebellar tumors in 2-month-old mice. The Sufu;p53-dKO cerebella at P0 lacked clear foliation and a secondary fissure, even more so than the Sufu-cKO. Fgf5 RNA and signaling (pERK1/2) were also expressed ectopically.

      The conclusions of the paper are largely supported by the data, but some data analyses need to be clarified and extended.

      (1) The rationale for examining Fgf5 in medulloblastoma is not sufficiently convincing. The authors previously reported that Fgf15 was upregulated in neocortical progenitors of mice with conditional loss of Sufu (PMID: 32737167). In Figure 1, the authors report FGF5 expression is higher in SHH-type medulloblastoma, especially the beta and gamma subtypes mostly found in infants. These data were derived from a genome-wide dataset and are shown without correction for multiple testing, including other Fgfs. Showing the expression of other Fgfs with FDR correction would better substantiate their choice. Alternatively, moving this figure to later in the manuscript, as support for their mouse investigations, would be more convincing.

      To assess FGF5 (ENSG00000138675) expression in MB tissues, we used GEO2R (Barrett et al., 2013) to analyze published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MBSHH subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MBSHH subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MBSHH subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM).
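      As an illustration of the FDR adjustment mentioned above, the Benjamini & Hochberg procedure can be sketched in a few lines. This is a minimal sketch and not GEO2R's or limma's actual code: the function name benjamini_hochberg, the gene labels, and the raw p-values are all hypothetical and are not taken from GSE85217.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (illustrative only).
raw = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.03, "GENE_D": 0.04, "GENE_E": 0.50}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

for gene, p_adj in adjusted.items():
    print(f"{gene}: raw p = {raw[gene]:.3f}, BH-adjusted p = {p_adj:.3f}")
```

      Each raw p-value is scaled by the number of tests divided by its rank, so a gene tested alongside many others needs a smaller raw p-value to pass the same 0.05 cut-off.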

      Author response image 1.

      Comparative expression of FGF ligands, FGF5, FGF10, FGF12, and FGF19, across all MB subgroups. FGF12 expression is not significantly different, while FGF5, FGF10, and FGF19, show distinct upregulation in MBSHH subgroup (MBWNT n=70, MBSHH n=224, MBGR3 n=143, MBGR4 n=326).

      Expression of the 21 known FGF ligands was also analyzed. Many FGFs, such as FGF12 (Author response image 1), did not exhibit differential expression levels in MBSHH compared to other MB subgroups. FGF5, FGF10, and FGF19 (the human orthologue of mouse FGF15) all showed specific upregulation in MBSHH compared to other MB subgroups (Author response image 1), supporting our previous observations that FGF15 is a downstream target of SHH signaling (Yabut et al., 2020), as the reviewer pointed out. However, further stratification of MBSHH patient data revealed that only FGF5 specifically showed upregulation in infants with MBSHH (MBSHHb and MBSHHg; Author response image 2), indicating a more prominent role for FGF5 in the developing cerebellum and as a driver of MBSHH tumorigenesis in this dynamic environment.

      Author response image 2.

      Comparative expression of FGF5, FGF10, and FGF19 in different MBSHH subtypes. FGF5 specifically shows relative mRNA levels above 6 in 81% of MBSHH infant patient tumors (n=80 MBSHHb and MBSHHg tumors), unlike 35% of MBSHHa (n=65) or 0% of MBSHHd (n=75) tumors.

      (2) The Sufu-cKO cerebellum lacks a clear anchor point at the secondary fissure and foliation is disrupted in the central and posterior lobes. It would be helpful for the authors to review Sudarov & Joyner (PMID: 18053187) for nomenclature specific to the developing cerebellum.

      The reviewers are correct that the cerebellar foliation is severely disrupted in central and posterior lobes, as per Sudarov and Joyner (Neural Development 2007). This nomenclature may be used to describe the regions referred to in this manuscript.

      (3) The metrics used to quantify cerebellar perimeter and immunostaining are not sufficiently described. It is unclear whether the individual points in the bar graph represent a single section from independent mice, or multiple sections from the same mice. For example, in Figures 2B-D. This also applies to Figure 3C-D.

      All quantifications were performed on 2-3 cerebellar sections (20 µm) from each of 3-6 independent mice per genotype analyzed. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) from each mouse. Figure 2B shows data points from n=4 mice per genotype. Figure 2C shows data from n=3 mice per genotype. Figure 2D shows data from n=6 mice per genotype. Figure 3C-D shows data from n=3 mice per genotype.
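      The per-animal averaging described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis pipeline: the helper name per_mouse_means, the mouse identifiers, and the cell counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(section_counts):
    """Average section-level cell counts so that each mouse contributes one value.

    section_counts is an iterable of (mouse_id, cell_count) pairs,
    with one pair per quantified section.
    """
    by_mouse = defaultdict(list)
    for mouse_id, cell_count in section_counts:
        by_mouse[mouse_id].append(cell_count)
    return {mouse_id: mean(counts) for mouse_id, counts in by_mouse.items()}


# Hypothetical counts: 2-3 sections from each of three mice of one genotype.
counts = [
    ("mouse_1", 100.0), ("mouse_1", 110.0),
    ("mouse_2", 90.0), ("mouse_2", 95.0), ("mouse_2", 100.0),
    ("mouse_3", 120.0), ("mouse_3", 130.0),
]

means = per_mouse_means(counts)
n_mice = len(means)  # sample size is the number of mice, not the number of sections
print(means, "n =", n_mice)
```

      Collapsing sections to one value per animal keeps the sample size equal to the number of independent mice, which is what the individual points in the bar graphs are stated to represent.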

      (4) The data on Fgf5 RNA expression presented in Figure 2E are not sufficiently convincing. The perimeter and cytoarchitecture of the cerebellum are difficult to see and the higher magnification shown in 2F should be indicated in 2E.

      The lack of foliation in the Sufu-cKO cerebellum is clear, particularly when visualizing the perimeter via DAPI labeling (Figure 2E). The expression area of FGF5 is also visibly larger, given that all images in Figure 2E are presented at the same scale (scale bars = 500 µm).

      (5) The data presented in Figure 3 are not sufficiently convincing. The cells double positive for pErk and KI67 (Figure 3B) are difficult to see and appear to be few, suggesting the quantification may be unreliable.

      We used KI67+ expression to provide a molecular marker of regions to be quantified in both WT and Sufu-cKO sections. Quantification of labeled cells was performed on images obtained by confocal microscopy, enabling imaging of 1-2 µm optical slices, since Ki67 or pERK expression might not localize within the same cellular compartments. We relied on continuous DAPI nuclear staining to distinguish individual cells in each optical slice and the colocalization of Ki67 and pERK. All quantifications were performed on 2-3 cerebellar sections (20 µm) from each of 3-6 independent mice per genotype analyzed. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) from each mouse.

      (6) The data presented in Figure 4F-J would be more convincing with quantification. The Sufu;p53-dKO appears to have a thickened EGL across the entire vermis perimeter, and very little foliation, relative to control and single cKO cerebella. This is a more widespread effect than the more localized foliation disruption in the Sufu-cKO. 

      We agree with the reviewers that quantification of these phenotypes would provide a solid measure of the defects. The phenotypes of the Sufu;p53-dKO cerebellum are so profound that they require in-depth characterization, which will be the focus of future studies.

      (7) Figure 5 does not convincingly summarize the results. Blue and purple cells in the sagittal cartoon are not defined. Which cells express Fgf5 (or other Fgfs) has not been determined. The yellow cells are not defined in relation to the initial cartoon on the left.

      The revised manuscript will address this confusion by clearly labeling the cells and their roles in the schematic diagram.

      Reviewer #2 (Public Review):

      Summary:

      Mutations in SUFU are implicated in SHH medulloblastoma (MB). SUFU modulates Shh signaling in a context-dependent manner, making its role in MB pathology complex and not fully understood. This study reports that elevated FGF5 levels are associated with a specific subtype of SHH MB, particularly in pediatric cases. The authors demonstrate that Sufu deletion in a mouse model leads to abnormal proliferation of granule cell precursors (GCPs) at the secondary fissure (region B), correlating with increased Fgf5 expression. Notably, pharmacological inhibition of FGFR restores normal cerebellar development in Sufu mutant mice.

      Strengths:

      The identification of increased FGF5 in subsets of MB is novel and a key strength of the paper.

      Weaknesses:

      The study appears incomplete despite the potential significance of these findings. The current paper does not fully establish the causal relationship between Fgf5 and abnormal cerebellar development, nor does it clarify its connection to SUFU-related MB. Some conclusions seem overstated, and the central question of whether FGFR inhibition can prevent tumor formation remains untested.

      Reviewer #3 (Public Review):

      Summary:

      The interaction between FGF signaling and SHH-mediated GNP expansion in MB, particularly in the context of Sufu LoF, has just begun to be understood. The manuscript by Yabut et al. establishes a connection between ectopic FGF5 expression and GNP over-expansion in a late-stage embryonic Sufu LoF model. The data provided link region-specific aberrant FGF5 signaling to the SHH subtype of medulloblastoma. New data from Yabut et al. suggest that ectopic FGF5 expression correlates with GNP expansion near the secondary fissure in Sufu LoF cerebella. Furthermore, pharmacological blockade of FGF signaling inhibits GNP proliferation. Interestingly, the data indicate that the timing of conditional Sufu deletion (E13.5 using the hGFAP-Cre line) results in different outcomes compared to later deletion (using Math1-cre line, Jiwani et al., 2020). This study provides significant insights into the molecular mechanisms driving GNP expansion in SHH subgroup MB, particularly in the context of Sufu LoF. It highlights the potential of targeting FGF5 signaling as a therapeutic strategy. Additionally, the research offers a model for better understanding MB subtypes and developing targeted treatments.

      Strengths:

      One notable strength of this study is the extraction and analysis of ectopic FGF5 expression from a subset of MB patient tumor samples. This translational aspect of the study enhances its relevance to human disease. By correlating findings from mouse models with patient data, the authors strengthen the validity of their conclusions and highlight the potential clinical implications of targeting FGF5 in MB therapy.

      The data convincingly show that FGFR signaling activation drives GNP proliferation in Sufu conditional knockout models. This finding is supported by robust experimental evidence, including pharmacological blockade of FGF signaling, which effectively inhibits GNP proliferation. The clear demonstration of a functional link between FGFR signaling and GNP expansion underscores the potential of FGFR as a therapeutic target in SHH subgroup medulloblastoma.

      Previous studies have demonstrated the inhibitory effect of FGF2 on tumor cell proliferation in certain MB types, such as the ptc mutant (Fogarty et al., 2006; Yaguchi et al., 2009). Findings in this manuscript provide additional support suggesting multiple roles for FGF signaling in cerebellar patterning and development.

      Weaknesses:

      In the GEO dataset analysis, where FGF5 expression is extracted, the reporting of the P-value lacks detail on the statistical methods used, such as whether an ANOVA or t-test was employed. Providing comprehensive statistical methodologies is crucial for assessing the rigor and reproducibility of the results. The absence of this information raises concerns about the robustness of the statistical analysis.

      The revised manuscript will include the following detailed explanation of the statistical analyses of the GEO dataset:

      For the analysis of expression values of FGF5 (ENSG00000138675), we obtained these values using GEO2R (Barrett et al., 2013), which directly analyzes published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We simply entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MBSHH subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MBSHH subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MBSHH subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM). Sample sizes were:

      Author response table 1.

      Another concern is related to the controls used in the study. Cre recombinase induces double-strand DNA breaks within the loxP sites, and the control mice did not carry the Cre transgene (as stated in the Method section), while Sufu-cKO mice did. This discrepancy necessitates an additional control group to evaluate the effects of Cre-induced double-strand breaks on phosphorylated H2AX-DSB signaling. Including this control would strengthen the validity of the findings by ensuring that observed effects are not artifacts of Cre recombinase activity.

      The breeding scheme we used to generate homozygous SUFU conditional mutants will not generate pups carrying only hGFAP-Cre. Thus, we are unable to compare gH2AX expression in littermates that do not carry loxP sites. The reviewer is correct in pointing out the possibility of Cre recombinase activity inducing double-strand breaks on its own. However, it is likely that any hGFAP-Cre-induced double-strand breaks are not sufficient to cause the phenotypes we observed in homozygous mutant (Sufu-cKO) mice, because the cerebella of mice carrying heterozygous SUFU mutations (hGFAP-Cre;Sufu-fl/+) do not display the profound cerebellar phenotypes observed in Sufu-cKO mice. We cannot rule out, however, undetected abnormalities, which may require further analyses.

      Although the use of the hGFAP-Cre line allows genetic access to the late embryonic stage, this also targets multiple cell types, including both GNPs and cerebellar glial cells. However, the authors focus primarily on GNPs without fully addressing the potential contributions of neuron-glial interaction. This oversight could limit the understanding of the broader cellular context in which FGF signaling influences tumor development.

      The reviewer is correct that hGFAP-Cre also targets other cell types, such as cerebellar glial cells, which are generated after Cre expression has begun. It is possible that cerebellar glial cell development is also compromised in Sufu-cKO mice and may disrupt neuron-glial interaction, due to or independently of FGF signaling. In-depth studies are required to interrogate how loss of SUFU specifically affects the development of cerebellar glial cells and influences their cellular interactions in the developing cerebellum.

    1. eLife assessment

      This manuscript presents a valuable new quantitative crosslinking mass spectrometry approach using novel isobaric crosslinkers. The data are solid and the method has potential for a broad application in structural biology if more isobaric crosslinking channels are available and the quantitative information of the approach is exploited in more depth.

    2. Reviewer #1 (Public review):

      Summary:

      Crosslinking mass spectrometry has become an important tool in structural biology, providing information about protein complex architecture, binding sites and interfaces, and conformational changes. One key challenge of this approach represents the quantitation of crosslinking data to interrogate differential binding states and distributions of conformational states.

      Here, Luo and Ranish present a novel class of isobaric crosslinkers ("Qlinkers"), conduct proof-of-concept benchmarking experiments on known protein complexes, and show example applications on selected target proteins. The data are solid and this could well be an exciting, convincing new approach in the field if the quantitation strategy is made more comprehensive and the quantitative power of isobaric labeling is fully leveraged as outlined below. It's a promising proof-of-concept, and potentially of broad interest for structural biologists.

      Strengths:

      The authors demonstrate the synthesis, application, and quantitation of their "Q2linkers", enabling relative quantitation of two conditions against each other. In benchmarking experiments, the Q2linkers provide accurate quantitation in mixing experiments. Then the authors show applications of Q2linkers on MBP, Calmodulin, selected transcription factors, and polymerase II, investigating protein binding, complex assembly, and conformational dynamics of the respective target proteins. For known interactions, their findings are in line with previous studies, and they show some interesting data for TFIIA/TBP/TFIIB complex formation and conformational changes in pol II upon Rbp4/7 binding.

      Weaknesses:

      This is an elegant approach but the power of isobaric mass tags is not fully leveraged in the current manuscript.

      First, "only" Q2linkers are used. This means only two conditions can be compared. Theoretically, higher-plexed Qlinkers should be accessible and would also be needed to make this a competitive method against other crosslinking quantitation strategies. As it is, two conditions can still be compared relatively easily using LFQ - or stable-isotope-labeling based approaches. A "Q5linker" would be a really useful crosslinker, which would open up comprehensive quantitative XLMS studies.

      Second, the true power of isobaric labeling, accurate quantitation across multiple samples in a single run, is not fully exploited here. The authors only show differential trends for their interaction partners or different conformational states and do not make full quantitative use of their data or conduct statistical analyses. This should be investigated in more detail, e.g. examine Qlinker quantitation of MBP incubated with different concentrations of maltose or Calmodulin incubated with different concentrations of CBPs. Does Qlinker quantitation match ratios predicted using known binding constants or conformational state populations? Is it possible to extract ratios of protein populations in different conformations, assembly, or ligand-bound states?

      With these two points addressed this approach could be an important and convincing tool for structural biologists.
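
      As a rough illustration of the titration check suggested in the second point above, the sketch below computes the bound fraction expected from a simple 1:1 binding model and the ratio of bound populations between two ligand concentrations. The model (free ligand approximately equal to total ligand), the function names, and the Kd and concentration values are all assumptions made for illustration; none of them come from the manuscript, and a measured reporter-ion ratio need not scale linearly with occupancy.

```python
def fraction_bound(ligand_conc, kd):
    """Fraction of protein bound under 1:1 binding: [L] / (Kd + [L]).

    Assumes ligand is in excess, so free ligand ~ total ligand.
    Both arguments must be in the same concentration units.
    """
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("ligand_conc must be >= 0 and kd must be > 0")
    return ligand_conc / (kd + ligand_conc)


def expected_bound_ratio(conc_a, conc_b, kd):
    """Expected ratio of bound-state populations, condition A over B."""
    return fraction_bound(conc_a, kd) / fraction_bound(conc_b, kd)


# Hypothetical values (illustration only): Kd = 1.0, ligand at 1.0 vs 9.0.
kd_example = 1.0
low = fraction_bound(1.0, kd_example)    # half of the protein is bound
high = fraction_bound(9.0, kd_example)   # nine tenths of the protein is bound
ratio = expected_bound_ratio(1.0, 9.0, kd_example)
```

      A predicted ratio of this kind could then be compared with the two-channel Qlinker ratio for a crosslink or mono-link that reports on the bound state, across a series of ligand concentrations.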

    3. Reviewer #2 (Public review):

      The regulation of protein function heavily relies on the dynamic changes in the shape and structure of proteins and their complexes. These changes are widespread and crucial. However, examining such alterations presents significant challenges, particularly when dealing with large protein complexes in conditions that mimic the natural cellular environment. Therefore, much emphasis has been put on developing novel methods to study protein structure, interactions, and dynamics. Crosslinking mass spectrometry (CSMS) has established itself as such a prominent tool in recent years. However, doing this in a quantitative manner to compare structural changes between conditions has proven to be challenging due to several technical difficulties during sample preparation. Luo and Ranish introduce a novel set of isobaric labeling reagents, called Qlinkers, to allow for a more straightforward and reliable way to detect structural changes between conditions by quantitative CSMS (qCSMS).

      The authors do an excellent job describing the design choices of the isobaric crosslinkers and how they have been optimized to allow for efficient intra- and inter-protein crosslinking to provide relevant structural information. Next, they do a series of experiments to provide compelling evidence that the Qlinker strategy is well suited to detect structural changes between conditions by qCSMS. First, they confirm the quantitative power of the novel-developed isobaric crosslinkers by a controlled mixing experiment. Then they show that they can indeed recover known structural changes in a set of purified proteins (complexes) - starting with single subunit proteins up to a very large 0.5 MDa multi-subunit protein complex - the polII complex.

      The authors give a very measured and fair assessment of this novel isobaric crosslinker and its potential power to contribute to the study of protein structure changes. They show that indeed their novel strategy picks up expected structural changes, changes in surface exposure of certain protein domains, changes within a single protein subunit but also changes in protein-protein interactions. However, they also point out that not all expected dynamic changes are captured and that there is still considerable room for improvement (much of it not specific to this crosslinker but shared by many crosslinkers used for CSMS).

      Taken together, the study presents a novel set of isobaric crosslinkers that indeed open up the opportunity to provide better qCSMS data, which will enable researchers to study dynamic changes in the shape and structure of proteins and their complexes. However, in its current form, some aspects of the study should be expanded upon in order for the research community to assess the true power of these isobaric crosslinkers. Specifically:

      Although the authors do mention some of the current weaknesses of their isobaric crosslinkers and qCSMS in general, more detail would be extremely helpful. Throughout the article a few key numbers (or even discussions) that would allow one to better evaluate the sensitivity (and the applicability) of the method are missing. This includes:

      (1) Throughout all the performed experiments it would be helpful to provide information on how many peptides are identified per experiment and how many have actually a crosslinker attached to it.

      (2) Of all the potential lysines that can be modified - how many are actually modified? Do the authors have an estimate for that? It would be interesting to evaluate the modification efficiency of the isobaric crosslinker in a denatured sample (as an upper limit, since here all lysines should be accessible) and then also in a native sample. For example, in the MBP experiment, the authors report the change of one mono-linked peptide in samples containing maltose relative to the one not containing maltose. The authors then give a great description of why this fits known structural changes. What is missing here is some indication of which changes were expected overall, which ones the authors would have expected to pick up with their method, and why those were not picked up. For example, were they picked up as modified by the crosslinker but not differential? I think this is important to discuss appropriately throughout the manuscript to help the reader evaluate/estimate the potential sensitivity of the method. There are passages where the authors do an excellent job doing that - for example when they mention the missed site that they expected to see in the initial polII experiments (lines 191 to 207). This kind of "power analysis" should be heavily discussed throughout the manuscript so that the reader is better informed of what sensitivity can be expected from applying this method.

      (3) It would be very helpful to provide information on how much better (or not) the Qlinker approach works relative to label-free qCLMS. One is missing the reference to a potential qCLMS gold standard (data set) or if such a dataset is not readily available, maybe one of the experiments could be performed by label-free qCLMS. For example, one of the differential biosensor experiments would have been well suited.

    1. eLife assessment

      The authors present a valuable study exploring the interaction between JNK signaling and high sucrose feeding. The strength of evidence supporting these observations is solid, including multi-tissue transcriptomic and metabolic analyses, followed by network modeling approaches to define the organs and pathways involved. Reviewers provided several suggestions to improve the manuscript including clarifications of model and analyses, as well as explanations for within-group variations and confirming RNA-seq results at the level of metabolite processes highlighted.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, authors have investigated the effects of JNK inhibition on sucrose-induced metabolic dysfunction in rats. They used multi-tissue network analysis to study the effects of the JNK inhibitor JNK-IN-5A on metabolic dysfunction associated with excessive sucrose consumption. Their results show that JNK inhibition reduces triglyceride accumulation and inflammation in the liver and adipose tissues while promoting metabolic adaptations in skeletal muscle. The study provides new insights into how JNK inhibition can potentially treat metabolic dysfunction-associated fatty liver disease (MAFLD) by modulating inter-tissue communication and metabolic processes.

      Strengths:

      The study has several notable strengths:

      Comprehensive Multi-Tissue Analysis: The research provides a thorough multi-tissue evaluation, examining the effects of JNK inhibition across key metabolically active tissues, including the liver, visceral white adipose tissue, skeletal muscle, and brain. This comprehensive approach offers valuable insights into the systemic effects of JNK inhibition and its potential in treating MAFLD.

      Robust Use of Systems Biology: The study employs advanced systems biology techniques, including transcriptomic analysis and genome-scale metabolic modeling, to uncover the molecular mechanisms underlying JNK inhibition. This integrative approach strengthens the evidence supporting the role of JNK inhibitors in modulating metabolic pathways linked to MAFLD.

      Potential Therapeutic Insights: By demonstrating the effects of JNK inhibition on both hepatic and extrahepatic tissues, the study offers promising therapeutic insights into how JNK inhibitors could be used to mitigate metabolic dysfunction associated with excessive sucrose consumption, a key contributor to MAFLD.

      Behavioral and Metabolic Correlation: The inclusion of behavioral tests alongside metabolic assessments provides a more holistic view of the treatment's effects, allowing for a better understanding of the broader physiological implications of JNK inhibition.

      Weaknesses:

      While the study provides a comprehensive evaluation of JNK inhibitors in mitigating MAFLD conditions, addressing the following points will enhance the manuscript's quality:

      The authors should explicitly mention and provide a detailed list of metabolites affected by sucrose and JNK inhibition treatment that have been previously associated with MAFLD conditions. This will better contextualize the findings within the broader field of metabolic disease research.

      The limitations of the study should be clearly stated, particularly the lack of evidence on the effects of chronic JNK inhibitor treatment and potential off-target effects. Addressing these concerns will offer a more balanced perspective on the therapeutic potential of JNK inhibition.

      The potential risks of using JNK inhibitors in non-MAFLD conditions should be highlighted, with a clear distinction made between the preventive and curative effects of these therapies in mitigating MAFLD conditions. This will ensure the therapeutic implications are properly framed.

      The statistical analysis section could be strengthened by providing a justification for the chosen statistical tests and discussing the study's power. Additionally, a more detailed breakdown of the behavioral test results and their implications would be beneficial for the overall conclusions of the study.

    3. Reviewer #2 (Public review):

      Summary:

      Excessive sucrose is a possible initial factor for the development of metabolic dysfunction-associated fatty liver disease (MAFLD). To investigate the possibility that intervention with a JNK inhibitor could lead to the treatment of metabolic dysfunction caused by excessive sucrose intake, the authors performed multi-organ transcriptomics analysis (liver, visceral fat (vWAT), skeletal muscle, and brain) in a rat model of MAFLD induced by excessive sucrose intake (+ treatment with JNK-IN-5A, a selective JNK2 and JNK3 inhibitor). Their data suggested that changes in gene expression in the vWAT as well as in the liver contribute to the pathogenesis of their MAFLD model and revealed that the JNK inhibitor has a cross-organ therapeutic effect on it.

      Strengths:

      (1) It has been previously reported that inhibition of JNK signalling can contribute to the prevention of hepatic steatosis (HS) and related metabolic syndrome in other models, but the role of JNK signalling in the metabolic disruption caused by excessive intake of sucrose, a possible initial factor for the development of MAFLD, has not been well understood, and the authors have addressed this point.

      (2) This study is also important because pharmacological therapy for MAFLD has not yet been established.

      (3) By obtaining transcriptomic data in multiple organs and comprehensively analyzing the data using gene co-expression network (GCN) analysis and genome-scale metabolic models (GEM), the authors showed multi-organ interaction not only in the pathology of MAFLD caused by excessive sucrose intake but also in the treatment effects of JNK-IN-5A.

      (4) Since JNK signalling has diverse physiological functions in many organs, the authors effectively assessed possible side effects with a view to the clinical application of JNK-IN-5A.

      Weaknesses:

      (1) The metabolic process activities were evaluated using RNA-seq results in Figure 7, but direct data such as metabolite measurements are lacking.

      (2) There is a lack of consistency in the data between JNK-IN-5A_D1 and _D2, and there is no sufficient data-based explanation for why the effects observed in D1 were inconsistent in the D2 samples.

      (3) Although it is valuable that the authors were able to suggest the possibility of JNK inhibitor as a therapeutic strategy for MAFLD, the evaluation of the therapeutic effect was limited to the evaluation of plasma TG, LDH, and gene expression changes. As there was no evaluation of liver tissue images, it is unclear what changes were brought about in the liver by the excessive sucrose intake and the treatment with JNK-IN-5A.

    1. eLife assessment

      Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function resulting in changes in gene expression and splicing of target mRNAs. This study developed a sensitive and robust sensor for TDP-43 activity that should impact the field's ability to monitor whether TDP-43 is functional or not. Though limited to cell culture, the evidence presented is convincing and is the first demonstration that a GFP on/off system can be used to assess TDP-43 mutants as well as loss of soluble TDP-43. The findings are valuable and may represent a novel tool to investigate TDP-43-associated disease mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      The authors create an elegant sensor for TDP-43 loss of function based on cryptic splicing of CFTR and UNC13A. The usefulness of this sensor primarily lies in its use in eventual high throughput screening and eventual in vivo models. The TDP-43 loss of function sensor was also used to express TDP-43 upon reduction of its levels.

      Strengths:

      The validation is convincing, the sensor was tested in models of TDP-43 loss of function, knockdown and models of TDP-43 mislocalization and aggregation. The sensor is susceptible to a minimal decrease of TDP-43 and can be used at the protein level unlike most of the tests currently employed.

      Weaknesses:

      Although the LOF sensor described in this study may be a primary readout for high-throughput screens, ALS/TDP-43 models typically employ primary readouts such as protein aggregation or mislocalization. The information in the two following points would assist users in making informed choices. 1. Testing the sensor in other cell lines. 2. Establishing a correlation between the sensor's readout and the loss of function (LOF) in the physiological genes would be useful given that the LOF sensor is a hybrid structure and doesn't represent any physiological gene. It would be beneficial to determine if a minor decrease (e.g., 2%) in TDP-43 levels is physiologically significant for a subset of exons whose splicing is controlled by TDP-43.

      Considering that most TDP-LOF pathologically occurs due to aggregation and/or mislocalization, and in most cases the endogenous TDP-43 gene is functional but the protein becomes non-functional, the use of the loss of function sensor as a switch to produce TDP-43 and its eventual use as gene therapy would have to contend with the fact that the protein produced may also become nonfunctional. This would eventually be easy to test in one of the aggregation models that were used to test the sensor. However, as the authors suggest, this is a very interesting system to deliver other genetic modifiers of TDP-43 proteinopathy in a regulated and timely fashion.

    3. Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Prior to this, most methods employed western blotting or qPCR-based assays to determine whether targets of TDP-43 were up- or down-regulated. The problem with that is the sensitivity. This approach uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it operates at extremely sensitive TDP-43 levels, requiring only picomolar TDP-43 knockdown for detection. This reporter should supersede the use of current TDP-43 activity assays: it's cost-effective, rapid and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples and statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple assays to confirm the splicing changes was viewed as complementary (i.e., PCR and GFP fluorescence), adding additional rigor. The final major strength I'll add is the very clever approach to tether TDP-43 to the loss of function cassette such that when TDP-43 is inactive it would autoregulate and induce wild-type TDP-43. This has many implications for the use of other genes, not just TDP-43, but also other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      Admittedly, one needs to initially characterize the sensor and the use of cell lines is an obvious advantage, but it begs the question of whether this will work in neurons. Additional future experiments in primary neurons will be needed. The bulk analysis of GFP-positive cells is a bit crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to get more accurate homogenous data. This is especially relevant since the GFP signal is quite heterogeneous in the image panels, for example, Figure 1C, meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented in a clearer manner, possibly split into additional graphs since there are too many outputs. Sup Figure 2A image panels would benefit from being labeled; it's difficult to tell what antibodies or fluorophores were used. Same with Figure 4B.

      Figure 3 is an important addition to this manuscript and in general is convincing showing that TDP-43 loss of function mutants can alter the sensor. However, there is still wild-type endogenous TDP-43 in these cells, and it's unclear whether the 5FL mutant is acting as a dominant negative to deplete the total TDP-43 pool, which is what the data would suggest. This could have been clarified. Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      Overall, the authors definitely achieved their goals by developing a very sensitive readout for TDP-43 function. The results are convincing, rigorous, and support their main conclusions. There are some minor weaknesses listed above, chief of which is the use of flow sorting to improve the data analysis. But regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (ie neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, having these types of sensors is a major boost to the field and will change our ability to see sub-threshold changes in TDP-43 function that might otherwise not be possible with current approaches.

    4. Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, in TDP-43 proteinopathies, TDP-43 mislocalizes to the cytoplasm where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss of function events are dependent upon PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well described unique resource that would be of high interest and utility to a number of researchers. Nonetheless, a couple of points should be addressed by the authors to enhance the overall utility and applicability of this biosensor.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors create an elegant sensor for TDP-43 loss of function based on cryptic splicing of CFTR and UNC13A. The usefulness of this sensor primarily lies in its use in eventual high throughput screening and eventual in vivo models. The TDP-43 loss of function sensor was also used to express TDP-43 upon reduction of its levels.

      Strengths:

      The validation is convincing, the sensor was tested in models of TDP-43 loss of function, knockdown and models of TDP-43 mislocalization and aggregation. The sensor is susceptible to a minimal decrease of TDP-43 and can be used at the protein level unlike most of the tests currently employed.

      Weaknesses:

      Although the LOF sensor described in this study may be a primary readout for high-throughput screens, ALS/TDP-43 models typically employ primary readouts such as protein aggregation or mislocalization. The information in the two following points would assist users in making informed choices. 1. Testing the sensor in other cell lines 2. Establishing a correlation between the sensor's readout and the loss of function (LOF) in the physiological genes would be useful given that the LOF sensor is a hybrid structure and doesn't represent any physiological gene. It would be beneficial to determine if a minor decrease (e.g., 2%) in TDP-43 levels is physiologically significant for a subset of exons whose splicing is controlled by TDP-43.

      Considering that most TDP-LOF pathologically occurs due to aggregation and/or mislocalization, and in most cases the endogenous TDP-43 gene is functional but the protein becomes non-functional, the use of the loss of function sensor as a switch to produce TDP-43 and its eventual use as gene therapy would have to contend with the fact that the protein produced may also become nonfunctional. This would eventually be easy to test in one of the aggregation models that were used to test the sensor. However, as the authors suggest, this is a very interesting system to deliver other genetic modifiers of TDP-43 proteinopathy in a regulated and timely fashion.

      We thank Reviewer #1 for their detailed feedback. In response, we will investigate the function of CUTS in neuronal cells and evaluate how a modest reduction in TDP-43 levels affects the splicing of physiologically relevant TDP-43-regulated cryptic exons within these cells (e.g., STMN2, UNC13A).

      Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Prior to this, most methods employed western blotting or qPCR-based assays to determine whether targets of TDP-43 were up- or down-regulated. The problem with that is the sensitivity. This approach uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it operates at extremely sensitive TDP-43 levels, requiring only picomolar TDP-43 knockdown for detection. This reporter should supersede the use of current TDP-43 activity assays: it's cost-effective, rapid and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples and statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple assays to confirm the splicing changes was viewed as complementary (i.e., PCR and GFP fluorescence), adding additional rigor. The final major strength I'll add is the very clever approach to tether TDP-43 to the loss of function cassette such that when TDP-43 is inactive it would autoregulate and induce wild-type TDP-43. This has many implications for the use of other genes, not just TDP-43, but also other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      Admittedly, one needs to initially characterize the sensor and the use of cell lines is an obvious advantage, but it begs the question of whether this will work in neurons. Additional future experiments in primary neurons will be needed. The bulk analysis of GFP-positive cells is a bit crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to get more accurate homogenous data. This is especially relevant since the GFP signal is quite heterogeneous in the image panels, for example, Figure 1C, meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented in a clearer manner, possibly split into additional graphs since there are too many outputs. Sup Figure 2A image panels would benefit from being labeled; it's difficult to tell what antibodies or fluorophores were used. Same with Figure 4B.

      Figure 3 is an important addition to this manuscript and in general is convincing showing that TDP-43 loss of function mutants can alter the sensor. However, there is still wild-type endogenous TDP-43 in these cells, and it's unclear whether the 5FL mutant is acting as a dominant negative to deplete the total TDP-43 pool, which is what the data would suggest. This could have been clarified. Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      Overall, the authors definitely achieved their goals by developing a very sensitive readout for TDP-43 function. The results are convincing, rigorous, and support their main conclusions. There are some minor weaknesses listed above, chief of which is the use of flow sorting to improve the data analysis. But regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (ie neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, having these types of sensors is a major boost to the field and will change our ability to see sub-threshold changes in TDP-43 function that might otherwise not be possible with current approaches.

      We thank Reviewer #2 for their constructive evaluation of our study. In response, we will assess CUTS in human neuronal cells, as also recommended by Reviewer #1. Additionally, we will incorporate an analysis of CUTS using flow cytometry to provide quantitative measurements of GFP signal. We agree that investigating how CUTS responds to stressors affecting TDP-43 function (e.g., MG132) would be a valuable addition, and we will include these data in the revisions to the study.

      We also appreciate the feedback on our figures and will work to enhance their clarity, incorporating the Reviewer’s suggestions. Specifically, we will split Figure 2D and 2G into multiple plots and ensure clearer labeling of the image panels in Figures 2A and 4B.

      Regarding the comment on the 5FL data, we believe this occurrence can be explained by existing literature, and we will address this directly in the discussion section of the manuscript.

      Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, in TDP-43 proteinopathies, TDP-43 mislocalizes to the cytoplasm where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss of function events are dependent upon PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well described unique resource that would be of high interest and utility to a number of researchers. Nonetheless, a couple of points should be addressed by the authors to enhance the overall utility and applicability of this biosensor.

      We thank Reviewer #3 for their time and thoughtful assessment of our manuscript. We will address all their recommendations, including expanding the discussion on the CE sequences utilized in the CUTS sensor and exploring the potential utility of the CUTS sensor in alternative disease-relevant systems.

    1. eLife assessment

      This work describes how a toxin-antitoxin (TA) system, which uses cyclic di-GMP as an antitoxin, controls both biofilm-associated antibiotic persistence and the integrity of the bacterial genome. The authors present solid evidence linking cyclic di-GMP and the toxin HipH. The work is valuable because it establishes the relationship between bacterial persistence and biofilm resilience, which lays a strong basis for future research on the formation of bacterial biofilms and antibiotic resistance.

    2. Reviewer #2 (Public Review):

      Summary:

      Hebin et al. reported a fascinating story about antibiotic persistence in biofilms. First, they set up a model to identify the increased persisters in the biofilm state. They found that the adhesion of bacteria to the surface leads to increased c-di-GMP levels, which might lead to the formation of persisters. To figure out the molecular mechanism, they screened the E. coli Keio Knockout Collection and identified HipH. Finally, the authors used a lot of data to prove that c-di-GMP not only controls HipH over-expression but also inhibits HipH activity, though the inhibition might be weak.

      Strengths:

      They used a lot of state-of-the-art technologies, such as single-cell technologies as well as classical genetic and biochemistry approaches to prove the concept, which makes the conclusions very solid. Overall, it is a very interesting and solid story that might attract diverse readers working with c-di-GMP, persisters, and biofilm.

      Comments on the revised version:

      All my concerns have been addressed.

    3. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This preprint explores the involvement of cyclic di-GMP in genome stability and antibiotic persistence regulation in bacterial biofilms. The authors proposed a novel mechanism that, due to bacterial adhesion, increases c-di-GMP levels and influences persister formation through interaction with HipH. While the work may provide useful insights that could attract researchers in biofilm studies and persistence mechanisms, the main findings are inadequately supported and require further validation and refinement in experimental design.

      We sincerely thank eLife for the thorough assessment of our manuscript. We appreciate the constructive criticism and see it as an opportunity to strengthen our research. In response to the reviewers' comments and suggestions, we have made significant improvements to our study. We have refined our experimental design and conducted additional experiments to provide more robust evidence supporting our findings. We believe that, with these additions and refinements, our study supports this novel mechanism and contributes significantly to the fields of biofilm research and bacterial persistence.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors propose a UPEC TA system in which a metabolite, c-di-GMP, acts as the AT with the toxin HipH. The idea is novel, but several key ideas are missing in regard to the relevant literature, and the experimental design is flawed. Moreover, they are absolutely not studying persister cells as Figure 1b clearly shows they are merely studying dying cells since no plateau in killing (or anything close to a plateau) was reached. So in no way has persistence been linked to c-di-GMP. Moreover, I do not think the authors have shown how the c-di-GMP sensor works. Also, there is no evidence that c-di-GMP is an antitoxin as no binding to HipH has been shown. So at best, this is an indirect effect, not a new toxin/antitoxin system as for all 7 TAs, a direct link to the toxin has been demonstrated for antitoxins.

      Thank you for your constructive comments on our manuscript. Your insights have prompted us to revisit our data and experimental design, leading to significant improvements in our study.

      (1) Clarification on Persister Cell Detection: We sincerely appreciate your astute observation regarding the interpretation of our killing curve in Figure 1B. Upon careful re-examination, we concur that our initial methodology had limitations in revealing the characteristic biphasic pattern associated with persister cells. To address these limitations, we have implemented two key modifications: shortening the sampling interval and extending the antibiotic treatment duration. These adjustments have resulted in an updated killing curve that now exhibits a more pronounced biphasic pattern and a prominent plateau in the late stage of killing, as illustrated in Response Figure 1. This refined pattern aligns with established characteristics of persister cell behavior in antibiotic tolerance studies, providing a more accurate representation of the persister population dynamics in our experimental system. We believe these methodological enhancements significantly improve the reliability and interpretability of our results, offering a clearer insight into the persister cell phenomenon under investigation.

      (2) Validation of c-di-GMP Sensor: We appreciate your point about the c-di-GMP sensor. The c-di-GMP sensor, developed by Howard C. Berg's team, is specifically designed to detect relative intracellular concentrations of c-di-GMP in Escherichia coli cells. This capability is crucial for understanding the dynamic regulation of c-di-GMP during bacterial responses to environmental stimuli. We have expanded our explanation of the sensor's detection mechanism in lines 138-146 of the manuscript, detailing how it functions to reflect changes in c-di-GMP levels within the cells accurately. The mechanism operates through a series of signaling events that are initiated when c-di-GMP binds to the sensor, leading to measurable outputs that correlate with intracellular concentrations. Additionally, we have provided a schematic chart in Figure S1B to visually support our description regarding the sensor. This figure demonstrates the sensor's responsiveness and specificity in detecting fluctuations in c-di-GMP levels, effectively linking the signaling molecule to cellular behavior. We hope these additions clarify the role of the c-di-GMP sensor in our research and address your concerns regarding its functionality.

      (3) HipH and c-di-GMP Interaction: Our pull-down experiments presented in Figure 5A-E provide robust and compelling evidence for a direct physical interaction between HipH and c-di-GMP, and the effects of their interaction are reminiscent of toxin-antitoxin systems. We acknowledge, however, that c-di-GMP is not a traditional antitoxin, since it is not genetically linked to HipH. We have revised our terminology to "TA-like system" to reflect this difference more accurately.

      Weaknesses:

      (1) L 53: biofilm persisters are no different than any other persisters (there is no credible evidence of any different persister cells) so this reviewer suggests changing 'biofilm persisters' to 'persisters' throughout the text.

      Thank you for your thoughtful comment. Upon careful consideration of the current scientific literature, we agree that there is no substantial evidence supporting a distinct category of persister cells specific to biofilms. We have systematically replaced 'biofilm persisters' with 'persisters' throughout the manuscript.

      (2) L 51: persister cells do not mutate and, once resuscitated, mutate like any other growing cell so this sentence should be deleted as it promotes an unnecessary myth about persistence.

      We sincerely appreciate your astute observation regarding the inaccuracy in line 51. We have removed the sentence in question from line 51. We have also thoroughly reviewed the entire manuscript to ensure no similar misconceptions are present elsewhere in the text.

      (3) L 69: please include the only metabolic model for persister cell formation and resuscitation here based on single cells (e.g., doi.org/10.1016/j.bbrc.2020.01.102 , https://doi.org/10.1016/j.isci.2019.100792 ); otherwise, you write as if there are no molecular mechanisms for persistence/resuscitation.

      Thank you for your valuable suggestion. We appreciate the opportunity to enhance the scientific context of our manuscript. We have added a brief explanation of how ppGpp mediates ribosome dimerization, leading to persistence, and how its degradation triggers resuscitation [1-3] (lines 68-74). We have described the role of cAMP-CRP in regulating persistence through its effects on metabolism and stress responses [4, 5] (lines 74-78). We also explore potential interactions or synergies between our proposed mechanisms and these established metabolic models [6] (lines 383-409). We believe this revision significantly enhances our manuscript by providing a more accurate representation of the current state of knowledge in the field and demonstrating how our work builds upon and contributes to existing models of bacterial persistence.

      (4) The authors should cite in the Intro or Discussion that others have proposed similar novel TAs including a ppGpp metabolic toxin paired with an enzymatic antitoxin SpoT that hydrolyzes the toxin (http://dx.doi.org/10.1016/j.molcel.2013.04.002).

      We are grateful for your expertise in pointing out this crucial reference. We sincerely appreciate your suggestion to include the reference to previously proposed novel toxin-antitoxin (TA) systems, particularly the ppGpp-SpoT system [6]. In light of this reference, we have expanded our discussion to include: 1) A brief overview of the ppGpp-SpoT system as a novel TA-like mechanism. 2) Comparisons between the ppGpp-SpoT system and our findings on the HipH-c-di-GMP interaction. 3) Reflections on how these systems challenge and expand traditional definitions of TA systems (lines 383-409). We believe this addition significantly enhances the context and strengthens the rationale for considering the HipH-c-di-GMP interaction as a TA-like system. Thank you for your valuable input in helping us situate our research within the broader landscape of TA system biology.

      (5) Figure 1b: there are no results in this paper related to persister cells. Figure 1b simply shows dying cells were enumerated. Hence, the population of stressed cells increased, not 'persister cells' (Figure 1f), in the course of these experiments.

      We sincerely appreciate your astute observation regarding the interpretation of our killing curve in Figure 1B. Upon careful re-examination, we concur that our initial methodology had limitations in revealing the characteristic biphasic pattern associated with persister cells. To address these limitations, we have made two changes: 1) Shortened sampling interval: we have reduced the interval between measurements to one hour. 2) Extended sampling duration: the total duration of sampling has been increased to 6 hours (Response Figure 1). The updated killing curve now exhibits a more pronounced biphasic pattern and a prominent plateau in the late stage of killing: 1) Initial rapid decline: from 0 to 1 hour, we observe a steep decrease in bacterial survival (slope ≈ -3 to -1.8). 2) Slower decline phase: from 4.5 to 6 hours, the rate of decline is markedly reduced (slope ≈ -0.17 to -0.06). This pattern aligns more closely with established characteristics of persister cell behavior in antibiotic tolerance studies.
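      The comparison of an early and a late slope described above can be sketched as a least-squares fit of log10(survival) against time over each window. This is only an illustration under stated assumptions: the response does not say how the slopes were computed, the helper name `log10_slope` is hypothetical, and the survival fractions below are invented placeholders, not data from the study.

```python
import math


def log10_slope(times_h, survival):
    """Least-squares slope of log10(survival fraction) versus time, per hour."""
    ys = [math.log10(s) for s in survival]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    den = sum((t - mean_t) ** 2 for t in times_h)
    return num / den


# Hypothetical survival fractions sampled hourly (NOT measurements from the study).
times_h = [0, 1, 2, 3, 4, 5, 6]
survival = [1.0, 1e-2, 1e-3, 3e-4, 2e-4, 1.6e-4, 1.3e-4]

early_slope = log10_slope(times_h[:2], survival[:2])  # window 0-1 h
late_slope = log10_slope(times_h[4:], survival[4:])   # window 4-6 h

# A biphasic curve shows a steep early slope and a much shallower late slope.
print(f"early: {early_slope:.2f} log10/h, late: {late_slope:.2f} log10/h")
```

      A late slope close to zero relative to the early slope is what a plateau looks like on this scale; where to place the window boundaries remains a judgment call that the sketch does not settle.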

      (6) Figure S1: I see no evidence that the authors have shown this c-di-GMP detects different c-di-GMP levels since there appears to be no data related to varying c-di-GMP concentrations with a consistent decrease. Instead, there is a maximum. What are the concentration of c-di-GMP on the X-axis for panels C, D, and E? How were c-di-GMP levels varied such that you know the c-di-GMP concentration?

      We appreciate your point about the c-di-GMP sensor. To address this, we have included additional data on the sensor's mechanism and validation. The sensor, developed by Howard C. Berg's team, is designed for detecting intracellular c-di-GMP concentrations in E. coli [7].

      Sensor Design and Mechanism: The sensor developed for detecting c-di-GMP levels in Escherichia coli cells is a single-fluorescent-protein biosensor. The protein comprises a fluorescent protein base and a c-di-GMP binding domain. The fluorescent protein base is mVenusNB, which is the fastest-folding yellow fluorescent protein (YFP). The c-di-GMP binding domain is the MrkH protein, which is inserted between Y145 and N146 of mVenusNB. MrkH is a transcription factor with a high affinity for c-di-GMP. When MrkH binds to c-di-GMP, it undergoes a significant conformational change: the amino-terminal domain of MrkH rotates 138° relative to its carboxyl-terminal domain. This rotation disrupts the mVenusNB chromophore environment, resulting in reduced fluorescence. The sensor system co-expresses mScarletI, a bright, rapidly folding red fluorescent protein, which serves as a reference for ratiometric measurements. This design allows real-time ratiometric monitoring of c-di-GMP levels in individual cells while controlling for variations in protein expression levels between cells. This enables the observation of dynamic changes in c-di-GMP concentration, such as the increase seen after E. coli surface attachment.

      Functioning and Accuracy: The sensor is designed to detect c-di-GMP in the 100 to 700 nM range, which is the physiological range in E. coli. The use of a low copy plasmid for expression ensures detection at low concentrations. The ratio (R) of mVenusNB to mScarletI fluorescence emission is measured for individual cells. The sensor shows at least a twofold dynamic range between low and high c-di-GMP conditions. Cells with low c-di-GMP (expressing phosphodiesterase PdeH) show higher R values compared to cells with high c-di-GMP (expressing constitutively active diguanylate cyclase WspR:D70E). A mutant biosensor (Sensor*) with the R113A mutation in MrkH is used as a control. This mutation eliminates c-di-GMP binding ability, allowing differentiation between specific c-di-GMP effects and other cellular changes.

      This biosensor system provides a sophisticated tool for visualizing and quantifying c-di-GMP levels in individual bacterial cells with high sensitivity and temporal resolution. By combining a c-di-GMP-sensitive fluorescent protein with a reference fluorescent protein and utilizing ratiometric analysis, the system can accurately reflect changes in intracellular c-di-GMP levels while controlling for other cellular variables.

      We have expanded our explanation of its detection mechanism in lines 138-146 and Figure S1B.
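      The ratiometric readout described above can be illustrated with a minimal sketch. It assumes background-subtracted per-cell intensities in arbitrary units; the function name `ratio` and all numbers are hypothetical and are not taken from the sensor publication or from this study. Because c-di-GMP binding reduces mVenusNB fluorescence, R falls as c-di-GMP rises, so the inverse 1/R rises with it.

```python
def ratio(venus, scarlet):
    """R = mVenusNB emission / mScarletI emission for a single cell."""
    if scarlet <= 0:
        raise ValueError("reference (mScarletI) intensity must be positive")
    return venus / scarlet


# Hypothetical per-cell intensities (arbitrary units), not measured data.
low_cdg_cell = {"venus": 800.0, "scarlet": 400.0}            # bright sensor, low c-di-GMP
high_cdg_cell = {"venus": 400.0, "scarlet": 400.0}           # quenched sensor, high c-di-GMP
bright_high_cdg_cell = {"venus": 1200.0, "scarlet": 1200.0}  # same state, 3x expression

r_low = ratio(**low_cdg_cell)
r_high = ratio(**high_cdg_cell)
r_bright = ratio(**bright_high_cdg_cell)

# The inverse ratio increases with c-di-GMP.
inv_low, inv_high = 1 / r_low, 1 / r_high

print(r_low, r_high, r_bright, inv_low, inv_high)
```

      The third cell shows why the reference channel matters: tripling expression of both proteins leaves R unchanged, so differences in R reflect the sensor state and not plasmid copy number or expression level.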

      (7) The viable portion of the VBNC population are persister cells so there is no reason to use VBNC as a separate term. Please see the reported errors often made with nucleic acid staining dyes in regard to VBNCs.

      We appreciate the opportunity to clarify the distinction between VBNC cells and persister cells in our manuscript. It is essential to recognize that VBNC cells and persister cells represent two fundamentally different states of bacterial dormancy. While both may exhibit viability under certain conditions, persister cells are characterized by their ability to resuscitate and grow when environmental conditions become favorable [8]. In contrast, VBNC cells are in a deep dormant state in which they cannot be revived under normal culture conditions [9, 10]. This distinction is critical for accurately representing bacterial survival strategies and population dynamics, which is why we maintain the use of the term VBNC separately from persister cells. We have added related references in line 259.

      Regarding the reported errors associated with nucleic acid staining dyes for identifying VBNC cells, we acknowledge that these methods can exhibit limitations. Specifically, nucleic acid stains may fail to reliably differentiate between metabolically active and inactive cells, leading to inaccuracies in quantifying the true VBNC population [11]. In our study, we have opted to utilize propidium iodide (PI) staining to assess cell viability more accurately, as it effectively distinguishes dead cells from viable cells based on membrane integrity [12]. By employing this methodology, we ensure a more precise estimation of the VBNC proportion without conflating it with persister cell dynamics.

      Reviewer #2 (Public Review):

      Summary:

      Hebin et al. reported a fascinating story about antibiotic persistence in biofilms. First, they set up a model to identify the increased persisters in the biofilm state. They found that the adhesion of bacteria to the surface leads to increased c-di-GMP levels, which might lead to the formation of persisters. To figure out the molecular mechanism, they screened the E. coli Keio Knockout Collection and identified HipH. Finally, the authors used a lot of data to prove that c-di-GMP not only controls HipH over-expression but also inhibits HipH activity, though the inhibition might be weak.

      Thank you for your insightful summary of our research. We greatly appreciate your thoughtful consideration of our work.

      Strengths:

      They used a lot of state-of-the-art technologies, such as single-cell technologies as well as classical genetic and biochemistry approaches to prove the concept, which makes the conclusions very solid. Overall, it is a very interesting and solid story that might attract diverse readers working with c-di-GMP, persisters, and biofilm.

      Weaknesses:

      (1) Is HipH the only target identified by screening the E. coli Keio Knockout Collection?

      We appreciate your inquiry about our screening process and the identification of HipH. We did not screen the entire E. coli Keio Knockout Collection. Our approach was more targeted, focusing on mutants relevant to enzyme activity regulation. We selected specific mutants based on their potential involvement in c-di-GMP-mediated regulatory pathways. This focused approach allowed us to efficiently identify candidates likely to be involved in persister formation. Among the screened mutants, HipH emerged as a significant hit. Its identification was particularly noteworthy due to its known role in persister formation and its potential as a regulatory target of c-di-GMP. We acknowledge that our targeted approach may have overlooked other potential candidates. We are considering a more comprehensive screening approach in future studies to identify additional targets.

      (2) Since the story is complicated, a diagrammatic picture might be needed to illustrate the whole story. And the title does not accurately summarize the novelty of this study.

      Thank you for your valuable feedback. We fully agree with your assessment that a visual representation would greatly enhance the clarity of our complex findings. In response to your suggestion, we have added Response Figure 2 (Fig. 6 in revised manuscript, lines 976-981) to our manuscript. This new figure provides a comprehensive visual summary of the key processes and mechanisms uncovered in our study. This graphic summary provides a clear overview of the interconnected nature of surface adhesion, c-di-GMP signaling, and HipH regulation. It also highlights the complex role of c-di-GMP in persister formation and offers readers a visual aid to better understand the molecular mechanisms underlying our findings.

      We sincerely appreciate your thoughtful comment regarding the title and its reflection of the study's novelty. After careful consideration, we believe that our original title adequately captures the essence and significance of our research. We have strived to ensure that it accurately represents the scope and novelty of our work while maintaining clarity and conciseness. Nevertheless, we value your input and thank you for taking the time to provide this feedback, as it encourages us to critically evaluate our presentation.

      (3) The ratio of mVenusNB to mScarlet-I (R) negatively correlates with the concentration of c-di-GMP. Therefore, R⁻¹ demonstrates a positive correlation with the concentration of c-di-GMP. Is this method validated with other methods to quantify c-di-GMP, or used in other studies?

      We appreciate your point about the c-di-GMP sensor. To address this, we have included additional data on the sensor's mechanism and validation. The sensor, developed by Howard C. Berg's team, is designed for detecting intracellular c-di-GMP concentrations in E. coli [7].

      Sensor Design and Mechanism: The sensor developed for detecting c-di-GMP levels in Escherichia coli cells is a single-fluorescent-protein biosensor. The protein comprises a fluorescent protein base and a c-di-GMP binding domain. The fluorescent protein base is mVenusNB, which is the fastest-folding yellow fluorescent protein (YFP). The c-di-GMP binding domain is the MrkH protein, which is inserted between Y145 and N146 of mVenusNB. MrkH is a transcription factor with a high affinity for c-di-GMP. When MrkH binds to c-di-GMP, it undergoes a significant conformational change: the amino-terminal domain of MrkH rotates 138° relative to its carboxyl-terminal domain. This rotation disrupts the mVenusNB chromophore environment, resulting in reduced fluorescence. The sensor system co-expresses mScarletI, a bright, rapidly folding red fluorescent protein, which serves as a reference for ratiometric measurements. This design allows real-time ratiometric monitoring of c-di-GMP levels in individual cells while controlling for variations in protein expression levels between cells. This enables the observation of dynamic changes in c-di-GMP concentration, such as the increase seen after E. coli surface attachment.

      Functioning and Accuracy: The sensor is designed to detect c-di-GMP in the 100 to 700 nM range, which is the physiological range in E. coli. The use of a low copy plasmid for expression ensures detection at low concentrations. The ratio (R) of mVenusNB to mScarletI fluorescence emission is measured for individual cells. The sensor shows at least a twofold dynamic range between low and high c-di-GMP conditions. Cells with low c-di-GMP (expressing phosphodiesterase PdeH) show higher R values compared to cells with high c-di-GMP (expressing constitutively active diguanylate cyclase WspR:D70E). A mutant biosensor (Sensor*) with the R113A mutation in MrkH is used as a control. This mutation eliminates c-di-GMP binding ability, allowing differentiation between specific c-di-GMP effects and other cellular changes.

      This biosensor system provides a sophisticated tool for visualizing and quantifying c-di-GMP levels in individual bacterial cells with high sensitivity and temporal resolution. By combining a c-di-GMP-sensitive fluorescent protein with a reference fluorescent protein and utilizing ratiometric analysis, the system can accurately reflect changes in intracellular c-di-GMP levels while controlling for other cellular variables.

      We have expanded our explanation of its detection mechanism in lines 138-146 and Figure S1B.

      (4) References are missing throughout the manuscript. Please add enough references for every conclusion.

      We appreciate your feedback regarding the references in our manuscript. We acknowledge the importance of proper citation to support our conclusions and provide context for our work. In response to your comment, we have conducted a comprehensive review of our manuscript and have significantly enhanced our referencing throughout. We have added appropriate citations to support each key statement and conclusion presented in our study. These additional references provide a robust foundation for our findings and place our work within the broader context of the field. The complete list of all references, including the newly added ones, can be found at the end of this response letter as well as in the revised manuscript.

      (5) The novelty of this study should be clearly written and compared with previous references. For example, is it the first study to report the mechanism that the adhesion of bacteria to the surface leads to increased persister formation?

      We sincerely appreciate the opportunity to highlight and elaborate the novelty of our research. This study provides novel insights into the relationship between bacterial adhesion to surfaces and the subsequent increase in persister cell formation, which has not been explicitly detailed in previous literature. While existing research has established that biofilms typically harbor higher numbers of persister cells, this investigation not only corroborates that finding but also elucidates the mechanisms through which surface adhesion contributes to this phenomenon.

      Past studies have predominantly focused on the general characteristics of persister cells and their role in biofilm resilience and antibiotic tolerance without specifically addressing the mechanistic link between adhesion and persister formation [13, 14]. For instance, previous work has shown that surface attachment leads to changes in metabolic activity and signaling pathways within bacterial cells, which could promote persistence, but it has not definitively established a causal relationship between adhesion and increased persister formation. Our study highlights that the elevation of cyclic di-GMP levels after surface adhesion triggers a cascade of physiological changes that significantly enhance the formation of persister cells. In particular, we report that adhesion-induced signaling pathways promote dormancy and tolerance to antibiotics, marking an important advancement from the previous understanding that persister cells might arise from random phenotypic variation during biofilm development. We have expanded our discussion in lines 366-381.

      In summary, we believe this study stands as one of the first to clearly delineate the mechanism by which bacterial adhesion leads to increased persister formation, providing a valuable contribution to the current understanding of bacterial persistence and biofilm ecology. Thus, we can assert that our findings are not only novel but also essential for informing future research and therapeutic strategies aimed at managing bacterial infections.

      (6) in vitro DNA cleavage assay. Why not use bacterial genomic DNA to test the cleaving of HipH on the bacterial genome?

      Thank you for your feedback regarding our experimental approach. The decision not to use genomic DNA directly in our experiments was made after careful consideration of three issues: the high molecular weight of genomic DNA, which presents significant challenges in handling and analysis; the difficulty of extracting intact genomic DNA, which could compromise the integrity of our results; and the challenges associated with electrophoretic separation of such large DNA molecules, which could limit our ability to interpret the data accurately.

      Instead, following established practices in molecular biology research and drawing from similar studies in the field [15-17], we opted to use plasmids as model DNA for our experiments. This approach offers several advantages: plasmids are smaller and more manageable, making them easier to manipulate in laboratory conditions; they can be more readily extracted in intact form, ensuring the quality of our experimental material; and plasmid DNA is more amenable to electrophoretic separation, allowing for clearer and more precise analysis. Despite their smaller size, plasmids retain many of the key characteristics of genomic DNA that are relevant to our study. We believe this approach provides a robust and reliable model for our research while overcoming the practical limitations associated with genomic DNA. It allows us to investigate the fundamental principles we are interested in while maintaining experimental feasibility and data integrity. We have added related references in lines 314 and 599.

      (7) c-di-GMP-HipH is not a TA; it does not fit the definition of TA systems. You can say c-di-GMP is an antitoxin based on your study, but c-di-GMP-HipH is not a TA pair.

      We appreciate your insightful feedback regarding the classification of the c-di-GMP-HipH interaction. We acknowledge that while our study suggests c-di-GMP may function as an antitoxin to HipH, the c-di-GMP-HipH pair does not constitute a classical TA system due to the lack of genetic linkage. We have replaced the term "TA system" with "TA-like system" when referring to the c-di-GMP-HipH interaction. This more accurately reflects the nature of their relationship while acknowledging that it differs from traditional TA systems.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Either indent or skip a line to indicate a new paragraph; there is no need to do both.

      Thank you for your feedback regarding the formatting of our manuscript. We have revised the formatting throughout the main text by using a single blank line to separate paragraphs, without indentation.

      (2) L 77: need to define 'c-di-GMP' without using another abbreviation; please write '3,5-cyclic diguanylic acid', etc.

      Thank you for your valuable feedback regarding the proper introduction of abbreviations in our manuscript. We have revised line 86 to introduce the full name of c-di-GMP as "3,5-cyclic diguanylic acid". Following this initial introduction, we consistently use the abbreviation "c-di-GMP" throughout the rest of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      This is a fascinating story, but the title and the manuscript need careful revision to make it more clear. The novelty and logic are not very easy to follow.

      (1) Figure 1B, " h" is missing

      We sincerely thank you for your attentive review and for pointing out the missing "h" in Figure 1B. We have carefully reviewed and revised the figure legend in Figure 1B. The unit of time has been corrected to include "h" (hours) where appropriate, ensuring consistency and accuracy throughout the figure.

      (2) Line 222, the in vivo mice model should be cited with the reference.

      Thank you for the reminder. We have cited the following reference for the mouse model (line 231).

      Pang Y, et al., (2022) Bladder epithelial cell phosphate transporter inhibition protects mice against uropathogenic Escherichia coli infection. Cell reports 39: 110698

      References

      (1) Wood, T.K. and S. Song, Forming and waking dormant cells: The ppGpp ribosome dimerization persister model. Biofilm, 2020. 2: p. 100018.

      (2) Song, S. and T.K. Wood, ppGpp ribosome dimerization model for bacterial persister formation and resuscitation. Biochem Biophys Res Commun, 2020. 523(2): p. 281-286.

      (3) Wood, T.K., S. Song, and R. Yamasaki, Ribosome dependence of persister cell formation and resuscitation. J Microbiol, 2019. 57(3): p. 213-219.

      (4) Niu, H., J. Gu, and Y. Zhang, Bacterial persisters: molecular mechanisms and therapeutic development. Signal Transduct Target Ther, 2024. 9(1): p. 174.

      (5) Mok, W.W., M.A. Orman, and M.P. Brynildsen, Impacts of global transcriptional regulators on persister metabolism. Antimicrob Agents Chemother, 2015. 59(5): p. 2713-9.

      (6) Amato, S.M., M.A. Orman, and M.P. Brynildsen, Metabolic control of persister formation in Escherichia coli. Mol Cell, 2013. 50(4): p. 475-87.

      (7) Vrabioiu, A.M. and H.C. Berg, Signaling events that occur when cells of Escherichia coli encounter a glass surface. Proc Natl Acad Sci U S A, 2022. 119(6).

      (8) Liu, J., et al., Viable but nonculturable (VBNC) state, an underestimated and controversial microbial survival strategy. Trends Microbiol, 2023. 31(10): p. 1013-1023.

      (9) Pan, H. and Q. Ren, Wake Up! Resuscitation of Viable but Nonculturable Bacteria: Mechanism and Potential Application. Foods, 2022. 12(1).

      (10) Ayrapetyan, M., T. Williams, and J.D. Oliver, Relationship between the Viable but Nonculturable State and Antibiotic Persister Cells. J Bacteriol, 2018. 200(20).

      (11) Zhao, S., et al., Absolute Quantification of Viable but Nonculturable Vibrio cholerae Using Droplet Digital PCR with Oil-Enveloped Bacterial Cells. Microbiol Spectr, 2022. 10(4): p. e0070422.

      (12) Zhao, S., et al., Enumeration of Viable Non-Culturable Vibrio cholerae Using Droplet Digital PCR Combined With Propidium Monoazide Treatment. Front Cell Infect Microbiol, 2021. 11: p. 753078.

      (13) Pan, X., et al., Recent Advances in Bacterial Persistence Mechanisms. Int J Mol Sci, 2023. 24(18).

      (14) Patel, H., H. Buchad, and D. Gajjar, Pseudomonas aeruginosa persister cell formation upon antibiotic exposure in planktonic and biofilm state. Sci Rep, 2022. 12(1): p. 16151.

      (15) Maki, S., et al., Partner switching mechanisms in inactivation and rejuvenation of Escherichia coli DNA gyrase by F plasmid proteins LetD (CcdB) and LetA (CcdA). J Mol Biol, 1996. 256(3): p. 473-82.

      (16) Hockings, S.C. and A. Maxwell, Identification of four GyrA residues involved in the DNA breakage-reunion reaction of DNA gyrase. J Mol Biol, 2002. 318(2): p. 351-9.

      (17) Chan, P.F., et al., Structural basis of DNA gyrase inhibition by antibacterial QPT-1, anticancer drug etoposide and moxifloxacin. Nat Commun, 2015. 6: p. 10048.

    1. eLife assessment

      This important study evaluates the outcomes of a single-institution pilot program designed to provide graduate students and postdoctoral fellows with internship opportunities in areas representing diverse career paths in the life sciences. The data convincingly show the benefit of internships to students and postdocs, their research advisors, and potential employers, without adverse impacts on scientific productivity. This work will be of interest to multiple stakeholders in graduate and postgraduate life sciences education and should stimulate further research into how such programs can best be broadly implemented.

    2. Reviewer #2 (Public review):

      Summary:

      The authors describe five-year outcomes of an internship program for graduate students and postdoctoral fellows at their institution spurred by pilot funding from an NIH BEST grant. They hypothesized that such a program would be beneficial to interns, internship hosts, and research advisors. The mixed methods study used surveys and focus groups to gather qualitative and quantitative data from the stakeholder groups, and the authors acknowledge the limitation that the study subjects were self-selected and also had research advisors who agreed to allow them to participate. Thus the generally favorable outcomes may not be applicable to students such as those who are struggling in the lab and/or lack career focus or supportive research advisors. Nonetheless, the overall findings support the hypothesis and also suggest additional benefits, including in some cases positive impact for the lab, improved communication between the intern and their research advisor, and an advantage for recruitment of students to the institution. The data refute one of the principal concerns of research advisors: that by taking students out of the lab, internships reduce individual and overall lab productivity. Students who did internships were significantly less likely to pursue postdoctoral fellowships before entering the biomedical workforce and were more likely to have science-related careers versus research careers than control students who did not do internships, although the study design cannot determine whether this is a causal relationship.

      Strengths:

      (1) Sample size is good (123 internships).

      (2) Response rate is high, minimizing potential bias.

      (3) The internship program is well described. Outcomes are clearly defined.

      (4) Methods and statistical analyses appear to be appropriate (although I am not expert in mixed methods).

      (5) "Take-home" lessons for institutions considering implementing internship programs are clearly stated.

      Appraisal:

      Overall the authors achieve their aims of describing outcomes of an internship program for graduate career development and offering lessons learned for other institutions seeking to create their own internship programs.

      Impact:

      The paper will be very useful for other institutions to dispel some of the concerns of research advisers about internships for PhD students (although not necessarily for postdoctoral fellows). In the long run, wider adoption of internships as part of PhD training will depend not only on faculty buy-in but also on availability of resources and changes to the graduate school funding model so that such programs are not viewed as another "unfunded mandate" in graduate education. Perhaps industry will be motivated to support internships by the positive outcomes for hosts reported in this paper. Additionally, NIH could allow a certain amount of F, T, or even RPG funds to be used to support internships for purposes of career development.

    3. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study evaluates the outcomes of a single-institution pilot program designed to provide graduate students and postdoctoral fellows with internship opportunities in areas representing diverse career paths in the life sciences. The data convincingly show the benefit of internships to students and postdocs, their research advisors, and potential employers, without adverse impacts on scientific productivity. This work will be of interest to multiple stakeholders in graduate and postgraduate life sciences education and should stimulate further research into how such programs can best be broadly implemented.

      Thank you for your assessment. We agree that sharing our process for creating this internship program with the wider higher education community is important and we hope it will spur establishment of new programs at other institutions.

      Public Reviews:

      Reviewer #1 (Public Review):

      The goal of this study was to determine whether short (1 month) internships for biomedical science trainees (mostly graduate students but some post-docs) were beneficial for the trainees, their mentors, and internship hosts. Over a 5 year period, the outcomes of trainees who completed internships were compared with peers who did not. Both quantitative results in terms of survey responses and qualitative results obtained from discussion groups were provided. Overall, the data suggest that internships aid graduate students in multiple ways and do not harm progress on dissertation projects. 'Buy-in' from mentors and prospective mentors appeared to increase over time, and hosts also gained from the contributions of the interns even in a short time period. While the program also appeared valuable for post-doctoral trainees, it was less favorably considered by post-doc mentors.

      Thank you for such a positive and concise overview of this paper.

      Strengths:

      The internship program that was examined here appears to have been very well designed in terms of availability to students, range of internship offerings, length of time away from PhD lab, and assessments.

      Having a built-in peer control group of graduate students who did not do internships was valuable for much of the quantitative analyses. However, as the authors acknowledge, those who did opt for internships are a self-selected group who may have character traits that would help them overcome the potential negative impacts of the internship.

      The quantitative data is convincing and addresses important considerations for all stakeholders.

      The manuscript is well-constructed to individually address the impact of the program on each set of stakeholders, while also showcasing areas of mutual benefit.

      The discussion of challenges and limitations, from the perspectives of participating stakeholders, program leaders, and also institutions, is comprehensive and very thoughtful.

      Thank you for noting these strengths in experimental design, control group, and manuscript format.

      Weaknesses:

      The qualitative data that resulted from the 'focus groups' of faculty mentors was somewhat difficult to evaluate given the very limited number of participants (n=7).

      Thank you for pointing out the potential limitations of a small sample size. One reason we selected a qualitative approach to focus group data analysis in our experimental design was to supplement our larger quantitative analyses with faculty advisors. A benefit of relying on qualitative methods is that saturation of a representative set of themes can be reached even with a limited number of participants. This is particularly true when a homogenous sample is used, such as faculty members in the biomedical sciences (Guest, et al. 2006). We have added the following sentences at line 188 in the text to expand on the faculty focus groups:

      “A group of faculty advisors in a range of disciplines and demographics, all of whom were active mentors with extensive training experience were invited to participate in the focus groups. Seven faculty advisors participated in the Year 1 focus group and 5 of those same 7 participated in Year 5. Saturation can occur with as little as six interviews in homogeneous samples (Guest et al. 2006) such as our biomedical faculty research advisors at a single institution.”

      In the original analysis, we increased the generalizability of our findings by gathering faculty opinions and feedback using multiple methods. For example, post-internship survey responses were returned by 75 faculty members over a 5-year period, which represents a 61% response rate. (Faculty post-internship survey results are shown in Figure 1, panels v-x, and Figure 4, panels i-t.) In addition, the survey gauging general faculty advisor support for the program (Figure 3), which was administered twice, 4 years apart, gathered the opinions of 115 advisors in year 1 and 122 advisors in year 4. Thus, the faculty focus groups were only one of three ways that faculty input was gathered. In sum, while the small number of faculty mentors who participated in the focus groups has the potential to introduce bias, we made a conscious decision to use a mixed methods approach to expand beyond one sample to increase the generalizability of our results. However, to acknowledge the complexity of faculty advisor views on internships, we have noted the need to further study faculty advisor support for internships in broader samples as a future direction. This is the new wording we included at line 788:

      “Other future studies could probe faculty advisor support for internships at institutions beyond our own since training culture and faculty perspectives are influenced by many factors and vary from institution to institution.”

      Overall, the data support the authors' conclusions with respect to the utility of internship programs for all stakeholders. As the authors note, the data relate to a specific program where internship length was defined, costs were covered by a grant or institutional funding, and there were multiple off-site internship hosts available. Thus, the results here may not replicate for other programs with different criteria.

      Thank you for noting these advantages that contributed to the success of this program. We agree that other institutions will encounter unique challenges when implementing their own internship program and have addressed some of these limitations in our discussion section. In the Discussion section of the paper, we outline considerations and review lessons learned in an effort to help others know what aspects of the program might or might not work in distinct situations or locations. We also point the reader to distinct internship models at other institutions in the hope that any university hoping to provide their trainees with internship opportunities can benefit from the collective experience of the relatively few programs that have found sustainable ways to accomplish this.  

      This work provides a valuable assessment of how relatively short internships can impact graduate students, both in terms of their graduate tenure and in their decision-making for careers post-graduation. As more graduate programs are heeding calls from funding agencies and professional societies to increase knowledge about, and familiarity with, multiple career paths beyond academia for PhD students, there is a need to evaluate the best ways to accomplish that goal. Hands-on internships are valuable across many spheres so it makes sense that they would be for life science graduates too. However, the fear that time-to-degree and/or productivity would be negatively impacted is important to acknowledge. By providing clear data that this is not the case, these investigators have increased the likelihood that internships could be considered by more institutions. The one big drawback, and one that the authors discuss at some length, is the funding model that could enable internship programs to be used more widely.

      Thank you for providing suggestions to improve the generalizability of our results. We agree that finding a sustainable source of funding for internship programs, and the staff who direct them, is a primary obstacle to implementing these programs more widely. We provide some ideas and funding models for other institutions to consider, and future directions could examine internships that are un-funded or funded primarily by fellowships from supportive granting agencies. Accordingly, we have added the following text to future directions at Line 755:

      “We acknowledge the need for future studies to evaluate the feasibility and outcomes of internship programs funded via different models to see if faculty support and student outcomes would be comparable under different models.”

      Reviewer #2 (Public Review):

      Summary:

      The authors describe five-year outcomes of an internship program for graduate students and postdoctoral fellows at their institution spurred by pilot funding from an NIH BEST grant. They hypothesized that such a program would be beneficial to interns, internship hosts, and research advisors. The mixed methods study used surveys and focus groups to gather qualitative and quantitative data from the stakeholder groups, and the authors acknowledge the limitation that the study subjects were self-selected and also had research advisors who agreed to allow them to participate. Thus the generally favorable outcomes may not be applicable to students such as those who are struggling in the lab and/or lack career focus or supportive research advisors. Nonetheless, the overall findings support the hypothesis and also suggest additional benefits, including in some cases positive impact for the lab, improved communication between the intern and their research advisor, and an advantage for recruitment of students to the institution. The data refute one of the principal concerns of research advisors: that by taking students out of the lab, internships reduce individual and overall lab productivity. Students who did internships were significantly less likely to pursue postdoctoral fellowships before entering the biomedical workforce and were more likely to have science-related careers versus research careers than control students who did not do internships, although the study design cannot determine whether this was due to selection bias or to the internship.

      Thank you for such a positive and concise overview of this paper.

      Strengths:

      (1) The sample size is good (123 internships).

      (2) The internship program is well described. Outcomes are clearly defined.

      (3) Methods and statistical analyses appear to be appropriate (although I am not an expert in mixed methods).

      (4) "Take-home" lessons for institutions considering implementing internship programs are clearly stated.

      Thank you for enumerating these strengths. We also hope that the sample size, positive outcomes, and take-home lessons will be of benefit to other institutions.

      Weaknesses:

      (1) It is possible that interns, hosts, and research advisers with positive experiences were more likely to respond to surveys than those with negative experiences. The response rate and potential bias in responses should be discussed in the Results, not just given in a table legend in Methods.

      Thank you for noting this oversight. We were pleased that throughout our study, the majority of interns, faculty advisors and internship hosts responded to the surveys. As suggested, we have included the following text at line 132 in the first paragraph of the results section:

      “The response rate for the 123 survey invitations sent to interns and their current research advisors and internship hosts ranged from 61% for research advisors to 73% for hosts, and about 66% for interns (averaging pre and post survey responses). In addition to quantitative surveys, qualitative themes and exemplars were collected from focus groups.”

      (2) With regard to the biased selection of participants, do the authors know how many subjects requested but were not permitted to do internships?

      We too were concerned about trainees who would not be able to secure their PI’s support to participate in an internship.  Accordingly, as part of our program design and evaluation, in the inaugural year of the program our external evaluator, Strategic Evaluations, Inc., administered a survey to graduate students and postdocs who registered for an internship information session or who started, but did not complete the application. Registrants were asked about their decision to complete an application, their experience completing the application if they chose to do so, and the likelihood that they would apply to the program next year. Of the respondents, only 9% indicated that lack of PI support prevented them from participating (n=53 respondents). Hence while we cannot completely rule out PI support as a barrier, only a small percentage of trainees reported this as a barrier despite a robust response rate (43%).  A second line of evidence that there was not a large number of students who were prevented from doing an internship by their research advisor is the high faculty approval rating of the program which was gathered in both year 1 and year 4 of the program (see figure 3). These two independent lines of evidence diminish our concern that faculty advisor resistance was a significant barrier to participation.

      (3) While the authors mention internships in professional degree programs in fields such as law and business, some mention of internship practices in non-biomedical STEM PhD programs such as engineering or computer science would be helpful. Is biomedical science rediscovering lessons learned when it comes to internships?

      Excellent point. We noted that internships are common in non-biomedical STEM masters and PhD programs, but we did not list experiential rotations and internships that are common in nursing, engineering, computer science and other such programs. We agree that many lessons learned from internships in all fields are transferable to the biomedical fields, and we also strongly believe that findings there need to be replicated in the biomedical sciences because of the unique funding model, incentive structure, and apprentice structure of the biomedical training. In response to this critique, we added the following text to the manuscript at line 724:

      “Internships are ubiquitous in many other professional training programs such as law, business, nursing, computer science, and engineering programs (Van Wart, O’Brien et al, 2020).”

      (4) Figure 1 k, l - internships did not appear to change career goals, but are the 76% who agreed pre-internship the same individuals as the 75% who agreed post-internship? What percentage gave discordant responses?

      While our data as collected cannot directly address this question, we were not surprised to see that the internship doesn’t radically alter many trainees’ career plans, because internships in this program usually occur in the final 12-18 months of training and because the internship is emphasized as a skill-building initiative and not necessarily a career exploration one. One limitation of our study is that career goals were defined by pre-surveys at different timepoints, depending on what stage of training an individual (whether control or internship participant) happened to be at when the baseline survey was administered. We know from previous work that career goals often shift during training (see Roach and Sauermann, 2017 PLOS One, https://doi.org/10.1371/journal.pone.0184130, and Gibbs et al, 2014, PLOS One, https://doi.org/10.1371/journal.pone.0114736), so the point at which career interests are gathered makes a difference in this kind of analysis. Hence, we have expanded our discussion of this limitation to better acknowledge this critique, beginning at Line 319:

      “Because of the variable timing between pre-internship career interest surveys among interns and control trainees and securing the first job, future studies could more rigorously evaluate changes in career preferences between pre and post internship with an analysis that considers the time that has elapsed between career interest noted pre-internship vs post internship career placement.”

      Appraisal:

      Overall the authors achieve their aims of describing outcomes of an internship program for graduate career development and offering lessons learned for other institutions seeking to create their own internship programs.

      We thank you for your thorough reading and review of the manuscript.

      Impact:

      The paper will be very useful for other institutions to dispel some of the concerns of research advisers about internships for PhD students (although not necessarily for postdoctoral fellows). In the long run, wider adoption of internships as part of PhD training will depend not only on faculty buy-in but also on the availability of resources and changes to the graduate school funding model so that such programs are not viewed as another "unfunded mandate" in graduate education. Perhaps the industry will be motivated to support internships by the positive outcomes for hosts reported in this paper. Additionally, NIH could allow a certain amount of F, T, or even RPG funds to be used to support internships for purposes of career development. 

      Thank you. We share your hope that the information and data resulting from this study will be valuable to other institutions. Your point about NIH (and other funders, for that matter) allowing trainees to participate in internship experiences while funded by the granting agency is an excellent one. We have found that communication with program officers often garners their support for the intern remaining on a fellowship or training grant during the internship. This allows the internship program to fund additional interns, especially those that are supported by the faculty advisor’s grants.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Two minor points about the comments used from focus groups.

      (i) In figure 5, there is a specific quote about being a reward that is used twice;

      (ii) It seems that there should be some consistency in how these quotes are relayed with respect to gender identification of the trainee. In some cases 's/he' is used, in others 'he' or 'she' is used, and in others 'they' is used.

      We appreciate this suggestion and agree that a non-gendered convention would be clearer; accordingly, we have revised all quotes to use “they” for consistency. In addition, we have removed the duplicated quote from figure 5, which was originally inserted in two sections because of its applicability to both “Persisting Challenges” and “Trainees’ abilities and skills were primary drivers of the success of the internship”.

      Reviewer #2 (Recommendations For The Authors):

      (1) The paper is somewhat lengthy. Some redundant material can be eliminated - Lines 366-371 simply restate the data in Table 5. Lines 393-396 restate the data in Figure 3. The text should be reserved for interpreting rather than restating the data in tables and figures.

      Thank you for this feedback; we agree that these sections can be condensed. We have removed some of the redundancy while retaining enough for the figures and the text to each stand alone, for accessibility to readers.

    1. eLife assessment

      Understanding how genomic regulatory elements control spatiotemporal gene expression is essential for explaining cell type diversification, function, and the impact of genetic variation on disease. This important study provides solid evidence that enhancers generally combine additively to influence gene expression. Moreover, promoters, particularly weaker ones, can exhibit supra-additivity when integrating enhancer effects. These findings highlight the context-dependent nature of enhancer-promoter interactions in gene regulation, and contribute to ongoing discussions about the selectivity and combination of regulatory elements.

    2. Reviewer #1 (Public review):

      This manuscript by Martinez-Ara et al investigates how combinations of cis-regulatory elements combine to influence gene expression. Using a clever iteration on massively parallel reporter assays (MPRAs), the authors measure the combinatorial effects of pairs of enhancers on specific promoters. Specifically, they assayed the activity of 59x59 different enhancer-enhancer (E-E) combinations on 8 different promoters in mouse embryonic stem cells. The main claims of the paper are that E-E pairs combine nearly additively, and that supra-additive E-E pairs are rare and often promoter-dependent. The data in this study do generally support these claims.

      This paper makes a good contribution to the ongoing discussions about the selectivity of gene regulatory elements. Recent works, such as those by Martinez-Ara et al. and Bergman et al., have indicated limited selectivity between E-P pairs on plasmid-based assays; this paper adds another layer to that by suggesting a similar lack of selectivity between E-E pairs.

      An interesting result in this manuscript is the observation that weak promoters allow more supra-additive E-E interactions than strong promoters (Figure 4b). This nonlinear promoter response to enhancers aligns with the model previously proposed in Hong et al. (from my own group), which posited that core promoter activities are nonlinearly scaled by the genomic environment, and that (similar to the trend observed in Figure 5b) the steepness of the scaling is negatively correlated with promoter strength.

      My only suggestion for the authors is that they include more plots showing how much the intrinsic strengths of the promoters and enhancers they are working with explain the trends in their data.

      Specific Suggestions

      Supplementary Figure 4 is presented as evidence for selectivity between single enhancers and promoters. Could the authors inspect the relationship between enhancer/promoter strength and this selectivity? Generating plots similar to Figure 4B and Figure 5B, but for single enhancers, should show if the ability of an enhancer to boost a promoter is inversely correlated to that promoter's intrinsic strength. Also, in Supplementary Figure 4, coloring each point by promoter type would clarify if certain promoters (the weak ones) consistently show higher boost indices across all enhancers. If they do not, the authors may want to speculate how single enhancers can show selectivity for promoters while the effect of adding a second enhancer to an existing E-P has little selectivity. An alternate explanation, based solely on the strength of the elements, would be that when the expression of a gene is low the addition of enhancer(s) has large effects, but when the expression of a gene is high (closer to saturation) the addition of enhancer(s) has small effects.

      Can anything more be said about the enhancers in E-E-P combinations that exhibit supra-additivity? Specifically, it would be interesting to know if certain enhancers, e.g. strong enhancers or enhancers with certain motifs, are more likely to show supra-additivity with a given promoter.

      Comments on revised version:

      The revised manuscript satisfactorily addresses the points I raised in the review. With the addition of the new graphs there is enough data for readers to decide whether the supra-additivity depends only on the strength of the promoter or on some other (undefined) feature of E-P pairs. This manuscript is a solid contribution to the ongoing debate about enhancer-promoter selectivity.

    3. Reviewer #2 (Public review):

      Summary

      This work investigates how multiple DNA elements combine to regulate gene expression. The authors use an episomal reporter assay which measures the transcriptional output of the reporter under the regulation of an enhancer-enhancer-promoter triple. The authors test all combinations of 8 promoters and 59 enhancers in this assay. There are two main findings: (1) enhancer pairs generally combine additively on reporter output; and (2) the extent to which enhancers increase reporter output over the promoter (individually and as enhancer-enhancer pairs) is inversely related to the intrinsic strength of the promoter. Both of these findings are interesting and are well supported by the data.

      This study extends previous results on enhancer-promoter combinations to enhancer-enhancer-promoter triples. For example the near equivalence of Fig. 5b and Fig. S7b is intriguing. This experimental design also provides the ability to investigate the notion of selectivity (also commonly referred to as compatibility) between enhancer-enhancer pairs and promoters.

      The authors note many limitations, including the selection of the elements and the size and spacing of the tested elements. Some of the enhancer-enhancer-promoter triples they test were also investigated by a different experimental design in Brosh et al 2023. Brosh et al observed non-additivity between these elements while this study did not. Ultimately we do not know which mechanisms produce the non-additivity that has been observed in native loci and which experimental designs would preserve such mechanisms.

      Overall this is a nice experimental design and a great dataset for probing how enhancers and promoters combine to regulate gene expression. I have no major concerns, but I will try to clarify some methodological points I found confusing.

      Methodology

      The following two comments are meant to help the reader understand the methodology/terminology used in this paper and how it relates to other similar studies.

      The interpretation that "promoters scale enhancer signals in a non-linear manner" is potentially confusing. I believe that the authors use "non-linear" to refer to the slopes (represented by the letter 'b' in Fig. 5b) being not equal to 1. Given how the boost index is defined, this implies the relationship

      Activity of EEP = (Activity of CCP) * (Average Linear Boost)^b

      One potential source of confusion is that the Average Linear Boost term itself depends on the set of promoters that are assayed. Averaging across (many) promoters may alleviate this concern, in which case Average Linear Boost may be considered some form of intrinsic enhancer strength. If so, there is a correspondence between this terminology and the terminology presented in Bergman et al 2022. If b not equal to 1 refers to a non-linear scaling, then the reader may think that b=1 refers to a linear scaling. But if b=1, and the Average Linear Boost term is interpreted as intrinsic enhancer strength, then the equation above implies that the activity of EEP is equal to an intrinsic promoter strength times an intrinsic enhancer strength. This is essentially the relationship that is considered in Bergman et al 2022 and which is referred to in that paper as 'multiplicative'. The purpose of this comment is not to argue for what is the relationship that best explains the data, it is just to clarify the terminology.
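
      To make the terminology concrete, the relationship above can be sketched in Python. Taking logarithms gives log(EEP / CCP) = b * log(Average Linear Boost), so b is the slope of a line through the origin in log-log space, and b = 1 recovers the purely multiplicative case. This is only an illustration: the function names (`simulate_eep`, `fit_slope`) and all numbers are hypothetical and are not taken from the study.

```python
import math

def simulate_eep(ccp, avg_boosts, b):
    """Activity of EEP = (Activity of CCP) * (Average Linear Boost)^b."""
    return [ccp * a ** b for a in avg_boosts]

def fit_slope(avg_boosts, boost_indices):
    """Least-squares slope through the origin of log(boost index)
    against log(average linear boost)."""
    xs = [math.log(a) for a in avg_boosts]
    ys = [math.log(r) for r in boost_indices]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical values, for illustration only.
avg_boosts = [1.5, 2.0, 4.0, 8.0]   # one value per enhancer pair
ccp = 3.0                           # activity of one promoter with control inserts

for b in (0.5, 1.0, 1.5):
    eep = simulate_eep(ccp, avg_boosts, b)
    boost = [e / ccp for e in eep]  # boost index of each pair on this promoter
    print(b, round(fit_slope(avg_boosts, boost), 6))
```

      In this sketch a promoter with b < 1 compresses the enhancer signal and a promoter with b > 1 amplifies it. Only b = 1 corresponds to the 'multiplicative' relationship in the sense of Bergman et al 2022.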

      Enhancer-promoter selectivity: As a follow-up to a previous study (Martinez-Ara et al, Molecular Cell 2022) the authors mention that the data in this study also shows that enhancers show selectivity for certain promoters. I found the methodology hard to follow, so this section of the review is meant to guide the reader in understanding how the authors define 'selectivity'. The authors consider an enhancer to be not selective if its 'boost index' is the same across a set of promoters. 'Boost index' is defined to be the ratio of the reporter output with the enhancer and promoter divided by the reporter output with just the promoter. Conceptually, I think that considering the boost index is a reasonable way to quantify selectivity. The authors use a frequentist approach to classify each enhancer as selective or not selective. The null hypothesis is that the boost index of the enhancer is equal across a set of promoters. This can be visualized in Fig. 2C where the null hypothesis is that the mean of each vertical distribution is equal. Note that in Figure S4b of this paper (and in Figure 4B of their 2022 paper) the within-group variance is not plotted. Statistical significance is assessed using a Welch F-test.
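
      The definitions in this comment can be sketched in Python as follows. The boost index follows the definition given above. The Welch F statistic uses the standard textbook formula for a one-way comparison of means with unequal variances. This is not the authors' code: the function names and example values are hypothetical, and whether boost indices are log-transformed before testing is left open here. A p-value would additionally require the F distribution with the returned degrees of freedom.

```python
from statistics import mean, variance

def boost_index(ep_activity, p_activity):
    """Reporter output with enhancer and promoter, divided by output with the promoter alone."""
    return ep_activity / p_activity

def welch_f(groups):
    """Welch's F statistic for equality of group means under unequal variances.

    groups: one list of replicate measurements per promoter; each list needs
    at least two values and a non-zero variance.
    Returns (F, df1, df2).
    """
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]  # precision weights
    w_total = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, m)) / w_total
    numerator = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / w_total) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    df2 = (k ** 2 - 1) / (3 * lam)
    return numerator / denominator, k - 1, df2

# Hypothetical replicate boost indices of one enhancer on three promoters.
per_promoter = [[1.0, 2.0, 3.0], [0.0, 2.0, 4.0], [1.5, 2.0, 2.5]]
print(welch_f(per_promoter))
```

      Under the null hypothesis described above (equal mean boost index across promoters), the statistic is close to zero. A large value suggests that the enhancer boosts some promoters more than others.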

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      We thank the reviewer for the positive and constructive comments. We apologize for the very long delay in submitting this revised manuscript; due to personal circumstances we were not able to do this earlier.

      This manuscript by Martinez-Ara et al investigates how combinations of cis-regulatory elements combine to influence gene expression. Using a clever iteration on massively parallel reporter assays (MPRAs), the authors measure the combinatorial effects of pairs of enhancers on specific promoters. Specifically, they assayed the activity of 59x59 different enhancer-enhancer (E-E) combinations on 8 different promoters in mouse embryonic stem cells. The main claims of the paper are that E-E pairs combine nearly additively, and that supra-additive E-E pairs are rare and often promoter-dependent. The data in this study generally support these claims.

      This paper makes a good contribution to the ongoing discussions about the selectivity of gene regulatory elements. Recent works, such as those by Martinez-Ara et al. and Bergman et al., have indicated limited selectivity between E-P pairs on plasmid-based assays; this paper adds another layer to that by suggesting a similar lack of selectivity between E-E pairs.

      An interesting result in this manuscript is the observation that weak promoters allow more supra-additive E-E interactions than strong promoters (Figure 4b). This nonlinear promoter response to enhancers aligns with the model previously proposed in Hong et al. (from my own group), which posited that core promoter activities are nonlinearly scaled by the genomic environment, and that (similar to the trend observed in Figure 5b) the steepness of the scaling is negatively correlated with promoter strength.

      We now discuss the parallel with the Hong 2022 study (Discussion, lines 307-310).

      My only suggestion for the authors is that they include more plots showing how much the intrinsic strengths of the promoters and enhancers they are working with explain the trends in their data.

      Agreed, see below.

      Specific Suggestions

      Supplementary Figure 4 is presented as evidence for selectivity between single enhancers and promoters. Could the authors inspect the relationship between enhancer/promoter strength and this selectivity? Generating plots similar to Figure 4B and Figure 5B, but for single enhancers, should show if the ability of an enhancer to boost a promoter is inversely correlated to that promoter's intrinsic strength...

      Thank you for the suggestion, we have now repeated the analysis of Figure 5 for EP pairs instead of EEP triplets, and included it as new Supplementary Figure S7. Despite the lower statistical power, the trends are very similar. 

      ...Also, in Supplementary Figure 4, coloring each point by promoter type would clarify if certain promoters (the weak ones) consistently show higher boost indices across all enhancers. If they do not, the authors may want to speculate how single enhancers can show selectivity for promoters while the effect of adding a second enhancer to an existing E-P has little selectivity. An alternate explanation, based solely on the strength of the elements, would be that when the expression of a gene is low the addition of enhancer(s) has large effects, but when the expression of a gene is high (closer to saturation) the addition of enhancer(s) has small effects.

      We have now added colour coding for each of the promoters in Figure S4. We agree this clarifies the contribution of each promoter to the selectivity of each enhancer and it further confirms the responsiveness trends observed in Figure 5.

      Can anything more be said about the enhancers in E-E-P combinations that exhibit supra-additivity? Specifically, it would be interesting to know if certain enhancers, e.g. strong enhancers or enhancers with certain motifs, are more likely to show supra-additivity with a given promoter.

      Unfortunately, even with the number of enhancers that we tested, we lack statistical power to identify sequence motifs that may favour supra-additivity.

      Reviewer #2 (Public Review):

      We thank the reviewer for the supportive and constructive comments. We apologize for the very long delay in submitting this revised manuscript; due to personal circumstances we were not able to do this earlier.

      Summary

      This work investigates how multiple regulatory elements combine to regulate gene expression. The authors use an episomal reporter assay which measures the transcriptional output of the reporter under the regulation of an enhancer-enhancer-promoter triple. The authors test all combinations of 8 promoters and 59 enhancers in this assay. The main finding is that enhancer pairs generally combine additively on reporter output. The authors also find that the extent to which enhancers increase reporter output is inversely related to the intrinsic strength of the promoter.

      This manuscript presents a compact experiment that investigates an important open question in gene regulation. The results and data will be of interest to researchers studying enhancers. Given that my expertise is in modeling and computation, I will take the experimental results at face value and focus my review on the interpretation of the results and the computational methodology. I find the result of additivity between enhancers to be well supported. The findings on differential responsiveness between promoters are very interesting but the interpretation of such responses as 'non-linear' or 'following a power-law' may be misleading. More broadly, I think a more rigorous description of the mathematical methodology would increase the clarity and accessibility of this manuscript. A major unanswered question is whether the findings in this study apply to enhancers in their native genomic context. Regardless, investigating such questions in an episomal reporter assay is valuable.

      Main comments

      Applicability to native genomic context: The applicability of the results in this paper to enhancers in their native genomic context is unclear. As the authors state in the discussion section, the reporter gene is not integrated into the genome, the spacing between enhancers does not match their native context etc. It is thus unclear whether this experimental design is able to detect the non-additivity between enhancers which is known to be present in the genome. This could be investigated by testing the enhancer-enhancer-promoter tuples for which non-additivity has been observed in the genome (references are given in the introduction) in this assay.

      We appreciate the suggestion, but we chose not to go back to the lab to generate additional data to address this point. Of the cited previous studies, two are comparable to our study because they also used mESCs and included loci that we also studied:  Thomas et al. (2021) and Brosh et al. (2023). We now discuss how the findings of these two studies relate to our observations in the Discussion, lines 336-345.

      Interpretation of promoter responses as non-linear and following a power-law: In Fig 5, the authors demonstrate that enhancer-enhancer pairs boost reporter output more for weak promoters as opposed to strong promoters. I agree the data supports this finding, but I find the interpretation of such data as promoters scaling enhancers according to a power-law (as stated in the abstract) to be misleading. As mentioned on line 297, it is not possible to define an intrinsic measure of enhancer strength, thus the authors assign the base of the power-law to be the average boost index of the enhancer-enhancer pair across the 8 promoters. But this measure incorporates some aspect of a promoter and is not solely a property of enhancers...

      We agree that the power-law conclusion in the abstract was too strong; we have rephrased it as "non-linear".

      ...It would also be useful to know whether the results in Fig 5 apply to only enhancer-enhancer-promoter triples or also to enhancer-promoter pairs.

      We have now added this EP analysis as new Supplemental Figure S7. Although the statistical power is much lower, this shows very similar trends as the EEP analysis. We briefly report this, lines 275-278.

      Enhancer-promoter selectivity: As a follow-up to a previous study (Martinez-Ara et al, Molecular Cell 2022) the authors mention that the data in this study also shows that enhancers show selectivity for certain promoters. The authors mention that both studies use the same statistical methodology and the data in this study is consistent with the data from the 2022 paper. However, I think the statistical methodology in both studies needs further exposition. This section of the review is thus meant to ensure that I understand the authors' methodology, to guide the reader in understanding how the authors define 'selectivity', and to probe certain assumptions underlying the methodology.

      My understanding of the approach is as follows: The authors consider an enhancer to be not selective if its 'boost index' is the same across a set of promoters. 'Boost index' is defined to be the ratio of the reporter output with the enhancer and promoter divided by the reporter output with just the promoter. Conceptually, I think that considering the boost index is a reasonable way to quantify selectivity.

      The authors use a frequentist approach to classify each enhancer as selective or not selective. The null hypothesis is that the boost index of the enhancer is equal across a set of promoters. This can be visualized in Fig. 2C where the null hypothesis is that the mean of each vertical distribution is equal. Note that in Figure S4 of this paper (and in Figure 4B of their 2022 paper) the within-group variance is not plotted. Statistical significance is assessed using a Welch F-test. This is a parametric test that assumes that the observations within each vertical distribution in Fig 2C are normally distributed (this test does allow for heteroskedasticity - which means that the variance may differ within each vertical distribution). Does the normality assumption hold? This analysis should be reported. If this assumption does not hold, is the Welch test well calibrated?

      We have tested the normality of all of the single enhancer + promoter combinations that were tested using the Welch F-test. 94.1% of the 439 single enhancer + promoter combinations show normal distributions (at a 1% FDR). We have added this to the methods section of the revised manuscript. Apart from this, non-normality has little to no influence on the Welch F-test performance (https://rips-irsp.com/articles/10.5334/irsp.198). Therefore, the use of the Welch F-test to score enhancer selectivity on these data is valid. We also agree that a simple binary classification of selective vs non-selective is not descriptive enough for these kinds of data. We addressed this in our previous publication by exploring the relationship between selectivity and enhancer strength. However, the objective in this publication was solely to show that this new dataset follows similar selectivity patterns to our previous publication. Furthermore, our analysis of the non-linearity of promoter response is a more quantitative continuation of the analysis of selectivity, as this non-linearity is probably one of the major contributors to enhancer selectivity. It was probably present in our previous paper but could not be analyzed as there were fewer combinations per promoter.

      For further clarity, we have now highlighted the individual promoters in Figure S4 by colors.
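
      The response above reports the share of combinations that look normal "at a 1% FDR" without naming the procedure. As a minimal sketch of how such a figure could be computed, the Python below applies the Benjamini-Hochberg step-up procedure to a list of normality-test p-values. Both the choice of Benjamini-Hochberg and the p-values are assumptions made for illustration. The normality test itself is not shown, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.01):
    """Return one boolean per p-value: True where the null hypothesis
    (here: 'the data are normally distributed') is rejected at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank whose p-value lies under the step-up threshold rank/m * fdr.
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            max_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected

# Hypothetical p-values, one per enhancer + promoter combination.
pvals = [0.0001, 0.5, 0.004, 0.9]
rejected = benjamini_hochberg(pvals, fdr=0.01)
fraction_normal = rejected.count(False) / len(rejected)
print(rejected, fraction_normal)
```

      The fraction of combinations for which normality is not rejected is the quantity comparable to the percentage quoted above.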

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I found this to be an interesting manuscript and am glad this experiment was conducted. As I wrote in my public review, I think that clarifying the computational methods/ideas would really help. I also think it would be helpful to properly define the terms that are being used. For example, this manuscript uses the terminology cooperativity and synergy. Are these meant to be synonymous with supra-additivity?

      Thank you for this point. The revised manuscript no longer uses the word “cooperativity”. We now use “supra-additivity” when describing our data, and “synergy” as biological interpretation. In the Introduction we now clarify this distinction.

      Comments on enhancer selectivity:

      In the public review, I have given comments on the statistical methodology employed to assess enhancer selectivity. On a more subjective note, I'm not convinced that a frequentist approach to a binary classification of 'selective' vs 'not selective' is that useful here. I think it would be more useful to report an 'effect size' of the extent to which an enhancer is selective and to study the sources of this effect size. I think you've tried to do this in lines 329-339 of the discussion but I think the exposition could be clearer.

      Figure S4B may suggest how to do this. It appears that the distribution of boost indices for a given enhancer is trimodal (this is most obvious for the stronger enhancers on the top of the plot). Is it the case that each mode (for each enhancer) consists of the same set of promoters? I think what is implied by Figure 5B is that the stronger promoters are not boosted as much as the weaker promoters. So does the leftmost mode consist of Ap1m1, the middle mode consist of Klf2/Otx2/Nanog, and the rightmost mode of Sox2/Fgf5/Lefty1/Tbx3? If so, I would recommend emphasizing this in the text/figure and clarifying how this relates to selectivity. It seems that the chain of logic is as follows: (1) We define an enhancer to be selective if its boost indices across a set of promoters are not the same. (2) We generally observe that stronger promoters get boosted less than weaker promoters. (3) Thus selectivity arises due to differences in intrinsic strengths of the promoter. I think this is what is being implied in lines 329-339 of the discussion, but it took me multiple readings to understand this and I'm not convinced the power-law explanation is justified (see public review).

      We have modified this paragraph of the Discussion (now lines 350-359).

      Regarding the power-law: in the Results we state “roughly a power-law function”. We have removed the power-law claim from the abstract, that conclusion as phrased was indeed too firm.

      Reference to Zuin et al

      Lines 323 - 325: A reference is made to the data from Zuin et al "following approximately a power-law". What data in Zuin et al does this statement refer to? I do not believe the authors in Zuin et al claim that the relationship between GFP intensity and enhancer-promoter distance (Figure 1h,i from Zuin et al) follows a power law. It is certainly non-linear, but I have taken a look at this data myself and do not find it follows a power-law. Please either explain this further and rigorously justify the claim or adjust the wording accordingly.

      Good point. In the discussion of Zuin et al we have replaced “power law” with “non-linear decay function”.

    1. eLife assessment

      This is a valuable contribution to our understanding of how different cell stressors (ethanol or heat-shock) elicit unique responses at the genomic and topographical level under the regulation of yeast transcription factor Hsf1, providing solid evidence documenting the temporal coupling (or lack thereof) between Hsf1 aggregation and long-range communication among co-regulated heat-shock loci versus chromatin remodeling and transcriptional activation. A particular strength is the combination of genomic and imaging-based experimental approaches applied to genetically engineered in vivo systems.

    2. Reviewer #2 (Public Review):

      Rubio et al. study the behavior of the transcription factor Hsf1 under ethanol stress, examining its distribution within the nucleus and the coalescence of heat shock response genes in budding yeast. In comparison to the heat shock response, the response to ethanol stress shows similar gene coalescence and Hsf1 binding. However, there is a notable delay in the transcriptional response to ethanol, and a disconnect between it and the appearance of irreversible Hsf1 condensates/puncta, highlighting important differences in how Hsf1 responds to these two related but distinct environmental stresses.

      The authors have addressed the majority of my previous comments effectively. The Sis1 experiment provides a clear illustration of a distinctive response to ethanol and heat. This work offers a comprehensive perspective on Hsf1 in stress response from multiple angles.

    3. Reviewer #3 (Public Review):

      This is an interesting manuscript that builds off of this group's previous work focused on the interface between Hsf1, heat shock protein (HSP) mRNA production, and 3D genome topology. Here the group subjects the yeast Saccharomyces cerevisiae to either heat stress (HS) or ethanol stress (ES) and examines Hsf1 and Pol II chromatin binding, histone occupancy, Hsf1 condensates, HSP gene coalescence (by 3C and live cell imaging), and HSP mRNA expression (by RT-qPCR and live cell imaging). The manuscript is well written, and the experiments seem well done, and generally rigorous, with orthogonal approaches performed to support conclusions. The main findings are that both HS and ES result in Hsf1/Pol II-dependent intergenic interactions, along with formation of Hsf1 condensates. Yet, while HS results in rapid and strong induction of HSP gene expression and Hsf1 condensate resolution, ES results in slow and weak induction of HSP gene expression without Hsf1 condensate resolution. Thus, the conclusion is somewhat phenomenological - that the same transcription factor can drive distinct transcription, topologic, and phase-separation behavior in response to different types of stress.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public Review):

      The authors have addressed the majority of my comments effectively. The new Sis1 experiment provides a clear illustration of a distinctive response to ethanol and heat. This work offers a comprehensive perspective on Hsf1 in stress response from multiple angles. I have two additional comments to improve the paper without re-review:

      (Original point #3) Could the authors clarify the differences between DPY1561 and the original strain used? There appears to be missing statistical analysis for Figure 1E at the bottom.

      DPY1561 is a haploid version of the original heterozygous diploid strain (LRY033). We opted for this strain in the analysis depicted in Figures 1D and 1E since 100% of Hsp104 is BFP-tagged; thus, the signal above background is stronger and the scoring of Hsp104 foci cleaner. The statistical analysis (Mann Whitney test) for the lower graphs in Fig. 1E has been added. We thank the reviewer for pointing this out.

      (Original point #4) In the new Figure 7F, '% transcription' and '% coalescence' are presented. My understanding is that Figures 7D and 7E aim to demonstrate the correlation between HSP104 transcription (a continuous variable) and HSP104-HSP12 coalescence (a binary variable) at the single-cell level. However, averaging the data across cells masks individual variations and potential anti-correlations. The authors could explore statistical methods that handle correlations between a continuous variable and a binary variable. Alternatively, consider converting 'HSP104 transcription' to a binary variable and then performing a chi-square test to assess the association.

      We thank the reviewer for this suggestion. In response, we have made the following changes:

      (1)  Clarified that the data used in this analysis were derived from Fig. 7 – figure supplement 1 in which ‘HSP104 transcription’ was converted to a binary variable.

      (2)  Indicated that the theoretical ceiling for coalescence of these tagged alleles is 25% given their heterozygous state (Figure 7–figure supplement 1D legend).  In the other 75% of cells scored, HSP104-HSP12 coalescence might also be taking place but is not detectable using this strategy. Therefore, it is not possible to elucidate any anti-correlation between HSR transcription and HSR coalescence in this experiment.

      In addition, we attempted to buttress the argument suggested by the Pearson correlation coefficient analysis (Fig. 7F) that a stronger association exists between transcription and gene coalescence in heat-shocked (HS) vs. ethanol stressed (ES) cells. To do so, we used the chi-square test as suggested by the reviewer. However, the results of this test were ambiguous, and we therefore did not include it in the manuscript.
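
      For readers unfamiliar with the test discussed here, a chi-square test of association between two binary variables (transcription on/off and coalescence yes/no, scored per cell) can be sketched in Python as below. This is a plain Pearson chi-square on a 2x2 table without continuity correction, not the authors' analysis. The function name and cell counts are hypothetical. The p-value uses the identity that a chi-square variable with one degree of freedom has survival function erfc(sqrt(x / 2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the 2x2 table

                      coalesced   not coalesced
        transcribing      a             b
        silent            c             d

    Returns (statistic, p_value) with 1 degree of freedom.
    All row and column totals must be non-zero.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical cell counts, for illustration only.
stat, p = chi_square_2x2(20, 5, 5, 20)
print(stat, p)
```

      Because coalescence is only detectable in a fraction of cells with this tagging strategy, as noted in point (2), counts in the "not coalesced" column would include undetected events. Any such test would be conservative with respect to a true association.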

    1. eLife assessment

      This manuscript investigates how chloroplasts are broken down during light-limiting conditions as plants reorganize their energy-producing organelles during carbon limitation. The authors provide compelling live-cell imaging data of plastids and solid quantification of events, documenting that buds form on the surface of chloroplasts and pinch away, then associate with the vacuole via a mechanism that depends on autophagy machinery, but not plastid division machinery. This manuscript provides valuable groundwork for other scientists studying the regulation and breakdown of energy-producing organelles, including chloroplasts and mitochondria.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrated that carbon depletion triggers the autophagy-dependent formation of Rubisco Containing Bodies, which contain chloroplast stroma material, but exclude thylakoids. The authors show that RCBs bud directly from the main body of chloroplasts rather than from stromules and that their formation is not dependent on the chloroplast fission factor DRP5. The authors also observed a transient engulfment of the RCBs by the tonoplast during delivery to the vacuolar lumen.

      Strengths:

      The authors demonstrate that autophagy-related protein 8 (ATG8) co-localizes to the chloroplast demarking the place for RCB budding. The authors provide good-quality time-lapse images and co-localization of the markers corroborating previous observations that RCBs contain only stroma material and do not include thylakoid. The text is very well written and easy to follow.

      Weaknesses:

      The study adds more valuable descriptive information about the previously published phenomenon of RCB formation under carbon starvation but does not reveal the putative mechanisms governing formation of RCBs and their release to the vacuole.

      Comments on revised version:

      The authors have done an impressive job revising the manuscript and addressed my comments. The authors clarified previous ambiguities and the new version of the manuscript greatly benefits from the provided quantifications and adjusted discussion.

    3. Reviewer #2 (Public review):

      This manuscript proposed a new link between the formation of chloroplast budding vesicles (Rubisco-containing bodies [RCBs]) and the development of chloroplast-associated autophagosomes. The authors' previous work demonstrated two types of autophagy pathways involved in chloroplast degradation, including piecemeal degradation of partial chloroplast and whole chloroplast degradation. However, the mechanisms underlying piecemeal degradation are largely unknown, particularly regarding the initiation and release of the budding structures. Here, the authors investigated the progression of piecemeal-type chloroplast trafficking by visualizing it with a high-resolution time-lapse microscope. They provide evidence that autophagosome formation is required for the initiation of chloroplast budding, and that stromule formation is not correlated with this process. In addition, the authors also demonstrated that the release of chloroplast-associated autophagosome is independent of a chloroplast division factor, DRP5b.

      Overall, the findings are interesting, and in general, the experiments are very well executed.

      Comments on revised version:

      The authors have generally addressed all of my concerns (and the other reviewer's) and adapted the manuscript where necessary. The revised version has significantly improved the manuscript. From my perspective there are no further concerns.

    4. Reviewer #3 (Public review):

      Summary:

      Regulated chloroplast breakdown allows plants to modulate these energy-producing organelles, for example during leaf aging, or during changing light conditions. This manuscript investigates how chloroplasts are broken down during light-limiting conditions.

      The authors present very nice time lapse imaging of multiple proteins as buds form on the surface of chloroplasts and pinch away, then associate with the vacuole. They use mutant analysis and autophagy markers to demonstrate that this process requires the ATG machinery, but not dynamin-related proteins that are required for chloroplast division. The manuscript concludes with discussion of an internally-consistent model that summarizes the results.

      Strengths:

      The main strength of the manuscript is the high-quality microscopy data. The authors use multiple markers and high-resolution timelapse imaging to track chloroplast dynamics under light limiting conditions.

      Weaknesses:

      The main weakness of the manuscript is the limited quantitative data. While it can be challenging to quantify dynamic intracellular events, quantification of these processes is important to appreciate the significance of these findings.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The authors demonstrated that carbon depletion triggers the autophagy-dependent formation of Rubisco Containing Bodies, which contain chloroplast stroma material, but exclude thylakoids. The authors show that RCBs bud directly from the main body of chloroplasts rather than from stromules and that their formation is not dependent on the chloroplast fission factor DRP5. The authors also observed a transient engulfment of the RCBs by the tonoplast during delivery to the vacuolar lumen.

      Strengths: 

      The authors demonstrate that autophagy-related protein 8 (ATG8) co-localizes to the chloroplast demarking the place for RCB budding. The authors provide good-quality time-lapse images and co-localization of the markers corroborating previous observations that RCBs contain only stroma material and do not include thylakoid. The text is very well written and easy to follow. 

      Weaknesses: 

      A significant portion of the results presented in the study comes across as a corroboration of the previous findings made under different stress conditions: autophagy-dependent formation of RCBs was reported by Ishida et al. in 2009. Furthermore, some included results are not of particular relevance to the study's aim. For example, it is unclear what is the importance of the role of SA in the formation of stromules, which do not serve as an origin for the RCBs. Similarly, the significance of the transient engulfment of RCBs by the tonoplast remained elusive. Although it is indeed a curious observation, previously reported for peroxisomes, its presentation should include an adequate discussion, perhaps suggesting the mechanism involved. Finally, some conclusions are not fully supported by the data: the suggested timing of events aligns poorly between and even within experiments, mostly due to high variation and a low number of replicates. Most importantly, the discussion does not place the findings of this study into the context of current knowledge on chlorophagy, does not propose the significance of piecemeal vs complete organelle sequestration into the vacuole under the conditions used, and does not dwell on the early localization of ATG8 to the future budding place on the chloroplast.

      We performed additional experiments with biological replicates that involved quantification. The results of these experiments validate the findings of this study. We also revised the Discussion section, which now includes a discussion of the interplay between piecemeal-type and entire-organelle-type chloroplast autophagy and the relevance of autophagy adaptor and receptor proteins to the localization of ATG8 on the chloroplast surface. Accordingly, the first subheading section in the Discussion became too long. Therefore, we divided it into two subheading sections. We believe that the revisions successfully address the weaknesses pointed out by the reviewer and enhance the importance of the current study. Below is a detailed description of the improvements made to our manuscript in response to the reviewer comments.

      Reviewer #1 (Recommendations For The Authors): 

      It would be great if the authors kindly used numbered lines to facilitate the review process. 

      We have added line numbers to the text of the revised version of the manuscript.  

      The authors use the words "budding", "protrusion" and "stromule formation" interchangeably in some parts of the text. For the sake of clarity, it would be best to be consistent in the terminology and possibly elaborate on the exact differences between these structure types and the criteria by which they were identified. 

      We have checked all of the text and improved the consistency of the terminology. An important finding of this study is that chloroplasts form budding structures at the site associated with ATG8. These structures then divide to become a type of autophagic cargo termed a Rubisco-containing body. We therefore mainly use the terms “bud” and “budding” throughout the text. In the experiments shown in Figure 5, we considered the possibility that chloroplast protrusions accumulate in leaves of atg mutants and do not divide because the mutants cannot create autophagosomes. Therefore, the word “protrusion” was used to describe the results shown in Figure 5 in which the proportion of chloroplasts forming protrusions was scored. In the revised text, the word “protrusion” is only used in descriptions of Figure 5. Previous reports define stromules as thin, tubular, extended structures (less than 1 µm in diameter) of the plastid stroma (Hanson and Sattarzadeh, 2011; Brunkard et al., 2015). In the revised text, the word “stromules” is used to describe the structures defined in these previous reports. We have added definitions of each term to the Introduction, Methods and Results sections where appropriate (lines 57–58, 160–162, 247–249, 313–316, 655–658, 668–670).

      Pages 3-4: the authors observed budding of the chloroplasts within a few minutes - it would be helpful to specify that time was probably counted from the first observation of budding, not from the start of the dark treatment, and also specify the exact treatment duration for each of the experiments. 

      The time scales in the figures do not represent the time from the start of the dark treatment. Instead, they describe the duration from the start of the time-lapse videos that were used to generate the still images. Therefore, the indicated time scales are almost the same as the duration from the start of the observations of each target structure (chloroplast buds or GFP-ATG8a-labeled structures). As described in the Methods section, leaves were incubated in darkness for 5 to 24 h to induce sugar starvation. Such sugar-starved leaves were subjected to live-cell monitoring for the target structures. Since Arabidopsis leaves accumulate starch as a stored sugar source (Smith and Stitt, 2007; Usadel et al., 2008), dark treatment lasting several minutes is not sufficient for the starch to be consumed and sugar starvation to be induced. To avoid confusion, we have added definitions of the time scales to the legends of figures containing the results of time-lapse imaging. We have also specified the durations of dark treatments used to obtain the respective results in the legends.

      Figure 6: the time scale for complete autophagosome formation is in the range of 100-120 sec, how do these results align with the results shown in Figures 3B and C, where complete autophagosomes are suggested to be released into the vacuole after 73.8 sec. Furthermore, another structure is suggested to be formed within 50 sec. Such experiments possibly require a large number of replicates to estimate representative timing. 

      As mentioned in the previous response, the time scales in still frames represent the duration from the start of the corresponding video. Leaves incubated in darkness for 5 to 24 h were subjected to live-cell imaging. When we identified the target structures, e.g., GFP-ATG8a-labeled structures on the surfaces of chloroplasts (Figure 6) or chloroplast budding structures (Figure 3), we began to track these structures. Therefore, the time scales in the figures do not align to a common time axis. We revised the descriptions of Figure 3 and Figure 6 in the Results section to clearly explain that the time points in each experiment merely indicate the time of one observation.

      The authors might want to consider using arrows to indicate structures of interest in all movies and figures.

      We have added arrows to indicate the structures of interest in the starting frames of all videos. We hesitate to add arrows to highlight RCBs accumulating in the vacuole (Figure 1-figure supplement 1, Figure 5 and Figure 8) and stromules (Figure 7) because many arrows would be required, which would obscure large portions of the images. We believe that the images without arrows clearly represent the appearance of RCBs or stromules and that their quantification (Figure 1-figure supplement 1C, Figure 5B, Figure 5-figure supplement 1B, Figure 7B, 7D, 7F, and Figure 8B) well supports the results.   

      Figure 7 Supplement 1: do the authors detect complete chloroplasts in the vacuole of atg7 and sid2/atg7? 

      We did not observe the vacuolar transport of whole chloroplasts in atg7 or atg7 sid2 plants under our experimental conditions. The figure below (Figure 1 for Response to reviewers) shows images of mesophyll cells from a leaf (third rosette leaf of a 20-d-old plant) of atg7 accumulating chloroplast stroma–targeted GFP (CT-GFP); this is from the previous version of Figure 7–figure supplement 1. Indeed, some GFP bodies exhibiting strong stromal GFP (CT-GFP) signals appeared in the central area of the cell (arrowheads in A). However, such bodies were chloroplasts in epidermal cells. The 3D images (B) and cross-section image (x to z axis) of the region highlighted by the blue dotted line (C) indicate that such GFP bodies are the edges of chloroplasts that localize on the abaxial side of the observed region. Because CT-GFP expression was driven by the 35S promoter, strong GFP signals appeared in chloroplasts in epidermal cells in addition to chloroplasts in mesophyll cells. Previous studies using the same transgenic lines also showed that chloroplasts in epidermal cells exhibit strong GFP signals (Kohler et al., 1997; Caplan et al., 2015; Lee et al., 2023). RBCS-mRFP or GFP driven by the RBCS2B promoter do not label the chloroplasts in epidermal cells (new Figure 7-figure supplement 1). Additionally, because the borders between the mesophyll cell layer and the epidermal cell layer are not even, chloroplasts in epidermal cells are sometimes visible during observations of mesophyll cells. Such detection occurs more frequently during the acquisition of z-stack images. This point was more precisely demonstrated in our previous study with the aid of Calcofluor white staining of cell walls (Nakamura et al., 2018). Please see Supplemental Figure S3 in our previous report.
      To avoid any misunderstanding, we replaced the image of the leaf from atg7 in the revised figure, which is now Figure 7-figure supplement 2, with an image of another region to more precisely visualize mesophyll cells in this plant line.

      Author response image 1.

      Mesophyll cells in a leaf of atg7 accumulating stromal CT-GFP, reconstructed from the data shown in the previous version of Figure 7–figure supplement 1. (A) Individual channel images (CT-GFP and chlorophyll) from the merged orthogonal projection image shown in the previous version of Figure 7–figure supplement 1. The right panel shows the enhanced chlorophyll signal to clearly visualize the chloroplasts in epidermal cells. Green, CT-GFP; magenta, chlorophyll fluorescence. Scale bar, 20 µm. (B) 3D structure of the merged image shown in (A). (C) Images of the cross section indicated by the blue dotted line (a to b) in B. Arrowheads indicate the edges of chloroplasts in epidermal cells.

      Figure 8: it would be interesting to hear the authors' opinion on why they observed a significant increase in RCB number in the drp5b mutant background.

      We have added a discussion of this issue to the revised manuscript (lines 445–459). We now have two hypotheses to explain this issue. One hypothesis is that the impaired chloroplast division due to the drp5b mutation reduces energy availability and thus activates chloroplast autophagy. The other hypothesis is that the drp5b mutation impairs the type of chlorophagy that degrades whole chloroplasts, and thus piecemeal-type chloroplast autophagy via Rubisco-containing bodies is activated. However, we do not have any experimental evidence supporting either hypothesis.

      Reviewer #2 (Public Review): 

      This manuscript proposed a new link between the formation of chloroplast budding vesicles (Rubisco-containing bodies [RCBs]) and the development of chloroplast-associated autophagosomes. The authors' previous work demonstrated two types of autophagy pathways involved in chloroplast degradation, including piecemeal degradation of partial chloroplast and whole chloroplast degradation. However, the mechanisms underlying piecemeal degradation are largely unknown, particularly regarding the initiation and release of the budding structures. Here, the authors investigated the progression of piecemeal-type chloroplast trafficking by visualizing it with a high-resolution time-lapse microscope. They provide evidence that autophagosome formation is required for the initiation of chloroplast budding, and that stromule formation is not correlated with this process. In addition, the authors also demonstrated that the release of chloroplast-associated autophagosome is independent of a chloroplast division factor, DRP5b. 

      Overall, the findings are interesting, and in general, the experiments are very well executed. Although the mechanism of how Rubisco-containing bodies are processed is still unclear, this study suggests that a novel chloroplast division machinery exists to facilitate chloroplast autophagy, which will be valuable to investigate in the future. 

      Reviewer #2 (Recommendations For The Authors): 

      Below are some specific comments. 

      (1) In Supplement Figure 1B, there is no chloroplast stromule in RBCS-mRFP x atg7-2 plants under dark treatment with ConA, but in Figure 7A, there are stromules in CT-GFP x atg7-2 plants. How to explain such a discrepancy? Did the authors check the chloroplast morphology of RBCS-mRFP x atg7-2 plants in different developmental stages? Will it behave the same as CT-GFP x atg7-2 under the same condition as in Figure 7A?

      As described in the text, the ages and conditions of the leaves shown in Figure 1–figure supplement 1 and Figure 7 are different. In Figure 1–figure supplement 1, second rosette leaves from 21-d-old plants were incubated in the dark with concanamycin A for 1 d. In Figure 7E and 7F, we explored the condition under which mesophyll chloroplasts in atg leaves actively form stromules to assess how a deficiency in autophagy is related to stromule formation. We found that late senescing leaves (third rosette leaves from 36-d-old plants) of atg5 and atg7 plants accumulated many stromules without additional treatment (Figure 7). It is not surprising that the chloroplast morphologies shown in Figures 1 and 7 are different because the leaf ages and conditions are largely different.

      However, we agree that the differences in chloroplast stroma–targeted GFP and RBCS-mRFP might influence the visualization of stromules. For instance, fluorescent protein–labeled RBCS proteins are incorporated into the Rubisco holoenzyme, comprising eight RBCS and eight RBCL proteins (Ishida et al., 2008; Ono et al., 2013). Such a large protein complex might not accumulate in stromules. Therefore, we examined the chloroplast morphology in late senescing leaves (third rosette leaves from 36-d-old plants) from WT, atg5, and atg7 plants harboring ProRBCS:RBCS-mRFP, as you suggested. Mesophyll chloroplasts formed many stromules in atg5 and atg7 leaves but not in WT leaves (Figure 7–figure supplement 1). These results indicate that RBCS-mRFP can be used to visualize stromules and that the differences in chloroplast morphology between Figure 1-figure supplement 1 and Figure 7 cannot be attributed to the different marker proteins used. A previous study also indicated that Rubisco is present in plastid stromules (Kwok and Hanson, 2004).

      (2) In Figure 2, the author showed that the outer envelope marker Toc64 was colocalized with chloroplast buds. How about proteins in the inner envelope membrane of chloroplasts? 

      We generated Arabidopsis plants expressing red fluorescent protein–tagged K+ EFFLUX ANTIPORTER 1 (KEA1), a chloroplast inner envelope membrane protein (Kunz et al., 2014; Boelter et al., 2020). We found that the chloroplast buds visualized by RBCS-GFP were also marked by KEA1-mRFP (Figure 2–figure supplement 1B). We observed the transport of such buds (Figure 2–figure supplement 2). These results strengthen our claim that autophagy degrades chloroplast stroma and envelope components as a type of specific cargo termed a Rubisco-containing body. This additional experiment is described in lines 181–187.

      (3) In Figure 3, how many RCBs were tracked for the trafficking analysis to raise the conclusion that the vesicle was released into the vacuole around 73.8s? 

      We apologize for our confusing explanation in the previous version of the manuscript. The time point “73.8 s” merely indicates the time of one observation, as shown in Figure 3. This time does not represent the common timing of vacuolar release of a Rubisco-containing body. As we explained in the response to the comments from reviewer 1, we subjected leaves that were incubated in the dark for several hours to live-cell imaging assays to observe chloroplast morphology in sugar-starved leaves. The time scales of each still frame represent the time from the start of the corresponding video. Therefore, the time points in the respective figures do not align to a common time axis, and the number “73.8 s” is not important. We attempted to emphasize that the type of movement of Rubisco-containing bodies changes during their tracking shown in Figure 3. Based on this finding, we hypothesized that the Rubisco-containing bodies are released into the vacuolar lumen when they initiate random movement. Therefore, we expected that the interaction between the Rubisco-containing bodies and the vacuolar membrane could be captured, and we therefore turned our attention to the dynamics of the vacuolar membrane in subsequent experiments. Accordingly, our observations of the vacuolar membrane allowed us to visualize the release of the Rubisco-containing body into the vacuole (Figure 4). We rephrased these sentences (lines 212–219) to avoid confusion and to explain this idea accurately. We also performed tracking experiments of Rubisco-containing bodies to strengthen the finding that the type of movement of the bodies changes during tracking (Figure 3-figure supplement 1, Videos 8 and 9).

      (4) I do believe the conclusion that vacuolar membranes incorporate RCBs into the vacuole in Figure 4. However, it will be more convincing if images of higher quality are provided. 

      We tried to acquire images that more clearly show the morphology of the vacuolar membrane during the incorporation of the Rubisco-containing body. We obtained the images in Figure 4A using a standard type of confocal microscope, the LSM 800 (Carl Zeiss), and obtained the images in Figure 4B using the Airyscan Fast acquisition mode, a hyper-resolution microscope mode, in the LSM 880 system (Carl Zeiss). We performed additional experiments with another type of confocal microscope, the SP8 (Leica; Figure 4-figure supplement 1A to 1C, Videos 12–14). The quality of the images from these experiments was as high as possible under the experimental conditions (equipment and plant materials). In general, increasing the image resolution during time-lapse imaging with a confocal microscope requires reducing the time resolution. However, the transport of a Rubisco-containing body occurs relatively quickly: its engulfment by the vacuolar membrane takes just a few seconds (Figure 4, Figure 4-figure supplement 1). We could therefore not reduce the time resolution further to better capture the morphology of the vacuolar membrane.

      (5) In Figure 7G, the authors concluded that SA and ROS might be the cause of the extensive formation of stromules. How about the H2O2 level in NahG and atg5 NahG plants? Compared with sid2, NahG appeared to completely inhibit stromule formation in atg5. Will this be related to ROS levels?

      We measured the hydrogen peroxide (H2O2) contents in NahG atg5 plants and atg5 single mutant plants and found that their leaves accumulate more H2O2 than those of wild-type or NahG plants (Figure 7-figure supplement 3). Since we have only maintained fresh seeds of NahG atg5 plants harboring the 35S promoter–driven chloroplast stroma–targeted GFP (Pro35S:CT-GFP) construct, we first confirmed that CT-GFP accumulation does not affect the measurement of H2O2 content. H2O2 levels were similar between wild-type leaves and CT-GFP-expressing leaves. A comparison among Pro35S:CT-GFP-expressing lines in the wild-type, atg5, NahG, and NahG atg5 backgrounds revealed enhanced accumulation of H2O2 in the atg5 and NahG atg5 genotypes compared with the wild-type and NahG genotypes. This finding is consistent with the results of histological staining of H2O2 using 3,3′-diaminobenzidine (DAB) in a previous study (Yoshimoto et al., 2009).

      It is unclear why NahG expression inhibited stromule formation more strongly than the sid2 mutation in the atg5 mutant background, as you pointed out (Figure 7A–D). NahG catabolizes salicylic acid (SA), whereas sid2 mutants are knockout mutants of ISOCHORISMATE SYNTHASE1 (ICS1), a gene required for SA biosynthesis. Plants have two metabolic routes for SA biosynthesis: the isochorismate synthase (ICS) pathway and the phenylalanine ammonia-lyase (PAL) pathway. Furthermore, Arabidopsis plants contain two ICS homologs: ICS1 and ICS2. Previous studies have revealed that ICS1 (SID2) is the main player for SA biosynthesis in response to pathogen infection (Delaney et al., 1994). Another study revealed drastically lower SA contents in the leaves of both sid2 single mutants and NahG-expressing plants compared with those of wild-type plants (Abreu and Munné-Bosch, 2009). Therefore, it is clear that the sid2 single mutation sufficiently inhibits SA accumulation in Arabidopsis leaves. However, low levels of SA biosynthesis through ICS1-independent routes might influence stromule formation in leaves of sid2 atg5 and sid2 atg7. Because a previous study demonstrated that the sid2 single mutation sufficiently suppresses the SA hyperaccumulation–related phenotypes of atg plants (Yoshimoto et al., 2009), we believe that the use of the sid2 mutation was adequate to assess the effects of SA on stromule formation that actively occurs in the atg plants examined in this study.

      (6) In Supplement Figure 7, I have noticed that there are still some CT-GFP signals (green dots) in the vacuoles of the atg7 mutant, are they RCBs? If so, how can this phenomenon be explained? 

      As we explained in the response to the comment from Reviewer 1, CT-GFP-labeled bodies are chloroplasts in the epidermal cell layer. Please see our response to Reviewer 1’s comment about Figure 7 and the associated figure (Figure 1 for Response to reviewers). The CT-GFP-labeled dots (arrowheads) are the edges of chloroplasts and localize on the abaxial side of the observed region. The dots have faint chlorophyll signals. This phenomenon is much clearer in the image with enhanced brightness (right panel in A). Since the bodies are merely the edges of epidermal chloroplasts, their chlorophyll signals are faint. Therefore, these bodies are not Rubisco-containing bodies but are instead simply the edges of chloroplasts in the epidermal cell layer.

      (7) On page 24, the second paragraph, lines 12-14, the authors claim that no receptors similar to those involved in mitophagy that bind to LC3 (ATG8) have been established in chloroplasts. Actually, it has been reported that a homologue of mitophagy receptor, NBR1, acts as an autophagy receptor to regulate chloroplast protein degradation (Lee et al, 2023, Elife; Wan et al, 2023, EMBO Journal). Although I do think NBR1 is not involved in RCBs based on these reports, these findings should be discussed here. 

      Thank you for this good suggestion. We have added a discussion about this important point to the Discussion section, along with the relevant citations (lines 482–502).

      (8) In the figure legend, the details of the experiments will be better provided, such as leaves stages (Figure 1, Figure 5...), the number of chloroplasts analyzed (Figure 7...). This can help the readers to follow. 

      Thank you for highlighting this. We have checked all of the figure legends and added descriptions of the leaf stages and experimental conditions.  

      Reviewer #3 (Public Review):

      Summary: 

      Regulated chloroplast breakdown allows plants to modulate these energy-producing organelles, for example during leaf aging, or during changing light conditions. This manuscript investigates how chloroplasts are broken down during light-limiting conditions. 

      The authors present very nice time-lapse imaging of multiple proteins as buds form on the surface of chloroplasts and pinch away, then associate with the vacuole. They use mutant analysis and autophagy markers to demonstrate that this process requires the ATG machinery, but not dynamin-related proteins that are required for chloroplast division. The manuscript concludes with a discussion of an internally-consistent model that summarizes the results. 

      Strengths: 

      The main strength of the manuscript is the high-quality microscopy data. The authors use multiple markers and high-resolution time-lapse imaging to track chloroplast dynamics under light-limiting conditions. 

      Weaknesses: 

      The main weakness of the manuscript is the lack of quantitative data. Quantification of multiple events is required to support the authors' claims, for example, claims about which parts of the plastid bud, about the dynamics of the events, about the colocalization between ATG8 and the plastid stroma buds, and the dynamics of this association. Without understanding how often these events occur and how frequently events follow the manner observed by the authors (in the 1 or 2 examples presented in each figure) it is difficult to appreciate the significance of these findings. 

      We have performed several additional experiments, including the quantification of multiple chloroplast buds or GFP-ATG8-labeled structures from individual plants. The results strengthen our claims and thus improve the significance of the current study. Please see the responses below for details.

      Reviewer #3 (Recommendations For The Authors):

      Overall, the live-cell imaging in this paper is high quality and rigorously conducted. However, without quantification of these events, it is difficult to judge whether this is an occasional contributor to plastid breakdown, or the primary mechanism for this process. 

      - For Figure 1, the authors could estimate the importance of this mechanism for chloroplast breakdown by calculating the volume change in chloroplasts over time during light-limiting conditions, then comparing this to the volume of the puncta that bud off of plastids and the frequency of these events. That is, what percentage of chloroplast volume loss can be accounted for by puncta that bud from chloroplasts? Are there likely other mechanisms contributing to chloroplast breakdown, or is this the primary mechanism? 

      We measured the volumes of chloroplast stroma when the leaves from wild-type (WT) and atg7 plants accumulating RBCS-mRFP were subjected to extended darkness for 1 d (Figure 1-figure supplement 2). The volume of the chloroplast stroma in dark-treated leaves of WT plants was 70% that in leaves before treatment, whereas the volume of the chloroplast stroma in dark-treated atg7 leaves was 86% that in leaves before treatment. The transport of Rubisco-containing bodies into the vacuole did not occur in atg7 leaves (Figure 1-figure supplement 1). These results suggest that the release of chloroplast buds as Rubisco-containing bodies contributes to the decrease in chloroplast stroma volume during dark treatment. These results also suggest that autophagy-independent systems contribute to the decrease in chloroplast volume. It is difficult to monitor the volume or frequency of budding off of puncta from chloroplasts during dark treatment because the budding and transport of the puncta occur relatively quickly and are completed within minutes, and the puncta frequently move away from the plane of focus. Additionally, continuous monitoring of chloroplast morphology over the dark treatment period requires the long-term exposure of leaves to repeated laser excitation, and such treatment might cause unexpected stress. We believe that the evaluation of chloroplast stroma volume after 1 d of dark treatment is important for estimating the contribution of the mechanism described in this study. This additional experiment is described in lines 163–174.
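      As a rough illustration of the arithmetic behind these percentages (a sketch only, not an analysis performed in the study), the snippet below converts the two reported values into volume losses. It assumes, for illustration, that the loss remaining in atg7 is autophagy-independent and that the two genotypes are directly comparable; the variable and function names are hypothetical.

```python
# Stroma volume remaining after 1 d of darkness, as a percentage of the
# pre-treatment volume (the two values quoted in the response above).
REMAINING_PERCENT = {"WT": 70.0, "atg7": 86.0}


def percent_lost(remaining_percent: float) -> float:
    """Return the percentage of stroma volume lost during the treatment."""
    return 100.0 - remaining_percent


wt_loss = percent_lost(REMAINING_PERCENT["WT"])      # loss in wild type
atg7_loss = percent_lost(REMAINING_PERCENT["atg7"])  # loss without autophagy

# Illustrative assumption: the loss seen in atg7 is autophagy-independent,
# so the WT-minus-atg7 difference approximates the autophagy-dependent part.
autophagy_dependent = wt_loss - atg7_loss
share_of_wt_loss = autophagy_dependent / wt_loss

print(f"WT loss: {wt_loss:.0f}%, atg7 loss: {atg7_loss:.0f}%")
print(f"Difference: {autophagy_dependent:.0f} percentage points "
      f"({share_of_wt_loss:.0%} of the WT loss)")
```

      Under that assumption, roughly half of the stroma volume lost in WT leaves would be attributable to autophagy, which is in line with the statement that both autophagy-dependent and autophagy-independent systems contribute.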

      - The claim that structures budding from the plastid "specifically contains stroma material...without any chlorophyll signal" (p. 6 and Figure 2) should be supported by quantitative analysis of many such buds in multiple cells from multiple independent plants. 

      We performed additional experiments (Figure 2-figure supplement 1) to measure the fluorescence intensity ratios of the stroma marker RBCS-GFP and chlorophyll between chloroplast budding structures and their neighboring chloroplasts in Arabidopsis plants expressing the stromal marker RBCS-GFP along with TOC64-mRFP (a chloroplast outer envelope membrane protein), KEA1-mRFP (a chloroplast inner envelope membrane protein), or ATPC1-tagRFP (a thylakoid membrane protein). The results indicated that chloroplast buds contain chloroplast stroma without chlorophyll signals. The descriptions of this experiment are in lines 175–199. In these experiments, we observed 30 to 33 chloroplast buds from eight individual plants.  
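      The intensity-ratio comparison described above can be sketched as follows. This is a minimal illustration of a bud-to-chloroplast ratio, not the authors' analysis pipeline: the function names are hypothetical, and the intensity values are made-up placeholders in arbitrary units rather than data from the study.

```python
from statistics import mean


def intensity_ratio(bud_intensity: float, chloroplast_intensity: float) -> float:
    """Ratio of a bud's mean intensity to that of its neighboring chloroplast."""
    if chloroplast_intensity <= 0:
        raise ValueError("chloroplast intensity must be positive")
    return bud_intensity / chloroplast_intensity


def mean_ratio(pairs: list[tuple[float, float]]) -> float:
    """Average the bud/chloroplast ratio over (bud, chloroplast) pairs."""
    return mean(intensity_ratio(bud, chloro) for bud, chloro in pairs)


# Hypothetical placeholder measurements (arbitrary units), NOT study data.
stroma_marker_pairs = [(950.0, 1000.0), (1100.0, 1000.0)]
chlorophyll_pairs = [(20.0, 1000.0), (50.0, 1000.0)]

stroma_ratio = mean_ratio(stroma_marker_pairs)
chlorophyll_ratio = mean_ratio(chlorophyll_pairs)

print(f"stroma marker ratio: {stroma_ratio:.3f}")
print(f"chlorophyll ratio:   {chlorophyll_ratio:.3f}")
```

      A ratio near 1 for the stroma marker together with a ratio near 0 for chlorophyll is the pattern expected for a bud that carries stroma but no thylakoid pigment.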

      - Claims about the dynamics of these events in Figures 2 & 3 should be supported by quantitative analysis of many buds in multiple cells from multiple independent plants and appropriate summary statistics (e.g. mean, standard deviation), and claims about the coordination of events should be supported by statistical comparison of these measurements between different markers. 

      As mentioned in the response to the above comments, quantification of fluorescence intensities (Figure 2-figure supplement 1) revealed that the chloroplast budding structures produced TOC64-mRFP and KEA1-mRFP signals without ATPC1-tagRFP signal. These results support the claim that chloroplast buds contain chloroplast stroma and envelope components without thylakoid membranes.

      It is not easy to quantify the dynamics of chloroplast buds since the puncta sometimes move away from the plane of focus. We therefore added data from individual time-lapse observations showing that the type of movement exhibited by the puncta changes during tracking (Figure 3-figure supplement 1A and 1B, Videos 8 and 9) to strengthen the notion that such a phenomenon was observed repeatedly. 

      - Data in Figure 4 should be supported by quantification of the proportion of plastid-derived puncta that end up inside the vacuole (compared to those that do not) in multiple cells from multiple independent plants. 

      Although we performed additional observations of the destinations of chloroplast-derived puncta, we encountered some difficulty in correctly calculating the proportion of plastid-derived puncta that ended up inside the vacuole. This problem is similar to the difficulty in tracking Rubisco-containing bodies mentioned in the response to the previous comments. During time-lapse imaging, puncta sometimes move from the plane of focus toward the deeper side (abaxial side) or near side (adaxial side), causing us to lose track of a number of puncta. Therefore, we could not determine the destinations of all puncta to calculate the proportion of puncta that ended up in the vacuolar lumen.

      Instead, we added the results of three experiments (Figure 4-figure supplement 1, Videos 12–14) examining how the vacuolar membrane engulfs the chloroplast-derived puncta to incorporate them inside the vacuole. The data support the notion that such a phenomenon occurs repeatedly in sugar-starved leaves. All results were obtained from individual plants.

      - Data in Figure 6 should also be supported by quantitative analysis of many buds in multiple cells from multiple independent plants, to determine whether ATG8 associates with all RBCS-containing buds, and vice versa.

      To address this issue, we performed additional experiments on plants expressing GFP-ATG8a and RBCS-mRFP (Figure 6-figure supplements 3 and 4). First, we observed 58 chloroplast buds from eight individual plants and evaluated the proportion of GFP-ATG8a-labeled chloroplast buds. We determined that 64% of chloroplast buds were at least autophagy-associated structures (Figure 6-figure supplement 3A–3C). This result also suggests that chloroplasts can form autophagy-independent budding structures, which might be associated with stromule-related structures or the autophagy-independent vesiculation machinery. We also evaluated the number of GFP-ATG8a-labeled chloroplast buds (Figure 6-figure supplement 3D and 3E). The formation of such structures increased in response to dark treatment (Figure 6-figure supplement 3D), but they did not appear in atg7 plants exposed to the dark (Figure 6-figure supplement 3E). These results support the notion that the formation of chloroplast buds to be released as Rubisco-containing bodies requires the core ATG machinery. 

      Furthermore, we observed 157 GFP-ATG8a-labeled structures from thirteen individual plants and evaluated the proportion of chloroplast-associated isolation membranes (Figure 6-figure supplement 4). We also classified the chloroplast-associated, GFP-ATG8a-labeled structures into two categories: the chloroplast surface type (Figure 7-figure supplement 4A) and the chloroplast bud type (Figure 7-figure supplement 4B). This experiment suggested that 43% of the isolation membranes labeled by GFP-ATG8a were involved in chloroplast degradation during an early phase of sugar starvation (extended darkness for 5 to 9 h from the end of night) in mesophyll cells. We believe that these results indicate that autophagy contributes substantially to chloroplast degradation via the morphological changes observed in this study. These experiments are described in lines 284–300 in the Results section and in lines 426–444 in the Discussion section.

      - Which parts of the plastid bud (Fig 2), about the dynamics of the events (Fig 3), about the colocalization between ATG8 and the plastid stroma buds, and the dynamics of this association (Fig 6). 

      We performed multiple quantitative studies to address the issues listed above. We believe that these additional experiments strengthened our findings.

      - I suggest that the authors avoid using the term "vesicles" to describe the plastid-derived puncta, since it doesn't seem like coat proteins are required for their formation. I suggest "puncta" or similar terms. 

      We replaced the term “vesicles” with “puncta” or other suitable terms, as suggested.

      References for response to reviewers

      Abreu ME, Munné-Bosch S (2009) Salicylic acid deficiency in NahG transgenic lines and sid2 mutants increases seed yield in the annual plant Arabidopsis thaliana. J Exp Bot 60: 1261-1271.

      Boelter B, Mitterreiter MJ, Schwenkert S, Finkemeier I, Kunz HH (2020) The topology of plastid inner envelope potassium cation efflux antiporter KEA1 provides new insights into its regulatory features. Photosynth Res 145: 43-54.

      Brunkard JO, Runkel AM, Zambryski PC (2015) Chloroplasts extend stromules independently and in response to internal redox signals. Proc Natl Acad Sci U S A 112: 10044-10049.

      Caplan JL, Kumar AS, Park E, Padmanabhan MS, Hoban K, Modla S, Czymmek K, Dinesh-Kumar SP (2015) Chloroplast stromules function during innate immunity. Dev Cell 34: 45-57.

      Delaney TP, Uknes S, Vernooij B, Friedrich L, Weymann K, Negrotto D, Gaffney T, Gutrella M, Kessmann H, Ward E, Ryals J (1994) A Central Role of Salicylic-Acid in Plant-Disease Resistance. Science 266: 1247-1250.

      Hanson MR, Sattarzadeh A (2011) Stromules: Recent Insights into a Long Neglected Feature of Plastid Morphology and Function. Plant Physiol 155: 1486-1492.

      Ishida H, Yoshimoto K, Izumi M, Reisen D, Yano Y, Makino A, Ohsumi Y, Hanson MR, Mae T (2008) Mobilization of rubisco and stroma-localized fluorescent proteins of chloroplasts to the vacuole by an ATG gene-dependent autophagic process. Plant Physiol 148: 142-155.

      Kohler RH, Cao J, Zipfel WR, Webb WW, Hanson MR (1997) Exchange of protein molecules through connections between higher plant plastids. Science 276: 2039-2042.

      Kunz HH, Gierth M, Herdean A, Satoh-Cruz M, Kramer DM, Spetea C, Schroeder JI (2014) Plastidial transporters KEA1, -2, and -3 are essential for chloroplast osmoregulation, integrity, and pH regulation in Arabidopsis. Proc Natl Acad Sci U S A 111: 7480-7485.

      Lee HN, Chacko JV, Solis AG, Chen KE, Barros JA, Signorelli S, Millar AH, Vierstra RD, Eliceiri KW, Otegui MS, Benitez-Alfonso Y (2023) The autophagy receptor NBR1 directs the clearance of photodamaged chloroplasts. Elife 12: e86030.

      Ono Y, Wada S, Izumi M, Makino A, Ishida H (2013) Evidence for contribution of autophagy to rubisco degradation during leaf senescence in Arabidopsis thaliana. Plant Cell Environ 36: 1147-1159.

      Smith AM, Stitt M (2007) Coordination of carbon supply and plant growth. Plant Cell Environ 30: 1126-1149.

      Usadel B, Blasing OE, Gibon Y, Retzlaff K, Hoehne M, Gunther M, Stitt M (2008) Global transcript levels respond to small changes of the carbon status during progressive exhaustion of carbohydrates in Arabidopsis rosettes. Plant Physiol 146: 1834-1861.

      Yoshimoto K, Jikumaru Y, Kamiya Y, Kusano M, Consonni C, Panstruga R, Ohsumi Y, Shirasu K (2009) Autophagy negatively regulates cell death by controlling NPR1-dependent salicylic acid signaling during senescence and the innate immune response in Arabidopsis. Plant Cell 21: 2914-2927.

    1. eLife assessment

      This study provides valuable insights into the role of actin dynamics in regulating the transition of fusion models during homotypic fusion between late endosomes. The evidence supporting the authors' claims is convincing. However, while the observations are significant, the study could benefit from further exploration of the mechanistic details and physiological relevance.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript employs yolk sac visceral endoderm cells as a novel model for studying endosomal fusion, observing two distinct fusion behaviors: quick homotypic fusion between late endosomes, and slower heterotypic fusion between late endosomes and lysosomes. The mathematical modeling suggests that vesicle size critically influences the mode of fusion. Further investigations reveal that actin filaments are dynamically associated with late endosomal membranes, and are oriented in the x-y plane and along the apical-basal axis. Actin and Arp2/3 were shown to appear at the rear end of the endosomes relative to their direction of movement, suggesting that actin polymerization may provide the force for endosome movement. Additionally, the authors found that actin dynamics regulate homotypic and heterotypic fusion events differently. The authors also provide evidence suggesting that Cofilin-dependent actin dynamics are involved in late endosome fusion.

      Strengths:

      The unique feature of this study is that the authors use yolk sac visceral endoderm cells to study endosomal fusion. Yolk sac visceral endoderm cells have huge endocytic vesicles, endosomes and lysosomes, offering an excellent system to explore endosomal fusion dynamics and the assembly of cellular factors on membranes. The manuscript provides valuable and convincing observations of the modes of endosomal fusion and the roles of actin dynamics in this process, and the conclusions of the study are justified by the data.

      Weaknesses:

      While the study offers compelling observations, it falls short in delivering clear mechanistic insights. Key questions remain unaddressed, such as the functional significance of actin filaments that extend apically in positioning late endosomes, the ways in which actin dynamics influence fusion events, and the functional implications of the slower bridge fusion process.

    3. Reviewer #3 (Public review):

      Summary:

      The authors found two endosomal fusion modes by live cell imaging of endosomes in yolk sac lateral endoderm cells of 8.5-day-old embryonic mice and described the fusion modes by mathematical models and simulations. They also showed that actin polymerization is involved in the regulation of one of the fusion modes.

      Strengths:

      The strength of this study is that the authors' claims are well supported by beautiful live cell images and theoretical models. By using specialized cells, yolk sac visceral endoderm cells, the live images of endosomal fusion, localization of actin-related molecules, and validation data from multiple inhibitor experiments are clear.

      Weaknesses:

      Although it would be out of scope of this study, there is no experimental verification of whether the mechanism of endosome fusion claimed by the authors occurs in general cells, so the article is limited to showing a phenomenon specific to yolk sac lateral endoderm cells. The methods used were very basic and solid. Most of the image analysis was performed manually, but the results were statistically tested.

      Summary:

      Seiichi Koike et al. studied two fusion models, explosive fusion and bridge fusion, using yolk sac visceral endoderm cells. They elucidated these two fusion models in vivo by employing mathematical modeling and incorporating fluctuations derived from actin dynamics as a key regulator of rapid homotypic fusion between late endosomes.

      Strengths:

      This study uncovered the role of actin dynamics in regulating the transition of fusion models in homotypic fusion between late endosomes and introduced a method for observing the fusion of single vesicles with two different targets.

      Weaknesses:

      The physiological significance of the different fusion models is not addressed.