10,000 Matching Annotations
  1. Nov 2024
    1. eLife Assessment

      This study presents a valuable methodological advancement in quantifying thoughts over time. A novel multi-dimensional experience-sampling approach is presented, identifying data-driven patterns that the authors use to interrogate fMRI data collected during naturalistic movie-watching. The experimentation is inventive and the analyses carried out and results presented are convincing.

    2. Reviewer #1 (Public review):

      The authors used a novel multi-dimensional experience sampling (mDES) approach to identify data-driven patterns of experience samples that they use to interrogate fMRI data collected during naturalistic movie-watching. They identify a set of multi-sensory features of a set of movies that delineate low-dimensional gradients of BOLD fMRI signal patterns that have previously been linked to fundamental axes of cortical organization.

    3. Reviewer #2 (Public review):

      The present study explores how thoughts map onto brain activity, a notoriously challenging question because of the dynamic, subjective, and abstract nature of thoughts. To tackle this question, the authors collected continuous thought ratings from participants watching a movie, and additionally made use of an open-source fMRI dataset recorded during movie watching as well as five established gradients of brain variation as identified in resting state data. Using a voxel-space approach, the results show that episodic knowledge, verbal detail, and sensory engagement of thoughts commonly modulate visual and auditory cortex, while intrusive distraction modulates the frontoparietal network. Additionally, sensory engagement mapped onto a gradient from primary to association cortex, while episodic knowledge mapped onto a gradient from the dorsal attention network to visual cortex. Building on the association between behavioral performance and neural activation, the authors conclude that sensory coupling to external input and frontoparietal executive control are key to comprehension in naturalistic settings.

      The manuscript stands out for its methodological advancements in quantifying thoughts over time and its aim to study the implementation of thoughts in the brain during naturalistic movie watching.

      Strengths:

      (1) The study raises a question that has been difficult to study in naturalistic settings so far but is key to understanding human cognition, namely how thoughts map onto brain activation.

      (2) The thought ratings introduce a novel method for continuously tracking thoughts, promising utility beyond this study.

      (3) The authors used diverse data types, metrics, and analyses to substantiate the effects of thinking from multiple perspectives.

    4. Reviewer #3 (Public review):

      This study attempted to investigate the relations between processing in the human brain during movie watching and corresponding thought processes. This is a highly interesting question, as movie watching presents a semi-constrained task, combining naturally occurring thoughts and common processing of sensory inputs across participants. This task is inherently difficult because in order to know what participants are thinking at any given moment, one has to interrupt the same thought process which is the object of study.

      This study attempts to deal with this issue by aggregating staggered experience sampling data across participants in one behavioral study and using the population-level thought patterns to model brain activity in different participants in an open-access fMRI dataset.

      The behavioral data come from 120 participants who watched three 11-minute movie clips. Participants responded to the mDES questionnaire, which comprises 16 visual scales characterizing ongoing thought, 5 times in each clip, two minutes apart. The 16 items are first reduced to 4 factors using PCA, and their levels are compared across the different movies. The factors are "episodic knowledge", "intrusive distraction", "verbal detail", and "sensory engagement". The factors differ between the clips; distraction is negatively correlated with movie comprehension, and sensory engagement is positively correlated with comprehension.

      The components are aggregated across participants (transforming single subject mDES answers into PCA space and concatenating responses of different participants) and are used as regressors in a GLM analysis. This analysis identifies brain regions corresponding to the components. The resulting brain maps reveal activations that are consistent with the proposed mental processes (e.g. negative loading for intrusion in frontoparietal network, positive loadings for visual and auditory cortices for sensory engagement).
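
      The two steps summarized above (reducing the 16 mDES items to 4 components with PCA, then using the component scores as GLM regressors) can be sketched as follows. This is only an illustrative sketch on synthetic data, not the authors' pipeline: the sample count, the random ratings, the `true_betas` values, and the single simulated `signal` standing in for one voxel's time course are all assumptions made for the example, and the cross-participant aggregation described in the review is not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: 120 participants x 3 clips x 5 probes,
# each probe answering 16 mDES items, reduced to 4 components.
n_samples, n_items, n_components = 120 * 3 * 5, 16, 4

# Synthetic stand-in for the mDES ratings (not real data).
ratings = rng.normal(size=(n_samples, n_items))

# PCA via SVD on mean-centred data.
centred = ratings - ratings.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
loadings = vt[:n_components].T   # (16, 4): item loadings per component
scores = centred @ loadings      # (n_samples, 4): component scores

# Fraction of variance carried by each principal component.
explained = singular_values**2 / np.sum(singular_values**2)

# GLM step: ordinary least squares with the component scores as regressors.
# `signal` is a hypothetical response built from known weights plus noise.
true_betas = np.array([0.5, -1.0, 0.0, 2.0])
signal = scores @ true_betas + rng.normal(scale=0.1, size=n_samples)
design = np.column_stack([np.ones(n_samples), scores])  # intercept + 4 regressors
betas, *_ = np.linalg.lstsq(design, signal, rcond=None)

print("variance explained by 4 components:", explained[:n_components].sum())
print("estimated betas (intercept first):", betas)
```

      In a real analysis the regressors would typically be convolved with a hemodynamic response function and fitted per voxel with an fMRI package; the plain least-squares fit here only shows the shape of the computation.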

      Then, the coordinates for brain regions which were significant for more than one component are entered into a paper search in Neurosynth. It is not clear what this analysis demonstrates beyond the fact that sensory engagement contained both visual and auditory components.

      The next analysis projected group-averaged brain activation onto gradients (based on previous work) and used gradient timecourses to predict the behavioral report timecourses. This revealed that high activations in gradient 1 (sensory→association) predicted high sensory engagement, and that "episodic knowledge" thought patterns were predicted by increased visual cortex activations. Then, permutation tests were performed to see whether these thought pattern related activations corresponded to well defined regions on a given cluster.

      In conclusion, this study tackles a highly interesting subject and does it creatively and expertly.

    1. eLife Assessment

      Rachubinski and colleagues provide an important manuscript that includes two major advances in understanding immune dysregulation in a large cohort of individuals with Down syndrome. The work comprises compelling, comprehensive, and state-of-the-art clinical, immunological, and autoantibody assessment of autoimmune/inflammatory manifestations. Additionally, the authors report promising results from a clinical trial with the JAK inhibitor tofacitinib for individuals with dermatological autoimmune disease.

    2. Reviewer #1 (Public review):

      Summary:

      This paper represents a huge amount of work on a condition whose patients' health and well-being have not always been prioritized, and only relatively recently has the immune dysregulation seen in patients with Down Syndrome (DS) been garnering major research interest.

      This paper provides an unparalleled examination of immune disorder in patients with DS. The authors also report the results from a clinical trial with the JAK inhibitor tofacitinib in DS patients.

      Strengths:

      This manuscript reports a herculean effort and provides an unparalleled examination of immune disorders in a large number of patients with DS.

      Weaknesses:

      Not a major weakness but, apart from finding an elevation of CD4 T central memory cells and more differentiated plasmablasts, several of the alterations reported in this manuscript had already been suggested by a few case reports and very small series. On the other hand, the number of patients (and controls) utilized for this study is remarkable and allows for drawing much firmer conclusions.

      Comments on revised version:

      I don't have any further comments.

    3. Reviewer #2 (Public review):

      In this manuscript, Rachubinski and colleagues provide a comprehensive clinical, immunological, and autoantibody assessment of autoimmune/inflammatory manifestations of patients with Down syndrome (DS) in a large number of patients with this disorder. These analyses confirm prior results of excess interferon and cytokine signals in DS patients and extend these observations to highlight early-onset immunological aberrancies, far before symptoms occur, as well as characterizing novel autoantibody reactivities in this patient population. Then, the authors report the interim analysis of an open label, Phase II, clinical trial of the JAK1/3 inhibitor, tofacitinib, that aims to define the safety, clinical efficacy, and immunological outcomes of DS patients who suffer from inflammatory conditions of the skin. The clinical trial analysis indicates that the treatment is tolerated without serious adverse effects and that the majority of patients have experienced clinical improvement or remission in their corresponding clinical cutaneous manifestations as well as improvement or normalization of aberrant immunological signals such as cytokines.

      The major strength of the study is the recruitment and uniform, systematic evaluation of an impressive number of DS patients. Moreover, the promising early results from the tofacitinib clinical trial pave the way for analysis of a larger number of patients within the Phase II trial and otherwise, which may lead to improved clinical outcomes for affected patients. An inherent weakness of such studies is the descriptive nature of several parameters and the relatively small size of tofacitinib-treated DS patients. However, the descriptive nature of some of the correlative research analyses is of scientific interest and is useful to generate hypotheses for future additional (including mechanistic) work, and treatment of 10 DS patients in a formal clinical trial at interim analysis is not a trivial task for a disease like this. The manuscript achieves the aims of the authors and the results support their conclusions. The authors appropriately acknowledge areas that require more research and areas that are not well understood. The results are represented in a useful manner and statistical methods and analyses appear sound.

      Comments on revised version:

      The authors have satisfactorily addressed my comments in the revised manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Individuals with Down syndrome (DS) have high rates of autoimmunity and can have exaggerated immune responses to infection that can unfortunately cause significant medical complications. Prior studies from these authors and others have convincingly demonstrated that individuals with DS have immune dysregulation including increased Type I IFN activity, elevated production of inflammatory cytokines (hypercytokinemia), increased autoantibodies, and populations of dysregulated adaptive immune cells that pre-dispose to autoimmunity. Prior studies have demonstrated that using JAK inhibitors to treat patient samples in vitro, in small case series of patients, and in mouse models of DS leads to improvement of immune phenotype and/or clinical disease. This manuscript provides two major advances in our understanding of the immune dysregulation and therapy for patients. First, they perform deep immune phenotyping on several hundred individuals with DS and demonstrate that immune dysregulation is present from infancy. Second, they report promising interim analysis of a Phase II clinical trial of a JAK inhibitor in 10 people with DS and moderate to severe skin autoimmunity.

      Strengths and weaknesses:

      The relatively large cohort and careful clinical annotation here provide new insights into the immune phenotype of patients with DS. For example, it is interesting that regardless of autoimmune disease or autoantibody status, individuals with DS have elevated cytokines and CRP. Analysis of the cohorts by age demonstrated that some cytokines are significantly elevated in people with DS starting in infancy (e.g., IL-9 and IL-17C). Nearly all adults with DS in this study had autoantibodies (98%) and most had six or more autoantibodies (63%), which differed significantly from euploid study participants. This implies that all patients with DS might benefit from early intervention with therapy to reduce inflammation. However, it is also worth considering an alternative interpretation: since hypercytokinemia does not vary based on disease state in individuals with DS, it may not be a key factor driving autoimmunity (although it may be relevant for other clinical symptoms such as neuroinflammation).

      Small case series have suggested the benefit of JAK inhibitors to treat autoimmunity in DS. This is the first report of a prospective clinical trial to test a JAK inhibitor in this setting. The clinical trial entry criteria included moderate to severe autoimmune skin disease in patients aged 12-50 years with DS, and treatment was with the JAK1/3 inhibitor tofacitinib. This clinical trial is a critically important step for the field. The early results support that treatment is well tolerated with improvement of interferon scores in patients and reduction of autoantibodies. Most patients experienced clinical improvement, with alopecia areata having the greatest response. Treatment may not affect all skin disease equally; for example, of the 5 patients with hidradenitis suppurativa, only 1 showed clinical improvement based on skin score. While very promising, the clinical trial results reported here are preliminary and based on interim analysis of 10 patients at 16 weeks. Individuals with DS have a lifelong risk of immune dysregulation and thus it is unclear how long therapy, if of benefit, would need to be continued. Results of longer-term therapy will be informative when considering the risks/benefits of this therapy.

      Comments on revised version:

      The authors have made appropriate revisions to this important contribution to the literature.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper represents a huge amount of work on a condition whose patients' health and well-being have not always been prioritized, and only relatively recently has the immune dysregulation seen in patients with Down Syndrome (DS) been garnering major research interest.

      This paper provides an unparalleled examination of immune disorders in patients with DS. The authors also report the results from a clinical trial with the JAK inhibitor tofacitinib in DS patients.

      Strengths:

      This manuscript reports a herculean effort and provides an unparalleled examination of immune disorders in a large number of patients with DS.

      Weaknesses:

      Not a major weakness but, apart from finding an elevation of CD4 T central memory cells and more differentiated plasmablasts, several of the alterations reported in this manuscript had already been suggested by a few case reports and a very small series. On the other hand, the number of patients (and controls) utilized for this study is remarkable and allows for drawing much firmer conclusions.

      We are grateful for the Reviewer’s very positive assessment of the work and results presented in this manuscript. We agree that many of the changes in the peripheral immune system reported here had been previously documented by our team and others using smaller sample sizes. However, as the Reviewer appreciated, this study involves an order of magnitude more research participants than previous studies (i.e., ~400 total participants, ~300 of them with trisomy 21 versus ~100 controls), which enabled us to investigate associations between immune changes and clinical variables, while also helping us draw much firmer conclusions.

      Reviewer #2 (Public Review):

      In this manuscript, Rachubinski and colleagues provide a comprehensive clinical, immunological, and autoantibody assessment of autoimmune/inflammatory manifestations of patients with Down syndrome (DS) in a large number of patients with this disorder. These analyses confirm prior results of excess interferon and cytokine signals in DS patients and extend these observations to highlight early-onset immunological aberrancies, far before symptoms occur, as well as characterizing novel autoantibody reactivities in this patient population. Then, the authors report the interim analysis of an open-label, Phase II, clinical trial of the JAK1/3 inhibitor, tofacitinib, that aims to define the safety, clinical efficacy, and immunological outcomes of DS patients who suffer from inflammatory conditions of the skin. The clinical trial analysis indicates that the treatment is tolerated without serious adverse effects and that the majority of patients have experienced clinical improvement or remission in their corresponding clinical cutaneous manifestations as well as improvement or normalization of aberrant immunological signals such as cytokines.

      The major strength of the study is the recruitment and uniform, systematic evaluation of an impressive number of DS patients. Moreover, the promising early results from the tofacitinib clinical trial pave the way for analysis of a larger number of patients within the Phase II trial and otherwise, which may lead to improved clinical outcomes for affected patients. An inherent weakness of such studies is the descriptive nature of several parameters and the relatively small size of tofacitinib-treated DS patients. However, the descriptive nature of some of the correlative research analyses is of scientific interest and is useful to generate hypotheses for future additional (including mechanistic) work, and treatment of 10 DS patients in a formal clinical trial at interim analysis is not a trivial task for a disease like this. The manuscript achieves the aims of the authors and the results support their conclusions. The authors appropriately acknowledge areas that require more research and areas that are not well understood. The results are represented in a useful manner and statistical methods and analyses appear sound.

      We appreciate the very positive evaluation by this Reviewer. We agree with the Reviewer on the descriptive nature of many of the analyses completed and on the value of a larger cohort of individuals with Down syndrome treated with a JAK inhibitor. The clinical trial will involve a total of 40 participants, and we look forward to reporting the results from the full cohort in the near future.

      Reviewer #3 (Public Review):

      Summary:

      Individuals with Down syndrome (DS) have high rates of autoimmunity and can have exaggerated immune responses to infection that can unfortunately cause significant medical complications. Prior studies from these authors and others have convincingly demonstrated that individuals with DS have immune dysregulation including increased Type I IFN activity, elevated production of inflammatory cytokines (hypercytokinemia), increased autoantibodies, and populations of dysregulated adaptive immune cells that pre-dispose to autoimmunity. Prior studies have demonstrated that using JAK inhibitors to treat patient samples in vitro, in small case series of patients, and in mouse models of DS leads to improvement of immune phenotype and/or clinical disease. This manuscript provides two major advances in our understanding of immune dysregulation and therapy for patients. First, they perform deep immune phenotyping on several hundred individuals with DS and demonstrate that immune dysregulation is present from infancy. Second, they report a promising interim analysis of a Phase II clinical trial of a JAK inhibitor in 10 people with DS and moderate to severe skin autoimmunity.

      Strengths and weaknesses:

      The relatively large cohort and careful clinical annotation here provide new insights into the immune phenotype of patients with DS. For example, it is interesting that regardless of autoimmune disease or autoantibody status, individuals with DS have elevated cytokines and CRP. Analysis of the cohorts by age demonstrated that some cytokines are significantly elevated in people with DS starting in infancy (e.g., IL-9 and IL-17C). Nearly all adults with DS in this study had autoantibodies (98%) and most had six or more autoantibodies (63%), which differed significantly from euploid study participants. This implies that all patients with DS might benefit from early intervention with therapy to reduce inflammation. However, it is also worth considering an alternative interpretation: since hypercytokinemia does not vary based on disease state in individuals with DS, it may not be a key factor driving autoimmunity (although it may be relevant for other clinical symptoms such as neuroinflammation).

      Small case series have suggested the benefit of JAK inhibitors to treat autoimmunity in DS. This is the first report of a prospective clinical trial to test a JAK inhibitor in this setting. The clinical trial entry criteria included moderate to severe autoimmune skin disease in patients aged 12-50 years with DS, and treatment was with the JAK1/3 inhibitor tofacitinib. This clinical trial is a critically important step for the field. The early results support that treatment is well tolerated with an improvement of interferon scores in patients and reduction of autoantibodies. Most patients experienced clinical improvement, with alopecia areata having the greatest response. Treatment may not affect all skin diseases equally; for example, of the 5 patients with hidradenitis suppurativa, only 1 showed clinical improvement based on skin score. While very promising, the clinical trial results reported here are preliminary and based on an interim analysis of 10 patients at 16 weeks. Individuals with DS have a lifelong risk of immune dysregulation and thus it is unclear how long therapy, if of benefit, would need to be continued. The results of longer-term therapy will be informative when considering the risks/benefits of this therapy.

      We thank the Reviewer for the very positive evaluation. We agree with the Reviewer that the hypercytokinemia of Down syndrome may contribute to other pathophysiological processes beyond autoimmune conditions. Although many cytokines elevated in Down syndrome have well-demonstrated pathogenic roles in the etiology of autoimmune diseases in the general population (e.g., TNF-α, IL-6), their consistent upregulation in DS regardless of clinical evidence of autoimmune pathology indicates the existence of a prolonged pre-clinical period, where the hypercytokinemia likely precedes evident tissue damage and symptomatology. Alternatively, it is possible that these elevated cytokines are contributing to the overall pathophysiology of DS (e.g., neuroinflammation, cognitive impairments, complications from viral infections) without formal diagnosis of an autoimmune disease. We also agree with the Reviewer that not all immune skin conditions would respond equally to JAK inhibition. Based on recent approvals for JAK inhibitors in the immunodermatology field, it is expected that JAK inhibition would show the greatest benefits for alopecia areata, atopic dermatitis, and psoriasis, with less clear results for hidradenitis suppurativa. We hope to contribute to this field through the analysis of the full clinical trial cohort in the near future. Lastly, we strongly agree with the need to assess the value of long-term therapy with JAK inhibitors or other immune therapies in people with Down syndrome for various clinical endpoints.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This paper represents a huge amount of work on a condition whose patients' health and well-being have not always been prioritized, and only relatively recently has the immune dysregulation seen in patients with Down Syndrome (DS) been garnering major research interest.

      This paper provides an unparalleled examination of immune disorder in patients with DS. In a truly herculean effort, the authors provided the cumulative examination of over 440 patients with DS, confirmed the alterations in immune cell subsets (n=292, 96 controls) and multi-organ autoimmunity seen in these patients as they age, and identified autoantibody production that could contribute to conditions co-occurring in patients with DS. They also sought to look at whether the early immunosenescence seen in DS was due to the inflammatory profile by comparing age-associated markers in DS patients and euploid controls separately, finding that several markers are regulated with age regardless of group, while comparing the effect of age versus DS status on cytokine status identified inflammatory markers elevated in DS patients across the lifespan that do not increase with age or that increase with age only in the DS cohort. This is very interesting in the context of DS in particular, and immunity during aging in general.

      The second part of the manuscript presents the results from a clinical trial with the JAK inhibitor tofacitinib in DS patients. While the number of DS patients treated with tofacitinib was small, the results were often quite striking. Treatment was well-tolerated and the improvement of dermatological conditions was clear. The less responsive patients AA4 and AA2 provide a very clear illustration that these patients are sensitive to immune triggers during treatment. Additionally, the demonstration that patients' IFN scores and cytokine levels decreased without clear immunosuppression with tofacitinib treatment is encouraging, since treatment with this drug would need to be continuous. I would be curious to see if the patients added past the cutoff for interim analysis follow a similar trajectory. I would not ask the authors to add any data; the paper is well-written and logically constructed.

      I only have a small comment: I really did not like how Figure 2 a, d, and g tethered the coloring to the magnitude of fold change to show the effect of DS particularly for 2a and 2g. Given that these fold changes are quite modest, the coloring is very light and hard to distinguish. The clear takeaway is that the effect on T cells is greatest, but there must be a better way to illustrate this. Perhaps displaying this graph on a non-white background could help with contrast.

      We are grateful for the Reviewer’s very positive assessment of the manuscript and constructive feedback. We want to assure the Reviewer that similar analyses will be completed in the future for the entire cohort recruited into the trial to determine if similar trajectories and results are observed with the larger sample size. Additionally, following the Reviewer’s guidance, we have modified the color scales in Figures 2a, d, and g so that each panel is on its own dynamic range, thus emphasizing the differences within each immune cell lineage.

      Reviewer #2 (Recommendations For The Authors):

      • Although the focus of the patients in the first part of the paper is on autoimmune/inflammatory conditions, it will be useful to also list the non-autoimmune infectious manifestations for reference with prevalence data. For example, otitis media, or lung infections (mentioned within the paper), or mucosal candidiasis. Same for other manifestations such as cardiac or malignant conditions. Given the impressive number of patients, it will be useful to the readers to have prevalence data for these as well, even in brief statements within the results.

      We appreciate this inquiry by the Reviewer. Following the Reviewer’s guidance, we have included information on recurrent otitis media, frequent/recurrent pneumonia, congenital heart defects requiring repair, and various forms of leukemia. These additional data are presented in a revised Supplementary file 1 and briefly discussed in the results.

      • Have the authors looked at DN T cells and whether they may be enriched in DS patients, given their enrichment in some autoimmune conditions?

      Thanks for this inquiry. We did examine DN T cells (double negative T cells), which we referred to in our Figure 2 and Figure 2 – figure supplement 1 as non-CD4+ CD8+ T cells. Although this T cell subset is mildly elevated (in terms of frequency among T cells) in individuals with Down syndrome, the result did not reach statistical significance after multiple hypothesis correction. This negative result is shown in the heatmap in Figure 2 – figure supplement 1d.

      • It would be useful to move the segment of the discussion that discusses the interim predefined analysis of the phase 2 trial to the corresponding segment of the results. As this reviewer was reading the paper, it was unclear why the interim analysis was done, whether it was predefined and it was not until the discussion that it became apparent. I believe it will help the readers to have a brief mention that this interim analysis was predefined and set to occur at the first 10 DS enrollees. Also, it would be helpful to state what is the total number of DS patients planned for enrollment in the Phase 2 trial which is continuing recruitment.

      We appreciate this comment. Following the Reviewer’s guidance, we have revised the text to explain in the Results section that the interim analysis was predefined and triggered once the first 10 participants completed the 16 weeks of treatment. We also explain that the trial will be considered complete once a total of 40 participants undergo 16 weeks of treatment.

      • Although the authors present data on TPO autoantibodies before and after tofacitinib, it remains unclear whether the other non-TPO autoantibodies were altered during treatment or whether this was a TPO autoantibody-specific phenomenon. Was there an alteration in mature B cells or plasmablast populations after tofacitinib? If these data are available, they would further enhance the manuscript. If they are not available, it would be useful for the authors to discuss those in the discussion of the manuscript.

      We are grateful for this comment, which strongly aligns with our future research interests and plans for the analysis of the full cohort once the trial is completed. In the interim analysis, we analyzed only autoantibodies related to autoimmune thyroid disease and celiac disease, as shown in the manuscript. However, we plan to complete a more comprehensive analysis of the effects of JAK inhibition on autoantibody production once the full sample set is available at the end of the trial. Likewise, the clinical trial protocol contemplates collection and processing of blood samples for immune mapping using mass cytometry, which will enable us to answer the question from the Reviewer about potential changes in B cells or plasmablast populations. Following the Reviewer’s guidance, we discuss these planned analyses in the Discussion of the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) Cellular immune phenotyping data in Figure 2 presents a large number of patients with DS versus euploid controls (292 and 96 respectively). Given the relatively large cohort there would seem to be an opportunity to determine whether age or sex alters the immune phenotype shown, for example, TEMRAs, etc. Was the data analyzed in this way?

      We welcome this comment, which clearly aligns with our research interests and planned additional analyses of these datasets generated by the Human Trisome Project. We can share with the Reviewer that although sex as a biological variable has minimal impacts on the strong immune dysregulation observed in Down syndrome, there are clear age-dependent effects, with some immune changes occurring early during childhood versus others taking place later in adult life. A manuscript describing a complete analysis of age-dependent effects on the multi-omics datasets in the Human Trisome Project is currently under preparation.

      (2) The authors should strongly consider incorporating/discussing the findings from Gansa et al, Journal of Clinical Immunology May 2024 - where they reviewed the immune phenotype of 1299 patients with Down syndrome.

      Thanks for bringing this publication to our attention; it is now cited in the revised manuscript.

      (3) It is difficult to differentiate patients Hs2 and Ps1 in Figure 5d.

      Thanks for this observation; we have modified the labels for greater clarity in the revised manuscript.

      (4) Given their finding of no correlation between cytokine levels/immune phenotype and autoimmunity, some additional discussion of the relevance of hypercytokinemia in the pathogenesis of autoimmunity would seem relevant (given that this was the basis for the clinical trial). The authors mention that cytokine levels may not be appropriate measures of disease in the patients.

      We welcome this suggestion and have revised the Discussion along these lines.

      (5) Data availability statement: appropriate.

    1. eLife Assessment

      This valuable study reports a novel function of ATG14 in preventing pyroptosis and inflammation in oviduct cells, thus allowing smooth transport of the early embryo to the uterus and implantation. The data supporting the main conclusion are solid. This work will be of interest to reproductive biologists and physicians practicing reproductive medicine.

    2. Reviewer #1 (Public review):

      This study by Popli et al. evaluated the function of Atg14, an autophagy protein, in reproductive function using a conditional knockout mouse model. The authors showed that female mice lacking Atg14 were infertile partly due to defective embryo transport function of the oviduct and faulty uterine receptivity and decidualization using PgrCre/+;Atg14f/f mice. The findings from this work are exciting and novel. The authors demonstrated that a loss of Atg14 led to an excessive pyroptosis in the oviductal epithelial cells that compromises cellular integrity and structure, impeding the transport function of the oviduct. In addition, the authors use both genetic and pharmacological approaches to test the hypothesis. Therefore, the findings from this study are high-impact and likely reproducible. However, there are multiple major concerns that need to be addressed to improve the quality of the work.

    3. Reviewer #2 (Public review):

      In this manuscript, Popli et al investigated the roles of autophagy related gene, Atg14, in the female reproductive tract (FRT) using conditional knockout mouse models. By ablation of Atg14 in both oviduct and uterus with PR-Cre (Atg14 cKO), authors discovered that such females are completely infertile. They went on to show that Atg14 cKO females have impaired embryo implantation as well as embryo transport from oviduct to uterus. Further analysis showed that Atg14 cKO leads to increased pyroptosis in oviduct, which disrupts oviduct epithelial integrity and leads to obstructive oviduct lumen and impaired embryo transport. The authors concluded that Atg14 is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable proper embryo transport.

      The authors have addressed most of my concerns in this revised version, with a few minor issues remaining to be addressed:

      (1) The authors tried to address my first concern regarding the statement that "autophagy is critical for maintaining the oviduct homeostasis". The revised statement in Line 53-54 "we report that Atg14-dependent autophagy plays a crucial role in maintaining..." is still not correct. It should be corrected to "we report that autophagy-related protein Atg14 plays a crucial role in maintaining...".

      (2) Line 349-351 described 80-90% of blastocysts retrieved from oviducts of cKO mice, which is inconsistent with Figure 3B (showing more than 98%).

      (3) Line 447, "Fig. 5E" should be Fig. 6A. In addition, there is a grammatical error in the next sentence.

      (4) In Figure 6D, why does the composition of blastocysts in the chemical-treated group not add up to 100%?

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Pooja Popli and co-authors tested the importance of Atg14 in the female reproductive tract by conditionally deleting Atg14 using PrCre and also Foxj1cre. The authors showed that loss of Atg14 leads to infertility due to the retention of embryos within the oviduct. The authors further concluded that the retention of embryos within the oviduct is due to pyroptosis in oviduct cells leading to defective cellular integrity. The revised manuscript has included new experimental data (Figs. S2B, 5B, 5C, and S3) that satisfied the concerns of this reviewer. The manuscript should provide important advancement to the field.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We greatly appreciate the opportunity to submit a revision of our manuscript entitled: "The Autophagy Protein, ATG14 Safeguards Against Unscheduled Pyroptosis Activation to Enable Embryo Transport During Early Pregnancy" by Popli et al. We thank all three Referees for underscoring the importance of our findings as well as the constructive critiques that we used to improve our paper. Most notably, we added the following new data:

      · To provide more insight into whether pyroptosis activation occurs distinctly in the oviduct, we also examined the expression of GSDMD (the primary executioner of the pyroptosis pathway) in the uterus and ovary. We observed no signs of pyroptosis activation in response to ATG14 loss in either the uterus or ovary of Atg14 cKO mice compared to controls, suggesting that ATG14 plays a distinct role in regulating pyroptosis specifically in the oviduct (Revised Figure 5F).

      · To better understand the molecular mechanisms of pyroptosis activation in the oviducts, we examined various key markers of mitochondrial integrity, architecture, and function in control and Atg14 cKO oviducts. Our findings indicate a significant loss of mitochondrial structural and functional integrity, possibly contributing to the embryo retention phenotype via activating the pyroptosis pathway in the oviduct (Revised Figure 5B & C).

      · To address the spatiotemporal and region-specific expression of ATG14 in the oviduct, we performed immunofluorescence analysis and observed the consistent expression of ATG14 in all the cellular compartments of oviducts including ciliary epithelial cells, secretory epithelial cells, and smooth muscle cells. Moreover, the region-specific expression analysis revealed that distinct expression of ATG14 in the ampullary region of cKO mice oviduct helps to preserve its structural integrity. Conversely, its loss in the isthmus region of the oviduct in concordance with active PR-cre activity causes completely distorted epithelial structures with luminal obliteration or narrowing resulting in an unorganized and obstructed lumen leading to embryo retention, suggesting that ATG14 is essential for maintaining the structural integrity of the oviduct (Revised Figure 3F & S2A).

      · Considering the expression of PR-cre in the pituitary, which could potentially influence hormonal secretion and ovulation, we evaluated the levels of E2 and P4 during pregnancy. Our findings show that these hormone levels remained unchanged in Atg14 cKO mice, indicating that the absence of ATG14 does not negatively affect the HPG axis or pituitary function (Revised Figure 2F).

      · ATG14 is an essential factor for the initiation of autophagy, and its loss can lead to reduced or inhibited autophagic activity. Consistently, we observed elevated levels of LC3b and p62 proteins, two well-known markers of autophagic flux, in the oviducts of Atg14-deficient mice, implying that loss of ATG14 leads to defective autophagy, potentially disturbing the structural integrity of oviductal epithelial cells and impairing embryo transport (New Supplementary Figure S2B).

      Reviewer #1 (Public Review):

      This study by Popli et al. evaluated the function of Atg14, an autophagy protein, in reproductive function using a conditional knockout mouse model. The authors showed that female mice lacking Atg14 were infertile partly due to defective embryo transport function of the oviduct and faulty uterine receptivity and decidualization using PgrCre/+; Atg14f/f mice. The findings from this work are exciting and novel. The authors demonstrated that a loss of Atg14 led to an excessive pyroptosis in the oviductal epithelial cells that compromises cellular integrity and structure, impeding the transport function of the oviduct. In addition, the authors use both genetic and pharmacological approaches to test the hypothesis. Therefore, the findings from this study are high-impact and likely reproducible. However, there are multiple major concerns that need to be addressed to improve the quality of the work.

      Major comments:

      (1) It is interesting that deletion of Atg14 using PgrCre results in pyroptosis only in the oviduct; the authors should speculate/evaluate why the oviduct, but not the uterus or follicles. Is there any cellular specificity that is sensitive to autophagy/pyroptosis in the oviduct but not in other cell types? This has not been evaluated or discussed in the manuscript. Is it possible to include GSDMD IHC for the uterine section to ensure that there was no pyroptosis event in the cKO uteri?

      We performed GSDMD IHC and found that, unlike in the oviduct, the cKO uteri and ovaries do not exhibit detectable pyroptosis (Revised Figure 5F). Additionally, we have added text to the discussion section addressing possible reasons for the differential impact of Atg14 loss on pyroptosis along the reproductive tract continuum (Line number: 532-538).

      (2) Please include an explanation of how a loss of Atg14, important for the initiation process of autophagy (as indicated in line 88), can lead to pyroptosis. There was some discussion about inflammation. But the connection is still missing.

      We thank the reviewer for noting this. We have now included a possible explanation of how autophagy could impact pyroptosis in the discussion section (Line number: 532-538).

      (3) No expression data of ATG14 using IHC/IF analysis were included in the manuscript - this is missing. This is needed and important as the authors found that Foxj1Cre/+; Atg14f/f cKO mice had no fertility defect. Is it possible that ATG14 is not present in the ciliated epithelial cells of the oviduct? In addition, the data in Figure 5B also points to this speculation. This is because the GSDMD (the pyroptosis marker) is only observed in the isthmus region but not the ampulla.

      We thank the reviewer for this nice suggestion. We performed immunofluorescence analysis for ATG14 expression in control and Atg14 cKO oviducts and observed consistent expression of ATG14 in all the cellular compartments of oviducts, including ciliary epithelial cells, secretory epithelial cells, and smooth muscle cells (New Supplementary Figure S2A). We also examined acetylated α-tubulin expression in the oviducts of Foxj1Cre/+; Atg14 f/f mice and control mice and observed that ciliated epithelial cells positive for acetylated α-tubulin staining did not appear to differ between Foxj1Cre/+; Atg14 f/f oviducts and controls (Revised Figure 4C). However, due to the unavailability of reliable fluorescent-labeled antibodies for both Foxj1 and Atg14, we were unable to conduct the co-localization study as intended. This limitation hindered our ability to precisely determine the spatial overlap of these proteins within the tissue.

      (4) In line with the previous comment, is ATG14 present in the human Fallopian tube? If so, which cell type? This needs to be addressed.

      Author’s Response: We appreciate the reviewer's valuable suggestion. While we currently lack access to human fallopian tube biopsies, the Human Protein Atlas (https://www.proteinatlas.org/ENSG00000126775-ATG14) demonstrates distinct ATG14 expression in various fallopian tube cell types, with localization in the cytoplasm, membrane, and nucleus.

      (5) As PgrCre is also expressed in the pituitary, is it possible that the deletion of Atg14 using PgrCre would affect pituitary function – hence a change in the FSH/LH secretion that subsequently affects ovulation? Although the uterine and ovarian histology in the Atg14 cKO looks similar to the controls, is it possible that cyclicity is also affected? The authors should evaluate whether the estrous cycle takes place regularly.

      Author’s Response: Thank you for the insightful comment. However, evaluating the estrous cycle requires significant time and effort and is beyond the scope of the current manuscript. Nonetheless, we have now shown that both P4 and E2 levels were not altered in Atg14 cKO mice, indicating that the loss of Atg14 did not adversely impact the HPG axis, and by extension, pituitary function (Revised Figure 2F).

      (6) The number of total embryos/oocytes in the cKO compared to the control has not been evaluated - this data must be included. Do the changes in autophagy in Atg14 cKO affect preimplantation embryo development? Please categorize the embryos found in the oviduct/uterus in both genotypes. i.e., % blastocyst, % morula, % developmentally delayed, % non-viable etc. It would be interesting to evaluate if the oviduct with heavy pyroptosis can support preimplantation embryo development.

      Author’s Response: We thank the reviewer for this nice suggestion. We categorized the embryos into different categories as suggested and included the data (Revised Figure 3C and Figure 6D).

      (7) It is unclear why the superovulation+mating experiment (Figure 3C) was performed. Please provide justification. Why was the data from natural mating (Figure 3A) insufficient?

      Author’s Response: In Figure 3C, superovulation was employed to complement the natural mating studies and to provide stronger evidence for the embryo retention phenotype observed in the oviduct.

      (8) In lines 297-298, the conclusion that "ATG14 is required for P4-mediated but not for E2-mediated actions during uterine receptivity" is not entirely correct. This is because the authors also observed that the downregulation of MUC1 (E2-target protein) is absent in the PgrCre/+;Atg14f/f cKO female uteri.

      We thank the reviewer for noting this. We detected more E2-induced targets in D-4 pregnant uterine samples and found no change in their expression in response to Atg14 depletion in cKO females (Revised Figure 2E).

      (9) Figure 3D: Please include an image that also represents the ampulla region. All images are from the isthmus region. It would be informative to see if the loss of cell boundaries also takes place at the ampulla region in the cKO oviduct.

      We thank the reviewer for this nice suggestion. We included the ampulla section from the cKO and control female oviducts (Revised Figure 3F). As PR-cre activity is limited to isthmus only [1, 2], we did not see any structural abnormality in ampulla sections of cKO oviducts.

      (10) Figure 3E: Please indicate which region the TEM was performed. Isthmus? Ampulla? Were the changes in mitochondrial phenotype observed across all oviductal regions?

      The TEM imaging was performed by the WashU Core services. Although we clearly asked the core staff to image the isthmus region only, we are not sure whether they followed the instructions accurately.

      (11) Figure 4B; the evaluation of FOXJ1 IHC. The authors need to include sections that also have an ampulla region-especially in the cKO. In addition, it is misleading to state that there were fewer FOXJ1+ cells (line 361) in the cKO if the region being evaluated is the isthmus (which has a lot fewer ciliated epithelial cells in general) while the control image showed an ampulla where the abundancy of ciliated epithelial cells (FOXJ1+) is higher than that of the isthmus. The authors also need to include a higher resolution image (a zoom-in at the ciliated epithelial cells with FOXJ1+ signal) as well as the quantification of FOXJ1+ cells.

      We appreciate the reviewer for the suggestion. In Figure 4A, we have already shown the ampulla region from both control and cKO oviducts, wherein alpha-tubulin staining was evident in both oviducts.  

      We agree with the reviewer that the isthmus usually has fewer ciliary epithelial cells than the ampulla. However, as illustrated in Figures 4A and 4B, Atg14 depletion causes a marked disruption of structural integrity with loss of cell boundaries specifically in the isthmus, which is far more pronounced than in the ampulla. One reason for this is the reported PgrCre activity, which is much more robust in the isthmus than in the ampulla [1, 2]. This disruption leads to the substantial loss of both ciliated and secretory cells, compromising the epithelial architecture to such an extent that it is impossible to accurately quantify the Foxj1 signal, as can be seen in the higher-resolution images in New Supplementary Figure S3.

      For more clarity, we modified the statement in the revised file (Line Number: 393-396).

      (12) All IHC/IF and embryo images need to include the scale bars.

      We thank the reviewer for this suggestion. We have now included scale bars in all the images.

      (13) Figure 5H: although IL1B is being discussed, there was no data in this study to support the figure.

      In Figure 5H, IL1B is presented as part of the pyroptosis signaling pathway. As we have already shown other key executioners of this pathway: Caspase 1 and GSDMD, we believe that additional IL1B data would not provide new insights beyond what has already been shown.

      Minor comments:

      (1) Please include n (sample size) for all data, including the histology image in the figure legends for all studies.

      We have now included the sample size in the figure legends for all data shown in the manuscript.

      (2) Line 32, did the authors mean to say, "Self-digestion of..." instead of "Self-digestion for..."?

      In Line 32, we meant, “Cellular self-digestion for female reproductive tract functions”. We have now corrected the statement.

      Fig. 1A - please include negative control.

      We included the negative control (Revised Figure 1).

      (3) Figure 1E left panel and Figure 4C - please label "Average no. of pups/female/litter" as each female has more than one litter over her reproductive lifespan. If the authors represent pups/female, then the number should be cumulative, in the range of 35-40 pups/female in the control group.

      We thank the reviewer for noting this. We have now corrected the label in both Revised Figure 1E and Revised Figure 4E.

      (4) Line 273: please remove "& F" as there is no Figure F in the image.

      We removed “&F” from Line 273.

      (5) The presence of CL is not always indicative of normal hormonal levels; therefore, the authors should include the measurement of progesterone levels at 3.5 dpc in the cKO compared to the control group. Hormonal regulation is also crucial for embryo transport.

      We thank the reviewer for this suggestion. We measured not only P4 but also E2 levels in D4 pregnant females and found no significant difference in their levels compared to corresponding controls (Revised Figure 2F).

      (6) Figure 2A shows that KRT expression is not present in the control uteri. Although the KRT8 levels may have decreased at 4 dpc, they should be present (see Figure S2A).

      We observed no decrease in KRT expression in control uteri on 5 dpc. We included better-resolution images for KRT expression (Revised Figure 2A).

      (7) The dotted white lines in Figure 2A are too thick. It's difficult to see the Ki67 positive signal in the luminal epithelial cells. Please also add a quantitative analysis of Ki67+ cells in the luminal epithelium vs. stromal cells.

      We have now corrected the dotted lines in Revised Figure 2B. However, as Ki-67-positive proliferation is evident in the representative images, we believe quantitative analysis would not add anything new to the existing conclusion.

      (8) Figure 2D - the y-axis mentions the weight ratio. However, the figure legend describes the transcript levels of Atg14 - please correct this.

      We corrected the label in the revised manuscript.

      (9) Line 294 - Please correct Figure 2C to Figure 2B.

      We corrected it.

      (10) Line 308 - Please correct Figure 2E to Figure 2F.

      We corrected it.

      (11) Line 310 - Please correct Figure 2F to Figure 2G.

      We corrected it.

      (12) Line 311 - Please correct Figure 2F to Figure 2G.

      We corrected it.

      (13) Information in Figure S2A and S2B should be included in the main figure.

      We thank the reviewer for this nice suggestion. We now included the figures S2A and S2B in the main figure (Revised Figure 2C & D).

      (14) Figure 3C - due to a lot of cellular debris after flushing, it's difficult to see. But it seems like there are secondary follicles in the flushing of control oviducts - this is highly unlikely. This could be due to an artifact of an accidental poking of the ovaries during collection.

      We agree with the reviewer. It might be due to the unintentional poking of the ovaries. We will take extra care in future experiments to avoid this and ensure clean flushing to prevent any confusion from debris or artifacts.

      (15) Figure 2B and Figure 3D signals from DAPI are missing - it's black with no blue signal. This could be the data loss during file compression for manuscript submission.

      We included better-resolution pictures for the DAPI signal in Revised Figure 2B & Figure 3F.

      (16) Explain why some embryos in the cKO make it to the uterus when the females are superovulated.

      It might be due to the heightened hormonal stimulation provided by the superovulation which could facilitate the movement of some embryos through the oviduct despite any defects or abnormalities caused by the loss of ATG14 in the oviduct.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Popli et al investigated the roles of the autophagy-related gene, Atg14, in the female reproductive tract (FRT) using conditional knockout mouse models. By ablation of Atg14 in both oviduct and uterus with PR-Cre (Atg14 cKO), the authors discovered that such females are completely infertile. They went on to show that Atg14 cKO females have impaired embryo implantation and uterus receptivity due to impaired response to P4 stimulation and stromal decidualization. In addition to the uterus defect, the authors also discovered that early embryos are trapped inside the oviduct and cannot be efficiently transported to the uterus in these females. They went on to show that oviduct epithelium in Atg14 cKO females showed increased pyroptosis, which disrupts oviduct epithelial integrity and leads to obstructive oviduct lumen and impaired embryo transport. Therefore, the authors concluded that autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable proper embryo transport.

      Strengths:

      This study revealed an important and unexpected role of the autophagy-related gene Atg14 in preventing pyroptosis and maintaining oviduct epithelial integrity, which is poorly studied in the field of reproductive biology. The study is well designed to test the roles of ATG14 in mouse oviduct and uterus. The experimental data in general support the conclusion and the interpretations are mostly accurate. This work should be of interest to reproductive biologists and scientists in the field of autophagy and pyroptosis.

      Weaknesses:

      Despite the strengths, there are several major weaknesses raising concerns. In addition, the mismatched figure panels, the undefined acronyms, and the poor description/presentation of some of the data significantly hinder the readability of the manuscript.

      (1) In the abstract, the authors stated that "autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable embryo transport". This statement is not substantiated. Although Atg14 is an autophagy-related gene and plays a critical role in oviduct homeostasis, the authors did not show a direct link between autophagy and pyroptosis/oviduct integrity. In addition, the authors pointed out in the last paragraph of the introduction that none of the other autophagy-related genes (ATG16L, FIP200, BECN1) exhibited any discernable impact on oviduct function. Therefore, the oviduct defect is caused by Atg14 specifically, not necessarily by autophagy.

      We thank the reviewer for noting this. We corrected the statement in the revised manuscript (Line number: 53-54).

      (2) In lines 412-414, the authors stated that "Atg14 ablation in the oviduct causes activation of pyroptosis", which is also not supported by the experimental data. The authors did not show that Atg14 is expressed in oviduct cells. PR-Cre is also not specific to oviduct cells. It is possible that Atg14 knockout in other PR-expressing tissues (such as the uterus) indirectly activates pyroptosis in the oviduct. More experiments will be required to support this claim. Given that there was no defect when Atg14 was knocked out in oviduct ciliary cells, it would be good to use a secretory cell Cre, such as Pax8-Cre, to demonstrate that Atg14 functions in the secretory cells of the oviduct, thus supporting this conclusion.

      We have now included the ATG14 expression data in the oviduct (New Supplementary Figure S2A). Consistent with previous studies reporting PR-cre activity in the isthmus [1, 2], we observed that Atg14 depletion was more pronounced in the isthmus compared to the ampulla. However, generating a Pax8-Cre mouse model for secretory cells will require a substantial amount of time and effort, and we respectfully note that this is beyond the scope of the current manuscript.

      (3) With FOXJ1-Cre, the authors attempted to specifically knockout Atg14 in ciliary cells, but there are no clear fertility and embryo implantation defects in Foxj1/Atg14 cKO mice. The author should provide verification data to show that Atg14 had been effectively depleted in ciliary cells if Atg14 is normally expressed.

      We understand the reviewer’s concern. We included new data for ATG14 expression in control and Atg14 cKO mice oviducts (New Supplementary Figure S2A). However, due to the unavailability of reliable fluorescent-labeled antibodies for both Foxj1 and Atg14, we could not conduct the co-localization studies as intended, and this limitation hindered our ability to precisely determine the spatial overlap of these proteins within the oviduct. Nonetheless, Foxj1-cre is a widely used mouse model with reported cre activity in ciliary epithelial cells, including those of oviduct tissues [3]. Given the widespread expression of ATG14 in all the ciliary and secretory cells (New Supplementary Figure S2A) and distinct FOXJ1 expression in the oviduct (New Supplementary Figure S3), we are confident that Atg14 is deleted in the ciliary epithelial cells of Foxj1/Atg14 cKO mice oviducts.

      (4) In lines 307-313, the author tested whether ATG14 is required for the decidualization of HESCs. The author stated that "Control siRNA transfected cells when treated with EPC seemed to change their morphological transformation from fibroblastic to epithelioid (Fig. 2E) and had increased expression of the decidualization markers IGFBP1 and PRL by day three only (Fig. 2F)". First, the labels in Figure 2 are not corresponding to the description in the text. Second, the morphology of the HESCs in the control and Atg14 siRNA group showed no obvious difference even at day 3 and day 6. The author should point out the difference in each panel and explain in the text or figure legend.

      Decidualization is a post-implantation event, whereas our study primarily focuses on pre-implantation events in the oviduct. Therefore, we have removed all data related to human and mouse decidualization to enhance the clarity and precision of our study.

      (5) In lines 332-336, the authors pointed out that the cKO mice oviduct lining shows marked eosinophilic cytoplasmic change, but there's no data to support the claim. In addition, the authors further described that "some of the cells showed degenerative changes with cytoplasmic vacuolization and nuclear pyknosis, loss of nuclear polarity, and loss of distinct cell borders giving an appearance of fusion of cells (Fig. 3D)". First, Figure 3D did not show all these phenotypes, and it is likely a mismatch to Figure 3E. Even in Figure 3E, it is not obvious to notice all the phenotypes described here. The figure legend is overly simple, and there's no explanation of the arrowheads in the panel. More data/images are required to support the claim here and provide a clear indication and explanation in the figure legend.

      Dr. Ramya Masand, Chief pathologist in the Pathology Department at the Baylor College of Medicine, and a contributing author, assessed the H&E-stained oviduct sections from control and cKO mice. We have now included a new Supplementary Figure S3 with previous representative H&E images that depict the cellular alterations described in lines 332–336.

      (6) In lines 317-325, it is rather confusing about the description of the portion of embryos from the oviduct and uterus. In addition, the total number of embryos was not provided. I would recommend presenting the numerical data to show the average embryos from the oviduct and uterus instead of using the percentage data in Figures 3A and 5G.

      We thank the reviewer for this nice suggestion. We calculated the average number of embryos and found no difference in the number of embryos recovered from cKO or polyphyllin-treated pregnant mice at 4 dpc compared to their controls (New Supplementary Figure S4A & B).

      (7) In lines 389-391, authors tested whether Polyphyllin VI treatment led to activated pyroptosis and blocked embryo transport. Although Figures 5F-G showed the expected embryo transport defect, the authors did not show the pyroptosis and oviduct morphology. It will be important to show that the Polyphyllin VI treatment indeed led to oviduct pyroptosis and lumen disruption.

      We performed GSDMD IHC staining in oviducts from Polyphyllin VI- or vehicle-treated mice and observed elevated GSDMD expression with Polyphyllin VI (New Figure 6E). However, no significant lumen disruption was detected, which may be attributed to the short-term exposure of the oviducts to pyroptosis induction, in contrast to the more pleiotropic effects observed in genetically induced models. Nonetheless, this observation clearly indicates that unscheduled or unwarranted activation of pyroptosis impedes embryo transport.

      (8) In line 378, it would be better to include a description of pyroptosis and its molecular mechanisms to help readers better understand your experiments. Alternatively, you can add it in the introduction.

      We thank the reviewer for this nice suggestion. We included literature on the pyroptosis pathway in the introduction section (Line Number: 105-118).

      (9) Please make sure to provide definitions for the acronyms such as FRT, HESCs, GSDMD, etc.

      We added definitions for the acronyms such as FRT, HESCs, and GSDMD used in the study.

      (10) It is rather confusing to use oviducal cell plasticity in this manuscript. The work illustrated the oviducal epithelial integrity, not the plasticity.

      We thank the reviewer for the suggestion. We have revised the manuscript accordingly to ensure clarity and precision in describing the oviductal epithelial structural changes observed in the absence of ATG14.

      A few of the additional comments for authors to consider improving the manuscript are listed below.

      (1) Some of the figures are missing scale bars, while others have inconsistent scale bars. It would be better to be consistent.

      We have now included scale bars in all images.

      (2) On a couple of occasions, the DAPI signal cannot be seen, such as in Figure 2B and Figure 3D.

      We have now included better-resolution images for the DAPI signal in all fluorescent images shown in the revised manuscript.

      (3) Overall, the figure legends can be improved to provide more detailed information to help the reader to interpret the data.

      We included additional details in all the figure legends in the revised manuscript.

      (4) In Figure 2D, the Y-axis showed the stimulated/unstimulated uterine weight ratio, why did the author put "Atg14" at the top of the graph? At the same time, the X-axis title is missing in Figure 2D.

      We apologize for the typographical error. We removed “Atg14” from the top of the graph and included the X-axis title in the revised manuscript.

      (5) In the left panel of Figure 2G, "ATG14" at the top should be "Atg14" to be consistent.

      In Figure 2G, we are representing “ATG14” according to human gene annotation.

      (6) In line 559, "(A)" is missing in front of "Immunofluorescence analysis of GSDMD".

      We thank the reviewer for noting this. We corrected it in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Pooja Popli and co-authors tested the importance of Atg14 in the female reproductive tract by conditionally deleting Atg14 using Pr Cre and also Foxj1cre. The authors showed that loss of Atg14 leads to infertility due to the retention of embryos within the oviduct. The authors further concluded that the retention of embryos within the oviduct is due to pyroptosis in oviduct cells leading to defective cellular integrity. The manuscript has some interesting findings, however there are also areas that could be improved.

      Strengths:

      The importance of Atg14 and autophagy in the female reproductive tract is incompletely understood. The manuscript also provides spatial evidence about a new mechanism linking Atg14 to pyroptosis.

      We thank the reviewer for the positive statements and constructive comments on our manuscript.

      Weaknesses:

      (1) It is not clear why the loss of Atg14 selectively induces Pyroptosis within oviduct cells but not in other cellular compartments. The authors should demonstrate that these events are not happening in uterine cells.

      We thank the reviewer for this nice suggestion. We performed GSDMD IHC and found that, unlike in the oviduct, the cKO uteri and ovaries do not exhibit detectable pyroptosis (Revised Figure 5F). Additionally, we have added text to the discussion section addressing possible reasons for the differential impact of Atg14 loss on pyroptosis along the reproductive tract continuum (Line number: 532-538).

      (2) The manuscript never showed any effect on the autophagy upon loss of Atg14. Is there any effect on autophagy upon Atg14 loss? If so, does that contribute to the observation?

      We thank the reviewer for the nice suggestion. We found that protein levels of LC3b and p62, two well-known markers of autophagic flux, are elevated upon Atg14 loss in the oviduct (New Supplementary Figure S2B). Since p62 accumulation is indicative of reduced autophagic flux [4], we posit that loss of Atg14 results in defective autophagy in the oviduct. Importantly, this defective autophagy adversely impacted the structural integrity of oviductal epithelial cells, causing impairment in embryo transport.

      (3) It is not clear what the authors meant by cellular plasticity and integrity. There is no evidence provided in that aspect that the plasticity of oviduct cells is lost. Similarly, more experimental evidence is necessary for the conclusion about cellular integrity.

      We thank the reviewer for the suggestion. We have revised the text for clarity and precision in describing the oviductal epithelial structural changes observed in the absence of ATG14. To avoid ambiguity, we have removed the term "cellular plasticity." We have already provided extensive evidence, including multiple H&E stains and immunofluorescence analyses for KRT8 and smooth muscle actin to illustrate cellular integrity in both control and cKO oviducts. However, we respectfully believe that performing additional experiments on cellular integrity would not contribute further to the conclusions already drawn.

      (4) The mitochondrial phenotype shown in Figure 3 didn't appear as severe as it is described in the results section. The analyses should be more thorough. They should include multiple frames (in supplemental information) showing mitochondrial morphology in multiple cells. The authors should also test that aspect in uterine cells. The authors should measure Feret's diameter, differences in membrane potential, etc., for a definitive conclusion.

      We appreciate the reviewer’s suggestion. We carried out a TOM20 (mitochondrial structural marker) and cytochrome c (mitochondrial damage and cell death marker) immuno-colocalization study and found loss of TOM20 signal with concomitant cytochrome c leakage into the perinuclear space (Revised Figure 5B). Additionally, we observed reduced expression of mitochondrial structural and functional markers by qPCR analysis (Revised Figure 5C). However, we respectfully argue that conducting membrane potential studies on murine oviducts is extremely complex and is beyond the scope of this study.

      (5) The comment that the loss of Atg14 and pyroptosis leads to the narrowing of the lumen in the oviduct should be experimentally shown.

      We have now included a New Supplementary Figure S3 with representative previous immunofluorescence images that clearly show the narrowing of the lumen with Atg14 loss in the oviduct.

      (6) The manuscript never showed the proper mechanism through which Atg14 loss induces pyroptosis. The authors should link the mechanism.

      We respectfully disagree with the reviewer on this point. We have provided substantial evidence regarding the cellular mechanisms through which the loss of Atg14 may lead to the activation of pyroptosis as outlined below:

      (1) Cellular Changes: Loss of ATG14 in the oviduct results in cellular swelling and the formation of fused membranous structures, which are characteristic features of pyroptosis activation.

      (2) Expression of Key Pyroptosis Proteins: We observed an induced expression of GSDMD and Caspase-1, primary executioners of the pyroptotic pathway, in response to Atg14 loss.

      (3) Inflammatory Markers: Elevated levels of inflammatory markers such as TNF-α and CXCR3 were detected, both of which are known to promote pyroptosis [5, 6].

      (4) Mitochondrial Damage: We have added new data demonstrating disrupted colocalization of TOM20 (a mitochondrial structural marker) and Cytochrome c (a cell death marker), resulting in Cytochrome c leakage into the perinuclear space (Revised Figure 5B). Additionally, qPCR analysis revealed reduced expression of mitochondrial structural and functional markers in cKO oviduct tissues (Revised Figure 5C).

      Based on this evidence, we can clearly say that Atg14 has some direct or indirect link to inflammasome activation. However, understanding the complex rheostat between the Atg14-mediated autophagy and inflammation regulatory axis will necessitate future studies employing sophisticated models, such as combined knockout mice where ATG14 is deleted alongside key inflammatory regulators (e.g., NLRP3, GSDMD, or CASPASE-1). These dual knockout models could provide crucial insights into how ATG14 modulates inflammatory pathways.

      References:

      (1) Herrera, G.G.B., et al., Oviductal Retention of Embryos in Female Mice Lacking Estrogen Receptor alpha in the Isthmus and the Uterus. Endocrinology, 2020. 161(2).

      (2) Soyal, S.M., et al., Cre-mediated recombination in cell lineages that express the progesterone receptor. Genesis, 2005. 41(2): p. 58-66.

      (3) Zhang, Y., et al., A transgenic FOXJ1-Cre system for gene inactivation in ciliated epithelial cells. Am J Respir Cell Mol Biol, 2007. 36(5): p. 515-9.

      (4) Mizushima, N., T. Yoshimori, and B. Levine, Methods in mammalian autophagy research. Cell, 2010. 140(3): p. 313-26.

      (5) Vaher, H., Expanding the knowledge of tumour necrosis factor-alpha-induced gasdermin E-mediated pyroptosis in psoriasis. Br J Dermatol, 2024. 191(3): p. 319-320.

      (6) Liu, C., et al., CXCR4-BTK axis mediate pyroptosis and lipid peroxidation in early brain injury after subarachnoid hemorrhage via NLRP3 inflammasome and NF-kappaB pathway. Redox Biol, 2023. 68: p. 102960.

    1. eLife Assessment

      This important study offers insights into the function and connectivity patterns of a relatively unknown afferent input from the endopiriform to the CA1 subfield of the ventral hippocampus, suggesting a neural mechanism that suppresses the processing of familiar stimuli in favor of detecting memory guided novelty. The strength of evidence is solid, with careful anatomical and electrophysiological circuit characterization. The work will be of broad interest to researchers studying the neural circuitry of behavior.

    2. Reviewer #1 (Public review):

      Summary:

      The anatomical connectivity of the claustrum and the role of its output projections has, thus far, not been studied in detail. The aim of this study was to map the outputs of the endopiriform (EN) region of the claustrum complex, and understand their functional role. Here the authors have combined sophisticated intersectional viral tracing techniques, and ex vivo electrophysiology to map the neural circuitry of EN outputs to vCA1, and shown that optogenetic inhibition of the EN→vCA1 projection impairs both social and object recognition memory. Interestingly the authors find that the EN neurons target inhibitory interneurons providing a mechanism for feedforward inhibition of vCA1.

      Strengths:

      The strength of this study was the application of a multilevel analysis approach combining a number of state-of-the-art techniques to dissect the contribution of the EN→vCA1 to memory function.

      In addition the authors conducted behavioural analysis of locomotor activity, anxiety and fear memory, and complemented the analysis of discrimination with more detailed description of the patterns of exploratory behaviour.

    3. Reviewer #2 (Public review):

      Summary:

      Yamawaki et al., conducted a series of neuroanatomical tracing and whole cell recording experiments to elucidate and characterise a relatively unknown pathway between the endopiriform (EN) and CA1 of the ventral hippocampus (vCA1) and to assess its functional role in social and object recognition using fibre photometry and dual vector chemogenetics. The main findings were that the EN sends robust projections to the vCA1 that collateralise to the prefrontal cortex, lateral entorhinal cortex and piriform cortex, and these EN projection neurons terminate in the stratum lacunosum-moleculare (SLM) layer of distal vCA1, synapsing onto GABAergic neurons that span across the Pyramidal-Stratum Radiatum (SR) and SR-SLM borders. It was also demonstrated that EN input disynaptically inhibits vCA1 pyramidal neurons. vCA1 projecting EN neurons receive afferent input from piriform cortex, and from within EN. Finally, fibre photometry experiments revealed that vCA1 projecting EN neurons are most active when mice explore novel objects or conspecifics, and pathway-specific chemogenetic inhibition led to an impairment in the ability to discriminate between novel vs. familiar objects and conspecifics.

      Revision 1:

      The authors have addressed most of my concerns, but a few weaknesses remain:

      (1) I expected to see the addition of raw interaction times with objects and conspecifics for each phase of social testing (pre-test, sociability test, social discrimination), as per my comment on including raw data. However, the authors only provided total distance traveled and velocity, and total interaction time in Figure S9, which is less informative.

      (2) The authors observed increased activity in vCA1-projecting EN neurons tracking with the preferred object during the pre-test (object-object exploration) phase of the social tests, and the summary schematic (Figure 9A) depicts animals as showing a preference for one object over the other (although they are identical) in both the social and object recognition tests. However, in the chemogenetic experiment, the data (Fig S9B) indicate that animals did not show this preference for one object over another, making the expected baseline for this task unclear. This also raises an important question of whether the lack of effect from chemogenetic inhibition of vCA1-projecting EN neurons could be attributed to the absence of this baseline preference.

      Additionally, the finding that vCA1-projecting EN activity is associated with the preferred object exploration appears to counter the authors' argument that novelty engages this circuit (since both objects are novel in this instance). This discrepancy warrants further discussion.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The anatomical connectivity of the claustrum and the role of its output projections has, thus far, not been studied in detail. The aim of this study was to map the outputs of the endopiriform (EN) region of the claustrum complex, and understand their functional role. Here the authors have combined sophisticated intersectional viral tracing techniques, and ex vivo electrophysiology to map the neural circuitry of EN outputs to vCA1, and shown that optogenetic inhibition of the EN→vCA1 projection impairs both social and object recognition memory. Interestingly the authors find that the EN neurons target inhibitory interneurons providing a mechanism for feedforward inhibition of vCA1.

      Strengths:

      The strength of this study was the application of a multilevel analysis approach combining a number of state-of-the-art techniques to dissect the contribution of the EN→vCA1 to memory function.

      Weaknesses:

      Some authors would disagree that the vCA1 represents a 'node for recognition of familiarity' especially for object recognition although that is not to say that it might play some role in discrimination, as shown by the authors. I note however that the references provided in the Introduction, concerning the role of vCA1 in memory refer to anxiety, social memory, temporal order memory, and not novel object recognition memory. Given the additional projections to the piriform cortex shown in the results, I wonder to what extent the observations may be explained by odour recognition effects.

      We have added references demonstrating that the ventral hippocampus contributes to object recognition memory in rodents (Broadbent NJ et al., Learn Mem 2010; Titulaer J et al., Front Behav Neurosci 2021).

      The odor recognition effect is an interesting perspective that we have also considered. However, in our object recognition test, the same odor (70% EtOH) was used for both objects, yet the mice were able to discriminate between the familiar and novel objects. This suggests that the odor cue is unlikely to have contributed to their performance in the object discrimination test.

      In addition, I wondered whether the impairments in discrimination following chemogenetic inhibition of the EN→vCA1 were due to the subject treating the novel and familiar stimuli either as both novel, which might be observed as an increase in exploration, or as both familiar, with a decrease in overall exploration.

      We thank the reviewer for raising this interesting point. We analyzed the total exploration time (i.e., time in the familiar and novel interaction zones) during the social discrimination test. The data have been added to Fig. S9. Total exploration time was not affected by CNO treatment. This indicates that inhibition of ENvCA1-proj. neurons reduced interaction time with the novel conspecific and increased interaction time with the familiar conspecific. The subject mice seem to give equal weight to familiar and novel stimuli.

      Reviewer #2 (Public Review):

      Summary:

      Yamawaki et al., conducted a series of neuroanatomical tracing and whole-cell recording experiments to elucidate and characterise a relatively unknown pathway between the endopiriform (EN) and CA1 of the ventral hippocampus (vCA1) and to assess its functional role in social and object recognition using fibre photometry and dual vector chemogenetics. The main findings were that the EN sends robust projections to the vCA1 that collateralise to the prefrontal cortex, lateral entorhinal cortex, and piriform cortex, and these EN projection neurons terminate in the stratum lacunosum-moleculare (SLM) layer of distal vCA1, synapsing onto GABAergic neurons that span across the Pyramidal-Stratum Radiatum (SR) and SR-SLM borders. It was also demonstrated that EN input disynaptically inhibits vCA1 pyramidal neurons. vCA1 projecting EN neurons receive afferent input from the piriform cortex, and from within EN. Finally, fibre photometry experiments revealed that vCA1 projecting EN neurons are most active when mice explore novel objects or conspecifics, and pathway-specific chemogenetic inhibition led to an impairment in the ability to discriminate between novel vs. familiar objects and conspecifics.

      This is an interesting mechanistic study that provides valuable insights into the function and connectivity patterns of afferent input from the endopiriform to the CA1 subfield of the ventral hippocampus. The authors propose that the EN input to the vCA1 interneurons provides a feedforward inhibition mechanism by which novelty detection could be promoted. The experiments appear to be carefully conducted, and the methodological approaches used are sound. The conclusions of the paper are supported by the data presented on the whole.

      We thank the reviewer for their positive comments on our work.

      The authors used dual retrograde tracing and observed that the highest percentage (~30%) of vCA1 projecting EN cells also projected to the PFC. They then employed an intersectional approach to show the presence of collaterals in other cortical areas such as the entorhinal cortex and piriform cortex in addition to the PFC. However, they state that 'Projection to prefrontal cortex was sparse relative to other areas, as expected based on the retrograde labeling data' (referring to Figure 2K) and subsequently appear to dismiss the initial data set indicating strong axonal projections to the PFC.

      Our interpretation is that 70% of the ENvCA1-proj. population does not send collaterals to the PFC, suggesting that the PFC is not a major target for this population (unlike vCA1, to which 100% of this population projects). This hypothesis is supported by our axon branching study, which showed lower axon density in the PFC compared to vCA1 (and other regions). We revised the text to 'much sparser relative to that of vCA1' (line 101) to facilitate a direct comparison with the retrograde and anterograde labeling study.

      Since this is a relatively unknown connection, it would be helpful if some evidence/discussion is provided for whether the EN projects to other subfields (CA3, DG) of the ventral hippocampus. This is important, as the retrograde tracer injections depicted in Figure 1B clearly show a spread of the tracer to vCA3 and potentially vDG and it is not possible to ascertain the regional specificity of the pathway.

      We addressed the potential caveat associated with the retrograde tracer injection, as mentioned by the reviewer, by performing intersectional axon branching analysis. This analysis demonstrated that EN axons are primarily located in the SLM of the distal CA1 subfield (Figs. 2, 3, S2). However, we occasionally observed very weak labeling in the CA3 or dentate gyrus. We modified our text (lines 106-108) and figure (Fig. S2D) to account for this.

      The vCA1 projecting EN cells appear to originate from an extensive range along the AP axis. Is there a topographical organization of these neurons within the vCA1? A detailed mapping of this kind would be valuable.

      This is an interesting question for future research. Our data show a non-uniform distribution of this cell type, suggesting the potential for topographic organization.

      Given this extensive range in the location of vCA1 EN originating cells, how were the targets (along the AP axis) in EP selected for the calcium imaging?

      Using our injection coordinates, ENvCA1-proj. neurons were consistently labeled at high density just posterior to the bregma (Fig. 1J). Therefore, we targeted this region for our imaging.

      The vCA1 has extensive reciprocal connections with the piriform cortex as well, which is in close proximity to the EN. How certain are the authors that the chemogenetic targeting was specific to the EN-vCA1 connection?

      We performed histology on every animal used in the behavioral study to examine the specificity of hM4D expression, and only included those with specific labeling in the EN.

      Raw data for the sociability and discrimination indices should be provided so that the readers can gain further insight into the nature of the impairment.

      The raw data for total interaction time during the social discrimination test has been added (Fig. S9F).

      Line 222: It is unclear how locomotor activity informs anxiety in the behavioral tests.

      The degree of exploratory behavior in a novel context is generally considered to reflect anxiety levels in rodents. We have added a review paper (Ref 44, Prut, 2003) that discusses this point.

      Figure 7 title; It is stated that activity of EN neurons 'predict' social/object discrimination performance. However, caution must be exercised with this interpretation as the correlational data are underpowered (n=5-8). Furthermore, the results show a significant correlation between calcium event ratios and the discrimination index in the social discrimination test but not the object discrimination test.

      We increased the sample size for EN calcium imaging during the object recognition memory test (Fig. 7G). The updated data indicate a significant correlation between EN activity and the object recognition index (N = 9, Pearson R = 0.8, p = 0.01).

      We have changed the title of Figure 7 to 'Activity of ENvCA1-proj. neurons correlates with social/object discrimination performance'.

      While both male and female mice were included in the anatomical tracing and recording experiments, only male mice were used for behavioral tests.

      The female behavior was highly inconsistent in the control condition of our social recognition memory paradigm; therefore, we decided to conduct the study with males. We will design a new behavioral paradigm for future studies to address this challenge.

      Reviewer #1 (Recommendations For The Authors):

      (1) It is not clear how the relative number of vCA1 projecting neurons in Figure 1H was acquired, not enough detail is presented in the methods section. To what extent could these data have been affected by differences in the size or anatomical position of the injection site in vCA1, which judging from the example fluorescent image in Figure 1B also appears to include CA3.

      We used AMaSiNe (Song et al. 2020) to semi-automatically quantify fluorescently labeled presynaptic neurons. This open-source software identifies the number and location of these cells across different regions based on the Allen Mouse Brain Common Coordinate Framework. To control for transfection variability (e.g., due to slight differences in injection volume or site), we normalized the presynaptic cell count in each region by the total number of cells in the regions of interest. We performed this analysis for N = 5 brains and found a consistent trend, as seen in Fig. 1H (grey lines).

      We have added the detailed method of quantification in the Materials and Methods section (line 393).
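
      The normalization described in this response can be illustrated with a minimal sketch. It is not the AMaSiNe code or the authors' analysis script; the brain labels, region names, and cell counts below are hypothetical placeholders, and the function name `normalize_counts` is introduced here only for illustration. Each region's count is divided by the summed count across the regions of interest within the same brain, so that brains with different overall labeling can be compared.

```python
# Hypothetical per-region counts of labeled presynaptic cells for two brains.
# Region names and numbers are illustrative only, not data from the study.
counts_per_brain = {
    "brain_1": {"EN": 120, "piriform": 60, "entorhinal": 20},
    "brain_2": {"EN": 300, "piriform": 140, "entorhinal": 60},
}


def normalize_counts(region_counts):
    """Divide each region's count by the total across all regions of interest."""
    total = sum(region_counts.values())
    if total == 0:
        raise ValueError("no labeled cells in the regions of interest")
    return {region: n / total for region, n in region_counts.items()}


# Normalize each brain separately so differences in injection volume or site
# (and hence overall labeling) do not dominate the comparison.
fractions_per_brain = {
    brain: normalize_counts(counts) for brain, counts in counts_per_brain.items()
}

for brain, fractions in fractions_per_brain.items():
    print(brain, {region: round(f, 2) for region, f in fractions.items()})
```

      Under this scheme the fractions within each brain sum to 1, which is what allows the per-brain trends to be overlaid on a common axis.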

      (2) For a number of the results, the full statistical values are not presented in the Results section or figure legend.

      We have included the full statistical values in the figure legends of the revised manuscript.

      (3) It is not clear how much virus was injected in the different experiments (tract tracing, electrophysiology, behaviour, etc.). The methods state 50-100ul, but there is no further detail in the results or figure legends.

      We have included the injected volumes of the virus in the revised manuscript.

      (4) Figure 2 mentions the CLA complex (line 702) but this is not defined in the text. Although the introduction does refer to the claustrum complex, there is no acronym.

      We have corrected the manuscript accordingly.

      (5) Line 131- 'we recorded from 3-4 GABAergic neurons' - presumably this is in each animal?

      We recorded 3 to 4 GABAergic neurons sequentially from the same slice to compare input strength. We have edited the text to clarify this (line 134).

      Reviewer #2 (Recommendations For The Authors):

      Figure 3C: It is not clear what the dashed lines labelled proximal and distal represent.

      It is the proximal and distal vCA1 regions where GFP signals were measured for Fig. 3D. We have modified the figure legend to clarify this (line 736).

      Figure 5D: what do the different colors represent? Different colors for one brain?

      I assume that the reviewer meant to refer to Fig. 4D instead of Fig. 5D. In Fig. 4D, one color indicates starter cells in one brain. To clarify this, we have edited the figure legend (line 748).

      Figure S6E: The images are low resolution and it is hard to decipher the exact locations of labeled neurons. Please provide more guidance (e.g., labeling areas of interest).

      We have added reference lines and labels in Figure S6E.

      Some details are missing: what was the volume of AAV injected for each site/experiment; how was CNO made, and where was it purchased from?

      We have added this information (lines 330-331; 431-434).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This work presents a replicable difference in predictive processing between subjects with and without tinnitus. In two independent MEG studies and using a passive listening paradigm, the authors identify an enhanced prediction score in tinnitus subjects compared to control subjects. In the second study, individuals with and without tinnitus were carefully matched for hearing levels (next to age and sex), increasing the probability that the identified differences could truly be attributed to the presence of tinnitus. Results from the first study could successfully be replicated in the second, although the effect size was notably smaller.

      Throughout the manuscript, the authors provide a thoughtful interpretation of their key findings and offer several interesting directions for future studies. Their conclusions are fully supported by their findings. Moreover, the authors are sufficiently aware of the inherent limitations of cross-sectional studies.

      Strengths:

      The robustness of the identified differences in prediction scores between individuals with and without tinnitus is remarkable, especially as successful replication studies are rare in the tinnitus field. Moreover, the authors provide several plausible explanations for the decline of the effect size observed in the second study.

      The rigorous matching for hearing loss, in addition to age and sex, in the second study is an important strength. This ensures that the identified differences cannot be attributed to differences in hearing levels between the groups.

      The used methodology is explained clearly and in detail, ensuring that the used paradigms may be employed by other researchers in future studies. Moreover, the registering of the data collection and analysis methods for Study 2 as a Registered Report should be commended, as the authors have clearly adhered to the methods as registered.

      Weaknesses:

      Although the authors have been careful to match their experimental groups for age, sex, and hearing loss, there are other factors that may confound the current results. For example, subjects with tinnitus might present with psychological comorbidities such as anxiety and depression. The authors' exclusion of distress as a candidate for explaining the found effects is based solely on an assessment of tinnitus-related distress, while it is currently not possible to exclude the effects of elevated anxiety or depression levels on the results. Additionally, as the authors address in the discussion, the presence of hyperacusis may also play a role in predictive processing in this population.

      The authors write that sound intensity was individually determined by presenting a short audio sequence to the participants and adjusting the loudness according to an individual pleasant volume. Neural measurements made during listening paradigms might be influenced by sound intensity levels. The intensity levels chosen by the participants might therefore also have an effect on the outcomes. The authors currently do not provide information on the sound intensity levels in the experimental groups, making it impossible to assess whether sound intensity levels might have played a role.

      Thank you very much for your favorable and constructive evaluation of our manuscript. We agree with you on various additional confounds that we did not consider and included a section in our discussion. It is also correct that we did not include the sound intensity levels in our analysis, which is also a potential confound. Unfortunately, we do not have the data on the individual sound intensity levels but we included a section regarding this issue in our discussion as well.

      Line 937-949:

      “In both studies, tinnitus distress was not correlated with the reported prediction effects. Nevertheless, tinnitus can also be characterized by other features such as its loudness, pitch or duration which were not included in the experimental assessment. Additionally, we solely used a short version of the Mini-TQ (Goebel and Hiller, 1992) in Study 2, which did not allow us to relate prediction scores to subscales like sleep disturbances which potentially influence cognitive functioning and thus predictive processing. Next to sleeping disorders and distress, tinnitus is often also accompanied by psychological comorbidities such as depression or anxiety (Langguth, 2011) which are potential confounds of the results. For the work described in this manuscript the replicability of the core finding was of main importance. More studies are needed to relate the prediction patterns in more detail to aspects of tinnitus sensation and distress.”

      Reviewer #2 (Public Review):  

      Summary:  

      This study aimed to test experimentally a theoretical framework that aims to explain the perception of tinnitus, i.e., the perception of a phantom sound in the absence of external stimuli, through differences in auditory predictive coding patterns. To this aim, the researchers compared the neural activity preceding and following the perception of a sound using MEG in two different studies. The sounds could be highly predictable or random, depending on the experimental condition. They revealed that individuals with tinnitus and controls had different anticipatory predictions. This finding is a major step in characterizing the top-down mechanisms underlying sound perception in individuals with tinnitus.

      Strengths:  

      This article uses an elegant, well-constructed paradigm to assess the neural dynamics underlying auditory prediction. The findings presented in the first experiment were partially replicated in the second experiment, which included 80 participants. This large number of participants for an MEG study ensures very good statistical power and a strong level of evidence. The authors used advanced analysis techniques - Multivariate Pattern Analysis (MVPA) and classifier weights projection - to determine the neural patterns underlying the anticipation and perception of a sound for individuals with or without tinnitus. The authors evidenced different auditory prediction patterns associated with tinnitus. Overall, the conclusions of this paper are well supported, and the limitations of the study are clearly addressed and discussed.  

      Weaknesses:  

      Even though the authors took care of matching the participants in age and sex, the control could be more precise. Tinnitus is associated with various comorbidities, such as hearing loss, anxiety, depression, or sleep disorders. The authors assessed individuals' hearing thresholds with a pure tone audiogram, but they did not take into account the high frequencies (6 kHz to 16 kHz) in the patient/control matching. Moreover, other hearing dysfunctions, such as speech-in-noise deficits or hyperacusis, could have been taken into account to reinforce their claim that the observed predictive pattern was not linked to hearing deficits. Mental health and sleep disorders could also have been considered more precisely, as they were accounted for only indirectly with the score of the 10-item mini-TQ questionnaire evaluating tinnitus distress. Lastly, testing the links between the individuals' scores in auditory prediction and tinnitus characteristics, such as pitch, loudness, duration, and occurrence (how often it is perceived during the day), would have been highly informative.

      Thank you very much for your careful and constructive evaluation. We agree with the weaknesses stated in our manuscript and aimed to highlight these aspects more in our analyses and discussion, so future studies can take them into account (see e.g., line 937-949).

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      I would strongly recommend the inclusion of data on the used sound intensity levels. It would be very useful to assess whether there are any group differences regarding sound intensity of the stimuli, to exclude any effects of sound intensity on the results.

      We agree with you that - next to experimental aspects like the stimulus frequencies and the number of trials - the sound intensity levels potentially influence the effects as well. Unfortunately, this data was not saved during the experimental procedure and we are not able to include this as a variable in our analyses. As we, however, acknowledge this issue and want to provide guidelines for future research, we added a section to our discussion targeting sound intensity levels. 

      Line 902-913:

      “Thirdly, both studies used individual sound intensity levels to ensure a comfortable listening situation for the participants. These differences in sound intensity levels are, however, a potential confound in the experimental design as well since sound intensity can have an impact on neural responses (Thaerig et al., 2008). Although in this design, we expect the intensity levels balanced equally to the hearing loss of the participants (which did not differ between groups), and basic decoding of sound frequency did not differ in both studies, we are not able to ultimately exclude the sound intensity level as a driver of our effects. Future studies should include a perceived loudness matching for each frequency and should compare the adapted sound intensity values between each group or integrate them into the analysis (e.g., using the logistic regression approach in Fig. 8).”

      Reviewer #2 (Recommendations For The Authors):

      Major comments

      Introduction

      • The authors wrote: "Overall, this situation calls for the pursuit of alternative or complementary models that place less emphasis on the hearing status of the individual." They clearly demonstrated that the altered-gain model focuses on hearing loss and does not overcome the three described limitations. However, they mentioned other models focusing on brain activity outside of the auditory pathway (noise cancellation, map reorganization, specific neural networks). The authors should better explain the novelty of their approach compared to the existing ones.

      Thank you for your input. The inconclusive results and open questions about the altered-gain framework let us search for a different theoretical foundation for this work. We agree with you that there are other models such as the map reorganization theory or neural network models next to the altered gain model and recent literature showed results supporting these frameworks (see e.g., a review from our group discussing tinnitus research in MEG over the last 10 years, Reisinger et al. (2023)). Nevertheless, as we focus on prediction processes, the Bayesian inference framework in tinnitus (Sedley et al., 2016) fits best for our approach. As we stated in line 113-116 “The Bayesian inference framework could, therefore, explain the experience of tinnitus in lieu of any increase in neural activity in the auditory system, or indicate an additional alteration, on top of hearing loss, for tinnitus to be perceived”, this framework differs from the other models and demonstrates a novel approach in tinnitus research. The novelty in this work is our methodological approach, which allows for explicit analyses of predictive patterns, irrespective of the exact location in the brain. This is a first step towards our actual underlying question whether aberrant auditory prediction patterns act as a neural correlate of tinnitus or rather as a risk factor or disposition. In our opinion, this question is of crucial relevance for understanding tinnitus processes on a neural level and our robust effects highlight the necessity to investigate these predictive processes in a longitudinal manner. We included a paragraph in our manuscript to make this more apparent for the reader.

      Line 128-137:

      “We utilized a powerful, recently established experimental approach (Demarchi et al., 2019) showing anticipatory activations of tonotopically specific auditory templates for regular tone sequences. This method allows us to explicitly investigate predictive patterns in line with the Bayesian inference framework (Sedley et al., 2016), leading towards the overall question whether alterations in predictive coding can be interpreted as a neural correlate of tinnitus or rather as a risk factor. Since this question can solely be targeted in a longitudinal manner, we aimed in a first step to investigate prediction patterns in tinnitus over two independent samples, deriving robust effects that should be considered in future research.”

      • "This conceptual model bridges several explanatory gaps: for example, the inconsistent findings in humans regarding the "altered gain" view which states enhanced neural activity in the auditory pathway". What are "the inconsistent findings in humans regarding the 'altered gain'"? It would be helpful if the authors were more explicit about their idea here and added reference(s) to support it.

      Thank you for pointing that out. We agree with you that this section lacks clarity and we aimed to be more precise. 

      Line 108-116:

      “This conceptual model bridges several explanatory gaps: for example, the inconsistent findings in humans regarding the “altered gain” view which states altered neural activity in the auditory pathway. Recent findings vary in both the targeted frequency bands and the direction of the reported power changes which impede consistent conclusions (Eggermont and Roberts, 2015; Elgoyhen et al., 2015; Reisinger et al., 2023). The Bayesian inference framework could, therefore, explain the experience of tinnitus in lieu of any increase in neural activity in the auditory system, or indicate an additional alteration, on top of hearing loss, for tinnitus to be perceived.”

      • I suggest moving this part to the discussion:

      "However, alternative explanations cannot be excluded with certainty, such as tinnitus being the cause of altered prediction tendencies or that there is a third variable being responsible for predictions and tinnitus development. Furthermore, even if altered predictive tendencies were to be found, there could be various possibilities of exactly how they could be altered to contribute to the onset or persistence of tinnitus. Some further clarity might then be gained through longitudinal studies in humans or animals."

      Thank you for your suggestion, we moved this part to the corresponding section in the discussion.

      Line 742-756:

      “Distinct predictive processing patterns could e.g., either develop within an individual in contributing to chronification of tinnitus (e.g., shift of “default prediction” from silence to sound; Sedley, 2019). Alternatively, they could be conceived as sensory processing style, making certain individuals more vulnerable to develop tinnitus under certain conditions (e.g., hearing loss, aging), a notion reminiscent of the “strong prior” hypothesis of hallucinations (Corlett et al., 2019). Hence, the direction of the effect remains unclear and alternative explanations, such as a third variable being responsible for predictions and tinnitus development, cannot be excluded with certainty. Furthermore, even if altered predictive tendencies were to be found, there could be various possibilities of exactly how they could be altered to contribute to the onset or persistence of tinnitus. In any case, any more conclusive claims would require longitudinal data, ideally with a tinnitus-free baseline. As such research is challenging to implement, especially in humans, we first focused in this work on finding cross-sectional group differences between individuals with and without tinnitus.”

      Methods

      Participants

      • "We calculated the individual mean hearing ability based on the values for 500, 1000, 2000, and 4000 Hz, which is a common approach for averaging results of pure-tone audiometry". Even if this method has been used multiple times in the literature, I would not recommend it as it can hide differences. Hearing loss is usually larger at high frequencies (starting at 6 000 Hz). An average threshold calculated with those central frequencies is more relevant for clinical use than in research. I strongly recommend performing a linear model with the factors Frequency (including all tested frequencies), Group, Ear side, and their interactions to precisely test the group differences in hearing thresholds.

      Thank you for pointing that out. We agree with you that higher frequencies are of potential interest as well when analyzing hearing loss. We included your suggested linear model in our methods section and the results were in line with our assumption that the groups did not differ substantially. Additionally, we included another logistic regression model in our exploratory analyses when investigating the influence of hearing loss on the prediction scores. Once more, the addition of higher frequencies did not substantially influence the effects.

      Line 194-203:

      “We calculated the individual mean hearing ability based on the values for 500, 1000, 2000, and 4000 Hz, which is a common approach for averaging results of pure-tone audiometry (i.e., PTA-4, see for example Lin et al. (2011); Ozdek et al. (2010)). Using independent t-tests, we found no differences in hearing status over frequencies between groups for the left (t=-1.19, p=.238) and right ear (t=-1.72, p=.09). An additional linear regression including all frequencies from 125 Hz to 8000 Hz also showed that hearing thresholds did not differ between ears (b=0.311, SE=1.600, p=.846) and groups (b=1.702, SE=1.553, p=.273), but solely between frequencies (b=0.003, SE=0.000, p<.001). Interactions were not significant as well.”

      Line 712-725:

      “As these logistic regression models were computed using an average hearing score computed over the frequencies 500, 1000, 2000, and 4000 Hz (i.e., PTA-4, see for example Lin et al. (2011); Ozdek et al. (2010)), we questioned whether hearing loss in higher frequencies influenced our effects. We therefore computed an additional logistic regression including also the PTA values of 6000 and 8000 Hz. In this analysis, hearing loss was not a significant predictor of tinnitus but rather showed a trend with b=0.211, SE=0.111, p=.062. Prediction scores, however, remained a significant predictor of tinnitus even after including high-frequency hearing loss (b=0.232, SE=0.111, p=.040). In this analysis, odds ratios indicated an increase of 26% in the odds of having tinnitus with a one standard deviation increase in the prediction score. Overall, this analysis strongly supports the notion that the main effect genuinely reflects a process related to the experience or statistical risk of experiencing tinnitus.”
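The step from the reported coefficient to the stated 26% increase in odds can be checked with a short calculation. The sketch below assumes that the coefficient b=0.232 is on the log-odds scale and refers to a one standard deviation change in the prediction score, as the quoted passage implies; the function names are illustrative and not taken from the authors' analysis code.

```python
import math


def odds_ratio(beta: float) -> float:
    """Convert a logistic-regression coefficient (log-odds) into an odds ratio."""
    return math.exp(beta)


def percent_change_in_odds(beta: float) -> float:
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (math.exp(beta) - 1.0) * 100.0


# Coefficient for the prediction score as quoted in the response (b = 0.232).
# Assumption: the predictor was standardized, so one unit equals one SD.
b_prediction = 0.232

print(round(odds_ratio(b_prediction), 3))              # approximately 1.261
print(round(percent_change_in_odds(b_prediction), 1))  # approximately 26.1
```

Exponentiating the coefficient gives an odds ratio of about 1.26, which matches the 26% figure quoted above.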

      Stimuli and experimental procedure

      • Can you explain the use of movies during sound listening? And not an active listening task with oddball events, for example, to ensure that the subject attention is directed to the sounds?

      Thank you for your comment. We agree with you that attention is a relevant factor and with our design we cannot exclude potential attention effects on our findings. We chose this paradigm since previous research in our group including this exact experimental design (Demarchi et al., 2019) impressively demonstrated the formation of feature-specific auditory predictions in the brain and we aimed to investigate to what extent this can be detected in the tinnitus brain.

      We acknowledged this issue in our discussion (see line 916-919): “In the current work, we used passive listening tasks including a movie to reduce attentional focus on the presented stimuli. Therefore, we cannot draw conclusions whether differences in attention had an influence on the effects. Future studies should include more manipulations of attention to investigate its relevance”. 

      Results

      Pre-stimulus effects are not related to hearing loss and tinnitus-related features

      • How was the hearing loss calculated for this analysis? I recommend a PCA on the hearing levels, to get individual scores with a data-driven approach. Usually, the first dimension will be an average of all the frequencies. The second should be a difference between low and high frequencies. The same comment applies to study 2.

      Thank you for pointing that out. In the first study, participant groups were not controlled for hearing loss and pure-tone audiograms were solely averaged over all frequencies and both ears. As we marked out throughout the manuscript, insufficient control for hearing loss was the key issue in study 1 which led to the implementation of study 2. Further, we do not have data about the hearing status of every participant in study 1 and we do therefore not believe that a more complex approach for calculating hearing loss will increase interpretability in study 1. Nevertheless, we agree with you that it is not apparent how hearing loss was calculated in study 1. The results of the pure-tone audiometry were averaged over all frequencies and both ears, but no cut-off values were defined to characterize hearing loss. We therefore highly appreciate your detailed revision of our manuscript and adjusted the phrasing in the corresponding section. With our approach, it is not justifiable to talk about hearing loss but rather hearing thresholds. As for study 2, the methodological approach was reviewed and accepted as a Registered Report and we therefore do not want to deviate drastically from our pre-registered approach.
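For readers unfamiliar with the data-driven hearing score suggested in the comment above, the following is a minimal sketch of a PCA on audiogram thresholds. It uses small synthetic data constructed for illustration (not data from either study) and extracts the components with plain power iteration on the covariance matrix rather than a library routine; all names and values are hypothetical.

```python
import math

# Synthetic audiograms (dB HL), built as overall level plus a low-to-high tilt.
FREQS = [250, 500, 1000, 2000, 4000, 8000]   # hypothetical test frequencies (Hz)
OVERALL = [10, 20, 30, 40, 50, 60]           # mean threshold per participant
TILT = [1, -1, 0, 0, -1, 1]                  # slope across frequencies per participant
STEPS = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]    # centred frequency index

thresholds = [[o + t * s for s in STEPS] for o, t in zip(OVERALL, TILT)]


def centre_columns(rows):
    """Subtract the mean of each frequency column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of column-centred data."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_component(cov, iterations=200):
    """Power iteration: returns (eigenvalue, unit eigenvector) of the largest component."""
    p = len(cov)
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v


def deflate(cov, eigenvalue, v):
    """Remove an extracted component so the next one can be found."""
    p = len(cov)
    return [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
            for i in range(p)]


centred = centre_columns(thresholds)
cov = covariance(centred)
var1, pc1 = leading_component(cov)
var2, pc2 = leading_component(deflate(cov, var1, pc1))

# Individual scores: projection of each centred audiogram onto a component.
scores_pc1 = [sum(x * w for x, w in zip(row, pc1)) for row in centred]
scores_pc2 = [sum(x * w for x, w in zip(row, pc2)) for row in centred]

for f, a, b in zip(FREQS, pc1, pc2):
    print(f, round(a, 3), round(b, 3))
```

In this constructed example the first component weights all frequencies equally, so its scores track the average threshold, while the second component contrasts low and high frequencies. Real audiograms will not separate this cleanly, and the sign of each component is arbitrary.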

      Line 162-165:

      “Standardized pure-tone audiometric testing for frequencies from 125Hz to 8kHz was performed in 31 out of 34 tinnitus participants using Interacoustic AS608 audiometer.

      Averages were computed over all frequencies and both ears.”

      Line 356-362:

      “In the whole sample of participants with tinnitus (n=34) we performed a Spearman correlation of the β-coefficient values corresponding to the time-point of the maximum and the minimum t-value in intergroup analysis (comprised of positive and negative significant clusters emerging in group comparison for sound trials) with hearing thresholds (averaged audiogram for both ears), tinnitus loudness (10-point scale) and tinnitus distress scores (TQ).”

      Line 463-464:

      See as well Line 471-481.

      Line 491-495:

      “Our main findings are: 1) basic processing of carrier frequencies is not altered in tinnitus; 2) with increasing regularity of the sequence, individuals with tinnitus show relatively enhanced predictions of frequency information; 3) the effect is not related to hearing thresholds and tinnitus distress or loudness in this sample.”

      • In the methods, the authors indicated that the volume was adjusted individually at a pleasant volume. Can authors test if the volume was related to the individual's accuracy? Did they test that all frequencies were audible for all participants?

      Thank you for your feedback. We agree with you that it would be interesting to see whether sound intensity levels were related to the accuracy. Unfortunately, data regarding the volume was not saved during the experimental procedure and we are not able to include this as a variable in our analyses. We acknowledge this issue and added a section to our discussion targeting sound intensity levels. As for the second question, the individual volume adjustment was also meant to guarantee that all frequencies were audible for the participant. We clarified this in the methods section. Overall, it is important to mention that we did not find any differences between groups in the decoding of random tones (see Fig. 2 and Fig. 6C), indicating that the volume did not substantially have an influence on one group compared to the other.

      Line 232-234:

      “Sound intensity was individually determined by presenting a short audio sequence to the participants and adjusting the loudness according to an individual pleasant volume with all four frequencies audible for the participant.”

      Line 902-913:

      “Thirdly, both studies used individual sound intensity levels to ensure a comfortable listening situation for the participants. These differences in sound intensity levels are, however, a potential confound in the experimental design as well since sound intensity can have an impact on neural responses (Thaerig et al., 2008). Although in this design, we expect the intensity levels balanced equally to the hearing loss of the participants (which did not differ between groups), and basic decoding of sound frequency did not differ in both studies, we are not able to ultimately exclude the sound intensity level as a driver of our effects. Future studies should include a perceived loudness matching for each frequency and should compare the adapted sound intensity values between each group or integrate them into the analysis (e.g., using the logistic regression approach in Fig. 8).”

      Pre-stimulus differences in ordered and random tone sequences are not related to tinnitus distress

      • Accuracy was not correlated with tinnitus distress. Could the authors test if the accuracy was related to other clinical data, such as tinnitus pitch, duration, and loudness? And at the subscales of the mini-TQ?

      We appreciate your constructive feedback and agree with you that other tinnitus features such as pitch, duration, or loudness are also interesting in this regard. Unfortunately, these features were not assessed in study 2 and we are therefore not able to provide this information. Additionally, we solely used a short version of the Mini-TQ in this study and did not assess all subscales but rather used all available items for calculating tinnitus distress. This is a limitation of our study design and we included it in the discussion.

      Line 937-949:

      “In both studies, tinnitus distress was not correlated with the reported prediction effects. Nevertheless, tinnitus can also be characterized by other features such as its loudness, pitch or duration which were not included in the experimental assessment. Additionally, we solely used a short version of the Mini-TQ (Goebel and Hiller, 1992) in Study 2, which did not allow us to relate prediction scores to subscales like sleep disturbances which potentially influence cognitive functioning and thus predictive processing. [...] More studies are needed to relate the prediction patterns in more detail to aspects of tinnitus sensation and distress.”

      The strength of group effects differs between the two studies

      • This section should be in the discussion, not the results

      Thank you for your valuable input. In this section, we show comparisons between the two studies and report Bayes factors over time for the differences in decoding accuracy (see Figure 7A). We introduce novel results and believe therefore that this section should remain in the results and is discussed later in the manuscript.  

      Discussion

      • Globally, the discussion is very long and a bit speculative. I recommend the authors shorten the discussion (especially the speculations), and delete the repetition.

      Thank you very much for your constructive feedback. We aimed to shorten our discussion and delete repetitions to increase clarity and readability.

      • The effect of hearing loss has been tested in this study, evaluated as the mean hearing threshold of 4 central frequencies. However, hearing abilities cannot be limited to a central audiogram. High frequencies, speech-in-noise abilities, or other hidden hearing loss can be impacted, even for individuals without hearing loss on 500Hz- 4000Hz. The conclusion on the prediction effect being independent of hearing loss should include this limitation.

      Thank you for pointing that out. We added this limitation to the discussion.

      Line 781-794:

      “In a complementary analysis, we used our prediction score in addition to hearing loss magnitudes as predictors of tinnitus in a logistic regression. Prediction related pre-activation levels were informative whether participants perceived tinnitus, also when statistically controlling for hearing loss. However, it has to be mentioned that we calculated hearing loss based on the PTA results of the frequencies between 500 and 4000 Hz. This does not reflect hearing impairments like high frequency hearing loss or hidden hearing loss (i.e., hearing difficulties despite a normal audiogram, Liberman (2015)). As for hidden hearing loss, we were not able to draw conclusions regarding our effects since this concept of hearing damage is difficult to measure objectively, especially in humans. However, we included an additional logistic regression expanding the frequency range up to 8000 Hz and again, hearing loss did not substantially impact the prediction score as an informative tinnitus predictor.”

      Line 712-723:

      “As these logistic regression models were computed using an average hearing score computed over the frequencies 500, 1000, 2000, and 4000 Hz (i.e., PTA-4, see for example Lin et al. (2011); Ozdek et al. (2010)), we questioned whether hearing loss in higher frequencies influenced our effects. We therefore computed an additional logistic regression including also the PTA values of 6000 and 8000 Hz. In this analysis, hearing loss was not a significant predictor of tinnitus but rather showed a trend with b=0.211, SE=0.111, p=.062. Prediction scores, however, remained a significant predictor of tinnitus even after including high-frequency hearing loss (b=0.232, SE=0.111, p=.040). In this analysis, odds ratios indicated an increase of 26% in the odds of having tinnitus with a one standard deviation increase in the prediction score.”

      • "An increased focus on hippocampal regions, e.g., in fMRI, patient, or animal studies, could be a worthwhile complement to our MEG work, given the outstanding relevance of medial temporal areas in the formation of associations in statistical learning paradigms (see e.g., Covington et al., (2018); Schapiro et al., (2016)).".

      in the opinion of this reviewer, this claim is not well introduced and should be removed.

      Thank you for pointing that out. In our opinion, an increased focus on hippocampal regions is an important consideration for future research and we decided to keep this part in the manuscript. However, we added a third reference highlighting the relevance of temporal areas in tinnitus to strengthen our claim. 

      Line 866-868:

      “... given the outstanding relevance of medial temporal areas in the formation of associations in statistical learning paradigms (see e.g., Covington et al., (2018); Paquette et al., (2017); Schapiro et al., (2016)).”

      References:

      Paquette, S., Fournier, P., Dupont, S., de Edelenyi, F. S., Galan, P., & Samson, S. (2017). Risk of tinnitus after medial temporal lobe surgery. JAMA neurology, 74(11), 1376-1377. https://doi.org/10.1001/jamaneurol.2017.2718.

      • "Overall, our work clearly underlines the true presence of differences, in terms of predictive processing, between individuals with and without tinnitus. At the same time, distinct design choices impact the strength of the effects which is not only apparent in the present work but was also reported recently by Yukhnovich and colleagues (2024). Further to controlling for basic variables (age, sex, hearing loss), future studies using our paradigm and analysis approach should opt for a broad frequency spacing (>2 octaves) and ideally more than 2000 trials per carrier frequency in the random sequence. These recommendations are likely even more important for efforts of testing this paradigm using EEG, which normally comes with inferior data quality as compared to MEG."

      This reviewer considers that the entire paragraph should be deleted, as the effects are already covered in the previous paragraph.

      Thank you very much for your feedback, however, we believe that this paragraph acts as a brief and accurate summary for our guidelines to improve future research in this field. This section therefore remained in the manuscript.

      Minor comments

      Introduction

      • "The onsets of tinnitus and hearing loss often do not occur at the same time ". This sentence should have a reference.

      We appreciate your careful evaluation of our manuscript and included a reference to the sentence pointing out hearing loss as a precursor of tinnitus.

      Line 95f.:

      “2) The onsets of tinnitus and hearing loss often do not occur at the same time (Roberts et al., 2010).” 

      Methods

      Participants

      • Participants' laterality needs to be mentioned.

      Thank you for your input. We agree with you that laterality is an interesting aspect that should be taken into account. Unfortunately, however, we did not assess this in the current design. We mentioned the lack of this information in the methods section.

      Line 158:

      “Laterality of the participants was not assessed.”

      Line 176-177:

      “No participants with psychiatric or neurological diseases were included in the sample. Laterality of the participants was not assessed.”

      "Four individuals with tinnitus did not show any audiometric abnormality; four of the participants showed unilateral hearing impairments; 26 volunteers had high-frequency hearing loss; and six individuals were hearing impaired over most frequencies (i.e. hearing thresholds higher than 30 dB)."

      This part is not precise enough. "Unilateral hearing impairment": is it on one or multiple frequencies? "26 volunteers had high-frequency hearing loss". What is considered as high-frequency here? The precision "(i.e. hearing thresholds higher than 30 dB)" can be dropped as it was defined in the sentence just before.

      We appreciate your constructive feedback and added information to clarify the audiometric characteristics of our participants.

      Line 186-190:

      “Four individuals with tinnitus did not show any audiometric abnormality; four of the participants showed unilateral hearing impairments on at least one frequency; 26 volunteers had high-frequency hearing loss (i.e. hearing thresholds higher than 30 dB); and six individuals were hearing impaired over most frequencies (i.e. hearing thresholds higher than 30 dB).”

      Results

      • Figure 3C: are those group differences significant? It should be noted on the graphs.

      • Figure 6D: I would suggest to remove this figure, as the correlation is not significant.

      • Figure 7A: It would be useful to specify the number of trials for each study, in parentheses.

      • Figure 8 is unnecessary.

      Thank you for your careful assessment of our figures. We agree with you that significance should be indicated in Figure 3C and that the precise number of trials is relevant information in Figure 7A. We corrected the figures accordingly. However, the Figures 6D and 8 remained in the manuscript since they were already part of our Registered Report and we do not want to remove graphical information that was reviewed and accepted already.

    2. eLife Assessment

      This important work presents two studies on predictive processes in subjects with and without tinnitus. The evidence supporting the authors' claims is compelling, as their second study serves as an independent replication of the first. Rigorous matching between study groups was performed, especially in the second study, increasing the probability that the identified differences in predictive processing can truly be attributed to the presence of tinnitus. This work will be of interest to researchers, especially neuroscientists, in the tinnitus field.

    3. Reviewer #2 (Public review):

      Summary:

      This study aimed to test experimentally a theoretical framework that aims to explain the perception of tinnitus, i.e., the perception of a phantom sound in the absence of external stimuli, through differences in auditory predictive coding patterns. To this aim, the researchers compared the neural activity preceding and following the perception of a sound using MEG in two different studies. The sounds could be highly predictable or random, depending on the experimental condition. They revealed that individuals with tinnitus and controls had different anticipatory predictions. This finding is a major step in characterizing the top-down mechanisms underlying sound perception in individuals with tinnitus.

      Strengths:

      This article uses an elegant, well-constructed paradigm to assess the neural dynamics underlying auditory prediction. The findings presented in the first experiment were partially replicated in the second experiment, which included 80 participants. This large number of participants for an MEG study ensures very good statistical power and a strong level of evidence. The authors used advanced analysis techniques - Multivariate Pattern Analysis (MVPA) and classifier weights projection - to determine the neural patterns underlying the anticipation and perception of a sound for individuals with or without tinnitus. The authors evidenced different auditory prediction patterns associated with tinnitus. Overall, the conclusions of this paper are well supported, and the limitations of the study are clearly addressed and discussed.

      Weaknesses:

      Even though the authors took care of matching the participants in age and sex, the control could be more precise. Tinnitus is associated with various comorbidities, such as hearing loss, anxiety, depression, or sleep disorders. The authors assessed individuals' hearing thresholds with a pure tone audiogram, but they did not take into account the high frequencies (6 kHz to 16 kHz) in the patient/control matching. Moreover, other hearing dysfunctions, such as speech-in-noise deficits or hyperacusis, could have been taken into account to reinforce their claim that the observed predictive pattern was not linked to hearing deficits. Mental health and sleep disorders could also have been considered more precisely, as they were accounted for only indirectly with the score of the 10-item mini-TQ questionnaire evaluating tinnitus distress. Lastly, testing the links between the individuals' scores in auditory prediction and tinnitus characteristics, such as pitch, loudness, duration, and occurrence (how often it is perceived during the day), would have been highly informative.

      Comments on revisions:

      Thank you for your responses. There are a few remaining points that, if addressed, could further enhance the manuscript:

      - While the manuscript acknowledges the limitation of not matching groups on hearing thresholds in Study 1, a deeper analysis of participants' hearing abilities and their impact on MEG results, similar to that conducted in Study 2, would be valuable. Specifically, including a linear model that considers all frequencies, group membership, and their interactions could highlight differences across groups. Additionally, examining the effect of high-frequency hearing loss on prediction scores, as performed in Study 2, would strengthen the analysis, particularly given the trend noted (line 719). Such an addition could make a significant contribution to the literature by exploring how hearing abilities may influence prediction patterns.

      - The connection with the hippocampal regions (line 864) remains somewhat unclear. While the inclusion of the Paquette reference appropriately links temporal region activity with tinnitus, it does not fully support the statement: "An increased focus on hippocampal regions, e.g., in fMRI, patient, or animal studies, could be a worthwhile complement to our MEG work, given the outstanding relevance of medial temporal areas in the formation of associations in statistical learning paradigms"

      - Authors should add a comparison of participants' mini-TQ scores across both studies.
      - Authors should add significance levels to Fig 6.B, as in Fig 3.C, and an "n.s." label to Fig 6.D.

    1. eLife Assessment

      This is a potentially important study on the interpretation of protein-coding genetic variation in CDKN2A. The presentation of the data has improved, revealing that the experimental design is flawed and raising concerns that the data are not robust enough to support the major claim of supporting clinical variant interpretation for CDKN2A. This work, while incomplete, will serve as a resource for diagnostic labs as well as cancer geneticists.

    2. Reviewer #1 (Public review):

      Summary:

      Kimura et al performed a saturation mutagenesis study of CDKN2A to assess functionality of all possible missense variants and compare them to previously identified pathogenic variants. They also compared their assay result with those from in silico predictors.

      Strengths:

      CDKN2A is an important gene that modulates the cell cycle and apoptosis; therefore, it is critical to accurately assess the functionality of missense variants. Overall, the paper reads well and touches upon major discoveries in a logical manner.

      Weaknesses:

      The paper lacks proper details for experiments and basic data, leaving the results less convincing. Analyses are superficial and do not provide variant-level resolution. Many of these issues were addressed during the revision process.

      Comments on revisions:

      The manuscript was improved during the revision process.

    3. Reviewer #2 (Public review):

      Summary:

      This study describes a deep mutational scan across CDKN2A using suppression of cell proliferation in pancreatic adenocarcinoma cells as a readout for CDKN2A function. The results are also compared to in silico variant predictors utilized by current diagnostic frameworks to gauge these predictors' performance. The authors also functionally classify CDKN2A somatic mutations in cancers across different tissues.

      Review:

      The goal of this paper was to perform functional classification of missense mutations in CDKN2A in order to generate a resource to aid in clinical interpretation of CDKN2A genetic variants identified in clinical sequencing. In our initial review, we concluded that this paper was difficult to review because there was a lack of primary data and experimental detail. The authors have significantly improved the clarity, methodological detail and data exposition in this revision, facilitating a fuller scientific review. Based on the data provided, we do not think the functional characterization of CDKN2A variants is robust or complete enough to meet the stated goal of aiding clinical variant interpretation. We think the underlying assay could be used for this purpose, but different experimental design choices and more replication would be required for these data to be useful. Alternatively, the authors could focus on novel CDKN2A variants, as there seem to be potential gain-of-function mutations that are simply lumped into "neutral" and that may have important biological implications.

      Major concerns:

      Low experimental concordance. The p-value scatter plot (Figure 2 Figure Supplement 3A) across 560 variants shows low collinearity, indicating poor replicability. These data should be shown as log2 fold changes, but even after model fitting with the gamma GLM they still show low concordance, which casts strong doubt on the function scores.

      The more detailed methods now provided indicate that the growth suppression experiment was done in 156 pools, with each pool consisting of the 20 variants corresponding to one of the 156 aa positions in CDKN2A. There are several serious problems with this design.

      Batch effects in each of the pools prevent comparison across different residues. We think this is a serious design flaw and not standard for how these deep mutational scans are done. The standard would be to combine all 156 pools in a single experiment. Given the sequencing strategy of dividing CDKN2A into 3 segments, the 156 pools could easily have been collapsed into 3 (1 to 53, 54 to 110, 111 to 156). This would significantly minimize variation in handling between variants at each residue and would make further replicates of the screen, for reproducibility purposes, more manageable. The huge variation in confluency time (16-40 days) across pools suggests that this batch effect is a strong source of variation in the experiment.

      Lack of experimental/biological replication: The functional assay was only performed once on all 156 CDKN2A residues and was repeated for only 28 out of 156 residues, with only ~80% concordance in functional classification between the first and second screens. This is not sufficiently robust for variant interpretation. Why was the experiment not performed more than once for most aa sites?

      For the screen, the methods section states that PANC-1 cells were infected at an MOI of 1, while the standard is an MOI of 0.3-0.5 to minimize multiple variants integrating into a single cell. At an MOI of 1, under a Poisson model of viral integration, ~25% of cells would have more than 1 lentiviral integrant. So in ~25% of the cells the effect of a variant would be confounded by one or more other variants, adding noise to the assay.
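
      The Poisson argument above can be checked with a short calculation. The sketch below is illustrative only: it assumes that integrations per cell follow a simple Poisson distribution with mean equal to the MOI, and the function names are hypothetical. Under that assumption, an MOI of 1 gives roughly 26% of all cells with two or more integrants, in line with the ~25% quoted above, and roughly 42% of the cells that carry at least one integrant (the population that survives antibiotic selection).

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability of exactly k integrants per cell, assuming Poisson(moi)."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def multi_integrant_fractions(moi: float):
    """Return the fraction of cells with >= 2 integrants, expressed
    (a) over all cells and (b) over transduced cells only (>= 1 integrant)."""
    p0 = poisson_pmf(0, moi)
    p1 = poisson_pmf(1, moi)
    p_multi = 1.0 - p0 - p1
    return p_multi, p_multi / (1.0 - p0)


# Compare the MOI used in the screen with the commonly recommended range.
for moi in (0.3, 0.5, 1.0):
    all_cells, transduced = multi_integrant_fractions(moi)
    print(f"MOI {moi}: {all_cells:.1%} of all cells, "
          f"{transduced:.1%} of transduced cells carry >1 integrant")
```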

      While the authors provide more explanation of the gamma GLM, we strongly advise that the heatmap and replicate correlations be shown with the log2 fold changes rather than the fit output of the p-values.

      In this study, the authors only classify variants into the categories "neutral", "indeterminate", or "deleterious", but they do not address CDKN2A gain-of-function variants that may lead to decreased proliferation. For example, there is no discussion of variants at residue 104, whose proliferation values mostly consist of higher-magnitude negative log2 fold change values. These variants are defined as neutral, but from the one replicate of the experiment performed, they appear to be potential gain-of-function variants.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kimura et al performed a saturation mutagenesis study of CDKN2A to assess the functionality of all possible missense variants and compare them to previously identified pathogenic variants. They also compared their assay result with those from in silico predictors.

      Strengths:

      CDKN2A is an important gene that modulates cell cycle and apoptosis, therefore it is critical to accurately assess the functionality of missense variants. Overall, the paper reads well and touches upon major discoveries in a logical manner.

      Weaknesses:

      The paper lacks proper details for experiments and basic data, leaving the results less convincing. Analyses are superficial and do not provide variant-level resolution.

      We thank the reviewer for their comments. We have updated the manuscript to include additional detail of experimental methods and variant-level resolution of data and analyses. We have also conducted additional analyses to compare variant classifications using a gamma generalized linear model and log2 normalized fold change, establish the effect of low variant coverage on variant functional classifications, determine the performance of combining multiple in silico predictions, and determine the prevalence of functionally deleterious variants in gnomAD and functionally deleterious variants of uncertain significance in ClinVar compared to all CDKN2A missense variants.

      Reviewer #2 (Public Review):

      This study describes a deep mutational scan across CDKN2A using suppression of cell proliferation in pancreatic adenocarcinoma cells as a readout for CDKN2A function. The results are also compared to in silico variant predictors currently utilized by the current diagnostic frameworks to gauge these predictors' performance. The authors also functionally classify CDKN2A somatic mutations in cancers across different tissues.

      This study is a potentially important contribution to the field of cancer variant interpretation for CDKN2A, but is almost impossible to review because of the severe lack of details regarding the methods and incompleteness of the data provided with the paper. We do believe that the cell proliferation suppression assay is robust and works, but when it comes to the screening of the library of CDKN2A variants the lack of primary data and experimental detail prevents assessment of the scientific merit and experimental rigor.

      We are grateful for the opportunity to clarify our experimental methods and to provide additional data in the revised manuscript. The manuscript has been updated to include, among other changes, additional information on assay design, analysis of variant representation in the library, inclusion of primary data with variant level resolution, and a comparison of variant classifications using a gamma generalized linear model and log2 normalized fold change.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major issues:

      (1) Can the pathogenicity values of individual amino acid changes be opened to the public? It would serve as a valuable asset to the community.

      Thank you for your suggestion. We are happy to provide this information. Individual variant data and functional classifications from the functional assay are given in Appendix 1-table 4.

      (2) In the method section, it is not clear (at least to the reviewer) whether the protocol describing the construction of the CDKN2A missense library was provided.

      Thank you for your comment. We have included additional information in the manuscript describing construction of the CDKN2A missense library.

      “CDKN2A expression plasmid libraries

      Codon-optimized CDKN2A cDNA, based on the p16INK4A amino acid sequence (NP_000068.1), was designed (Appendix 1-table 12), and pLJM1 containing codon-optimized CDKN2A (pLJM1-CDKN2A) was generated by Twist Bioscience (South San Francisco, CA). 156 plasmid libraries were then synthesized using pLJM1-CDKN2A, such that each library contained all 20 possible amino acid variants (19 missense and 1 synonymous) at a given position, generating 500 ng of each plasmid library (Twist Bioscience, South San Francisco, CA). The proportion of each variant in each library is shown in Appendix 1-table 2. Variants with a representation of less than 1% in a plasmid library were individually generated using the Q5 Site-Directed Mutagenesis kit (New England Biolabs, Ipswich, MA; catalog no. E0552), and added to each library to a calculated proportion of 5%. Primers used for site-directed mutagenesis are given in Appendix 1-table 13. Each library was then amplified to generate at least 5 µg of plasmid DNA using the QIAGEN Plasmid Midi Kit (QIAGEN, Germantown, MD; catalog no. 12143).”

      (3) The paper lacks basic experimental results. The results cover almost all possible missense variants, but it would be clearer if actual coverage values used for calculating relative enrichment were shown. Are all variants well covered? Isn't there any spurious signal due to low coverage? How many times were the experiments performed? Also, how many cells were used, what was the expected MOI, and what proportion of harvested cells is thought to have a single variant? How can you distinguish the effect of a single variant from a multiple variants effect?

      We thank the reviewer for their comment. We have provided additional information in the manuscript to address these issues. Briefly, in response to each issue:

      (1) We have provided read count data for all variants, used to determine functional classifications based on either gamma generalized linear model or normalized fold change, in Appendix 1-table 4.

      (2) To assess if low variant coverage resulted in spurious signals, we compared prevalence of functionally deleterious classifications among variants binned by coverage in the Day 9 cell pool. We did not identify any statistically significant differences based on variant coverage.

      “We also determined whether underrepresentation in the cell pool at Day 9 affected variant functional classifications. Fifty-three of 2,964 missense variants (1.8%) were present in the cell pool at Day 9 of the first assay replicate (experiment 1) at < 2%, as determined by the number of sequence reads supporting the variant (Figure 2 -figure supplement 4A, Appendix 1-table 4). There was no statistically significant difference in the proportion of variants classified as functionally deleterious for variants present in less than 2% of the cell pool at Day 9 (12 of 53 variants; 22.6%), and variants present in more than 2% of the cell pool (496 of 2,911 variants; 17.0%) (P value = 0.28) (Figure 2 -figure supplement 4B). We also found no significant differences in the proportion of variants classified as functionally deleterious for variants present in more than 2% of the cell pool at Day 9 when variants were binned in 1% intervals (Figure 2 -figure supplement 4B).”
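
      The comparison quoted above (12 of 53 low-coverage variants versus 496 of 2,911 well-covered variants classified as functionally deleterious) can be reproduced approximately from the reported counts. The response does not name the statistical test that was used, so the sketch below substitutes a pooled two-proportion z-test purely as an illustration; the function name is hypothetical and the result should not be read as the authors' own calculation.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Illustrative stand-in only: the test actually used in the
    manuscript is not specified in the response.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Counts taken from the quoted passage.
z, p = two_proportion_z_test(12, 53, 496, 2911)
print(f"low coverage: {12 / 53:.1%}, higher coverage: {496 / 2911:.1%}")
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```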

      (3) The assay was repeated in duplicate for 28 CDKN2A residues. For the remaining 128 residues of CDKN2A, the assay was completed once. We found good agreement between variant classifications in assay repeats. We have added to the text as follows:

      “To confirm the reproducibility of our variant classifications, 28 amino acid residues were assayed in duplicate, and variants classified using the gamma GLM. The majority of missense variants, 452 of 560 (80.7%), had the same functional classification in each of the two replicates (Figure 2 -figure supplement 3A and B, Appendix 1-table 4).”

      We have also added discussion of this study limitation to the manuscript:

      “We repeated our functional assay twice for 28 CDKN2A residues. For the remaining 128 residues of CDKN2A, the functional assay was completed once. While we found general agreement between functional classifications from each replicate for the 28 residues assayed in duplicate, additional repeats for each residue are necessary to determine variability in variant functional classifications.”

      (4) We have added additional information about the number of cells used for transduction and MOI to the method section:

      “Lentiviral transduction

      PANC-1 cells were used for CDKN2A plasmid library and single variant CDKN2A expression plasmid transductions. PANC-1 cells previously transduced with pLJM1-CDKN2A (PANC-1CDKN2A) and selected with puromycin were used for CellTag library transductions. Briefly, 1 x 10⁵ cells were cultured in media supplemented with 10 µg/ml polybrene and transduced with 4 x 10⁷ transducing units per mL of lentivirus particles. Cells were then centrifuged at 1,200 x g for 1 hour. After 48 hours of culture at 37°C and 5% CO2, transduced cells were selected using 3 µg/ml puromycin (CDKN2A plasmid libraries and single variant CDKN2A expression plasmids) or 5 µg/ml blasticidin (CellTag plasmid library) for 7 days. Expected MOI was one. After selection, cells were trypsinized and 5 x 10⁵ cells were seeded into T150 flasks. DNA was collected from remaining cells and this sample was named as (Day 9). T150 flasks were cultured until confluent and then DNA was collected. The time for cells to become confluent varied for each amino acid residue (Day 16 – 40, Appendix 1-table 5).”

      (5) Our assay was not designed to distinguish multiple variant effects. However, we do not anticipate multiple transductions to significantly impact variant classifications in our assay. We found that our functional classifications were consistent with previously reported classifications:

      “In general, our results were consistent with previously reported classifications. Of variants identified in patients with cancer and previously reported to be functionally deleterious in published literature and/or reported in ClinVar as pathogenic or likely pathogenic (benchmark pathogenic variants), 27 of 32 (84.4%) were functionally deleterious in our assay (Figure 2B, Figure 2 -figure supplement 1B and 1C, Appendix 1-table 4) (Chaffee et al., 2018; Chang et al., 2016; Horn et al., 2021; Hu et al., 2018; Kimura et al., 2022; McWilliams et al., 2018; Roberts et al., 2016; Zhen et al., 2015). Five benchmark pathogenic variants were characterized as indeterminate function, with log2 P values from -19.3 to -33.2. Of 156 synonymous variants and six missense variants previously reported to be functionally neutral in published literature and/or reported in ClinVar as benign or likely benign (benchmark benign variants), all were characterized as functionally neutral in our assay (Figure 2B, Figure 2 -figure supplement 1B and 1C, Appendix 1-table 4) (Kimura et al., 2022; McWilliams et al., 2018; Roberts et al., 2016). Of 31 VUSs previously reported to be functionally deleterious, 28 (90.3%) were functionally deleterious and 3 (9.7%) were of indeterminate function in our assay. Similarly, of 18 VUSs previously reported to be functionally neutral, 16 (88.9%) were functionally neutral and 2 (11.1%) were of indeterminate function in our assay, (Figure 2B, Figure 2 -figure supplement 1B and 1C, Appendix 1-table 4).”

      (4) Comparison of functional classifications (shown in Figure 3) from this study and other in silico tools is superficial. The analysis is based on the presumption that their result is gold-standard, thereby calculating the sensitivity, accuracy, and PPV of individual predictors. But apparently, this won't be true, so it would be more reasonable to check the "correlation" of the study results and other predictors: e.g. which variants show consistent results between this study and other predictors? Are there any indicators of consistent vs inconsistent results? How does the consistency change by protein sequences or domains? Etc

      Thank you for your comment. We have added additional analysis to our manuscript comparing our functional classifications with in silico variant effect predictions. Specifically, we have included analysis combining multiple predictors:

      “We also tested the effect of combining multiple in silico predictors. 904 missense variants had in silico predictions from all 7 algorithms. The remaining 2,060 missense variants had in silico predictions from 5 algorithms. Of variants with in silico predictions from all 7 algorithms, 378 (41.8%) had predictions of deleterious or pathogenic effect from a majority of algorithms (≥ 4), and of these, 137 (36.2%) were functionally deleterious in our assay. Similarly, of 2,060 missense variants that had in silico predictions from 5 algorithms, 1107 (53.7%) had predictions of deleterious or pathogenic effect from a majority of algorithms (≥ 3), of which, 361 (32.6%) were functionally deleterious in our assay (Appendix 1-table 7).”

      (5) Similarly, Figure 4 does not deliver much information, either. Rather than delivering a simple summary, it would be more informative if deeper analyses were conducted. e.g., do pathogenic variants show higher frequency among patients, or higher variant frequency in tumors (if data were available).

      We have included additional analysis of somatic alterations in the manuscript. We found pathogenic/likely pathogenic somatic mutations were enriched in patients. This was also the case for somatic mutations that were classified as functionally deleterious in our assay. We also found statistically significant depletion of functionally deleterious mutations in colorectal adenocarcinoma. Interestingly, no patients with a somatic mutation in a mismatch repair gene had a functionally deleterious CDKN2A missense somatic mutation. However, this observation was not statistically significant. Future studies will determine whether CDKN2A and MMR gene somatic mutations are mutually exclusive in colorectal adenocarcinoma.

      “We found that 34.2% - 53.4% of unique missense somatic mutations were classified as functionally deleterious, with 61.4% - 67.6% of patients having a functionally deleterious somatic mutation (Figure 4A, Appendix 1-table 9). As with functionally deleterious variants, functionally deleterious missense somatic mutations were also not distributed evenly across CDKN2A, being enriched within the ankyrin repeat 3 (Figure 4B, Appendix 1-table 9). We found that 32.4% - 50.0% of all functionally deleterious missense somatic mutations occurred within ankyrin repeat 3, with 48.0% - 58.0% of patients in each cohort having a functionally deleterious missense somatic mutation in this domain. Notably, 65.7% - 76.0% of functionally deleterious missense somatic mutations in this domain were in residues 80-89 (Appendix 1-table 9).”

      “We were also able to determine the functional classification of CDKN2A missense somatic mutations in COSMIC, TCGA, JHU, and MSK-IMPACT by cancer type. We found that 22.2% - 100% of CDKN2A missense somatic mutations were functionally deleterious depending on cancer type (Figure 4-figure supplement 2A-D). When considering missense somatic mutations reported in any database, there was a statistically significant depletion of functionally deleterious mutations in colorectal adenocarcinoma (20.4%; adjusted P value = 5.4 x 10⁻⁹) (Figure 4C). As the proportion of missense somatic mutations that were functionally deleterious was lower in colorectal carcinoma than in other types of cancer, we assessed whether somatic mutations in mismatch repair genes (MLH1, MLH3, MSH2, MSH6, PMS1, and PMS2) were associated with the functional status of CDKN2A missense somatic mutations. Thirty-five patients in COSMIC had a CDKN2A missense somatic mutation, of whom 12 (34.3%) had a somatic mutation in a mismatch repair gene. We found that no patients with a somatic mutation in a mismatch repair gene had a functionally deleterious CDKN2A missense somatic mutation compared to 6 of 23 samples (26.1%) without a somatic mutation in a mismatch repair gene (P value = 0.062).”
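
      The mismatch repair comparison quoted above (0 of 12 versus 6 of 23 patients with a functionally deleterious mutation) is a small 2 x 2 table, which is the setting where an exact test is usually preferred. The response does not state which test produced the reported P value, so the following is only a sketch of Fisher's exact test built directly from the hypergeometric distribution, with hypothetical function names. It returns both the lower-tail and the two-sided probability so that either convention can be inspected.

```python
from math import comb


def hypergeom_pmf(k: int, group_size: int, other_size: int, total_hits: int) -> float:
    """P(exactly k of total_hits fall in the first group), margins fixed."""
    total = group_size + other_size
    return (comb(group_size, k) * comb(other_size, total_hits - k)
            / comb(total, total_hits))


def fisher_exact(k: int, group_size: int, other_size: int, total_hits: int):
    """Return (lower-tail p, two-sided p) for observing k hits in the first group."""
    lo = max(0, total_hits - other_size)
    hi = min(group_size, total_hits)
    pmf = {i: hypergeom_pmf(i, group_size, other_size, total_hits)
           for i in range(lo, hi + 1)}
    observed = pmf[k]
    lower_tail = sum(p for i, p in pmf.items() if i <= k)
    # Two-sided: sum over all tables at most as likely as the observed one.
    two_sided = sum(p for p in pmf.values() if p <= observed * (1 + 1e-9))
    return lower_tail, two_sided


# Counts from the quoted passage: 12 patients with and 23 without an MMR
# mutation; 6 functionally deleterious CDKN2A mutations, none in the MMR group.
lower, both = fisher_exact(k=0, group_size=12, other_size=23, total_hits=6)
print(f"lower-tail p = {lower:.3f}, two-sided p = {both:.3f}")
```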

      (6) It would be helpful to validate the neutral variants set. Are variants of UK biobank or gnomAD enriched on neutral population? Are synonymous variants exclusively found in neutral populations?

      Thank you for the suggestion. All synonymous variants were found to be functionally neutral in our assay. We also assessed VUSs from gnomAD and found a lower prevalence of functionally deleterious variants compared to all CDKN2A variants and CDKN2A missense somatic mutations:

      “The Genome Aggregation Database (gnomAD) v4.1.0 reports 287 missense variants in CDKN2A, including 13 pathogenic, 4 likely pathogenic, 3 likely benign, 3 benign, and 264 VUSs classified using ACMG variant interpretation guidelines (Figure 5A, Figure 5B, and Appendix 1-table 10). Of the 264 missense VUSs, 177 (67.0%) were functionally neutral, 56 (21.2%) were of indeterminate function, and 31 (11.7%) were functionally deleterious in our assay using the gamma GLM for classification (Figure 5C).”

      (7) They used a pancreatic cancer cell line and assayed for cell proliferation. The limitations of this method and the possibility of complementing the limitations should be discussed.

      Thank you for the suggestion. We have added discussion of this limitation to our manuscript:

      “We characterized variants based upon a broad cellular phenotype, cell proliferation, in a single PDAC cell line. It is possible that CDKN2A variant functional classifications are cell-specific and assay-specific. Our assay may not encompass all cellular functions of CDKN2A and an alternative assay of a specific CDKN2A function, such as CDK4 binding, may result in different variant functional classifications. Furthermore, CDKN2A variants may have different effects if alternative cell lines are used for the functional assay. However, cell-specific effects appear to be limited. In our previous study, we characterized 29 CDKN2A VUSs in three PDAC cell lines, using cell proliferation and cell cycle assays, and found agreement between all functional classifications (Kimura et al., 2022).”

      Minor issues:

      (1) Figures 2B, C: it would be more intuitive to plot significance by logging p-values than raw p-values.

      We used log2 P value (or log2 normalized fold change) for figures in the manuscript as appropriate.

      (2) Figure 2D: annotate protein domain information at the side. Supplementary Figure 2 shows the domains but it would be more informative to show it in Figure 2D heatmap.

      Thank you for the suggestion, we have annotated protein domain information on the left side of the heatmap in (the now) Figure 2C.

      Reviewer #2 (Recommendations For The Authors):

      Major Concerns:

      (1) How many replicates of the screen were performed? It seems like only one library infection/ proliferation assay was done. If so this is insufficient to obtain any idea of the uncertainty of measurement for each variant.

      The assay was repeated in duplicate for 28 CDKN2A residues. For the remaining 128 residues of CDKN2A, the assay was completed once. We found good agreement between variant classifications in assay repeats. We have added to the text as follows:

      “To confirm the reproducibility of our variant classifications, 28 amino acid residues were assayed in duplicate, and variants classified using the gamma GLM. The majority of missense variants, 452 of 560 (80.7%), had the same functional classification in each of the two replicates (Figure 2 -figure supplement 3A and B, Appendix 1-table 4).”

      We have also added discussion of this study limitation to the manuscript:

      “We repeated our functional assay twice for 28 CDKN2A residues. For the remaining 128 residues of CDKN2A, the functional assay was completed once. While we found general agreement between functional classifications from each replicate for the 28 residues assayed in duplicate, additional repeats for each residue are necessary to determine variability in variant functional classifications.”

      (2) The count data from the experiment and NGS pipeline to call variants need to be provided for each replication (i.e. the counts that were fed into the gamma model)

      Accompanying this should be information about the depth of sequencing of the cells, the number of cells infected with the library, and standard metrics for pooled screens.

      Quality metrics regarding the representation and completeness of the TWIST library need to be provided. See Brenan et al. Cell Reports (2016) Supplemental Figure 1

      Thank you for your suggestion. We are happy to provide this additional information. Sequence read counts for each variant are given in Appendix 1-table 4. We have provided additional detail in the methods section on the functional assay, including the number of cells infected with each library:

      “Lentiviral transduction

      PANC-1 cells were used for CDKN2A plasmid library and single variant CDKN2A expression plasmid transductions. PANC-1 cells previously transduced with pLJM1-CDKN2A (PANC-1CDKN2A) and selected with puromycin were used for CellTag library transductions. Briefly, 1 x 10⁵ cells were cultured in media supplemented with 10 µg/ml polybrene and transduced with 4 x 10⁷ transducing units per mL of lentivirus particles. Cells were then centrifuged at 1,200 x g for 1 hour. After 48 hours of culture at 37°C and 5% CO2, transduced cells were selected using 3 µg/ml puromycin (CDKN2A plasmid libraries and single variant CDKN2A expression plasmids) or 5 µg/ml blasticidin (CellTag plasmid library) for 7 days. Expected MOI was one. After selection, cells were trypsinized and 5 x 10⁵ cells were seeded into T150 flasks. DNA was collected from remaining cells and this sample was named as (Day 9). T150 flasks were cultured until confluent and then DNA was collected. The time for cells to become confluent varied for each amino acid residue (Day 16 – 40, Appendix 1-table 5). DNA was extracted from PANC-1 cells using the PureLink Genomic DNA Mini Kit (Invitrogen, Carlsbad, CA; catalog no. K1820-01). The assay for CellTag library was repeated in triplicate. We repeated our CDKN2A assay in duplicate for 28 residues. For the remaining 128 CDKN2A residues the assay was completed once.”

      We have also provided additional information on the TWIST library:

      “CDKN2A expression plasmid libraries

      Codon-optimized CDKN2A cDNA, based on the p16INK4A amino acid sequence (NP_000068.1), was designed (Appendix 1-table 12), and pLJM1 containing codon-optimized CDKN2A (pLJM1-CDKN2A) was generated by Twist Bioscience (South San Francisco, CA). 156 plasmid libraries were then synthesized using pLJM1-CDKN2A, such that each library contained all 20 possible amino acid variants (19 missense and 1 synonymous) at a given position, generating 500 ng of each plasmid library (Twist Bioscience, South San Francisco, CA). The proportion of each variant in each library is shown in Appendix 1-table 2. Variants with a representation of less than 1% in a plasmid library were individually generated using the Q5 Site-Directed Mutagenesis kit (New England Biolabs, Ipswich, MA; catalog no. E0552), and added to each library to a calculated proportion of 5%. Primers used for site-directed mutagenesis are given in Appendix 1-table 13. Each library was then amplified to generate at least 5 µg of plasmid DNA using the QIAGEN Plasmid Midi Kit (QIAGEN, Germantown, MD; catalog no. 12143).”

      (3) It is unclear when barcode abundance is assessed in the cell proliferation assay/in the screen. The exact timepoints of "before and after in vitro culture" (line 91) need to be clarified in the text.

      We are happy to clarify. We collected DNA on Day 9 post-transduction and at confluency. The day of confluency for each residue is detailed in Appendix 1-table 5. The text of the manuscript has been updated accordingly.

      (4) Is "before" day 9, as detailed in Figure 1 source data 1? If so, it is misleading to state that the experiment is in culture for 14 days but call day 9 "before... in vitro culture."

      The "before" sample should be obtained immediately after viral infection and selection with the library to provide a measure of library representation.

      We apologize for the confusion. We have clarified in the text and figures that our baseline measurement was at Day 9 post-transduction. We also determined whether the proportion of each variant was maintained in the Day 9 cell pool compared to the amplified plasmid library for three CDKN2A amino acid residues (p.R24, p.H66, and p.A127) and updated the manuscript text:

      “To confirm that the representation of each variant was maintained after transduction, we transduced three lentiviral libraries (amino acid residues p.R24, p.H66, and p.A127) individually into PANC-1 cells and determined the proportion of each variant in the amplified plasmid library and in the cell pool at Day 9 post-transduction. The proportion of each variant in the amplified plasmid library and in the cell pool at Day 9 were highly correlated (Figure 1 -figure supplement 2C and D, Appendix 1-table 3).”

      (5) There is no information regarding the function of each variant, aside from just a p-value resulting from the final analysis with the gamma model. Some variants may cause loss of function, others may be neutral while others may be gain of function. Simply providing a p-value is not sufficient. The standard in the field is to provide a function score/ test-statistic giving the sign and magnitude of the effect. For proliferation assays at least a ratio of fold-change of (mut/ synonymous)[day 14] vs (mut/synonymous)[baseline] should be provided.

      Thank you for your comment. We have provided read counts, P values, and functional classifications for each variant using the gamma GLM in Appendix 1-table 4. We have also analyzed variants using log2 normalized fold change. This data is presented in the text and compared to our classifications with the gamma GLM. We have provided normalized fold change and resulting classification for each variant in Appendix 1-table 6.
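
      As an illustration of the ratio requested in comment (5), a log2 normalized fold change can be sketched as below. This is a minimal sketch and not the authors' pipeline: the function name, the pseudocount, and its default value are assumptions introduced here for illustration.

```python
import math


def log2_normalized_fold_change(mut_end, syn_end, mut_base, syn_base, pseudocount=0.5):
    """Log2 of (mut/synonymous) at the end point over (mut/synonymous) at baseline.

    All four inputs are read counts. The pseudocount (an assumption of this
    sketch) avoids division by zero when a variant has no reads.
    """
    ratio_end = (mut_end + pseudocount) / (syn_end + pseudocount)
    ratio_base = (mut_base + pseudocount) / (syn_base + pseudocount)
    return math.log2(ratio_end / ratio_base)
```

      Under these assumptions, a score near 0 would indicate that a variant tracks the synonymous control, while a positive score would indicate relative enrichment during culture, which in an assay where functional p16INK4A suppresses proliferation would be consistent with loss of function.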

      (6) A plot of the distribution of function scores for all variants is needed. This will serve as an effective visual to distinguish the control variants from those that are functionally deleterious or benign/neutral (see Findlay et al. Nature (2018) Figure 3A for an example visual).

      Thank you for your suggestion. We have provided additional figures to visualize the distribution of assay outputs using the gamma GLM in Figure 2 -figure supplement 1.

      (7) Synonymous variants are used as a proxy for WT per variant library, but do all the synonymous variants truly behave like WT CDKN2A in their ability to suppress cell proliferation? A plot of the distribution of synonymous variant function relative to WT CDKN2A function would be effective here.

      All 156 synonymous variants suppressed cell proliferation and were classified as functionally neutral in our assay using the gamma GLM. The manuscript has been updated to reflect this:

      “Of 156 synonymous variants and six missense variants previously reported to be functionally neutral in published literature and/or reported in ClinVar as benign or likely benign (benchmark benign variants), all were characterized as functionally neutral in our assay (Figure 2B, Figure 2 -figure supplement 1B and 1C, Appendix 1-table 4)”

      (8) The gamma generalized linear model is not commonly used to analyze the results of saturation mutagenesis screens. Please provide a justification for the use of this analysis method vs using log fold change as other deep mutational scanning (DMS) studies have done (PMID: 27760319, PMID: 30224644).

      Thank you for this important suggestion. We are happy to provide additional information. We used a gamma GLM to functionally characterize CDKN2A variants as it does not rely on an annotated set of pathogenic and benign variants to determine classification thresholds. Instead, classification thresholds are determined using the change in representation of 20 non-functional barcodes in a pool of PANC-1 cells stably expressing CDKN2A after a period of in vitro growth. As a gamma GLM is not commonly used for saturation mutagenesis screens, as noted by the reviewer, we also classified variants using log2 normalized fold change. We compared the functional classifications from the two methods and found general agreement: 98.5% of the missense variants classified as functionally deleterious using the gamma GLM were classified the same way using log2 normalized fold change. We have updated the text to reflect this reasoning and additional analysis.

      (9) The statistical methods used to calculate enrichment of deleterious variants per region of CDKN2A (Figure 2 supplement 1B; lines 163-168) are not described anywhere in the paper. Additionally, the same statistical analysis is not applied to the variants in the subregions near the ankyrin repeats (lines 168-172).

      We are happy to clarify and have added text to the methods section:

      “Z-tests with multiple test correction performed with the Bonferroni method were used in the following comparisons: 1) proportion of functionally deleterious variants present in < 2% of the cell pool and ≥ 2% of the cell pool at Day 9 binned in 1% intervals, 2) proportion of variants in each domain predicted to have deleterious or pathogenic effect by the majority of algorithms, 3) proportion of functionally deleterious variants in each domain, and 4) proportion of functionally deleterious missense variants and somatic mutations.”
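
      The comparisons listed above could be sketched as a two-proportion z-test followed by Bonferroni adjustment. This is a hedged illustration using only the Python standard library; the function names and the pooled-variance form of the test are assumptions, since the response does not specify which z-test variant was used.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions.

    x1/n1 and x2/n2 are successes/total in each group. Uses the pooled
    standard error (an assumption of this sketch).
    """
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; proportions are all 0 or all 1")
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

      In this sketch, each domain-level or bin-level comparison would yield one p-value, and the full list would be passed to `bonferroni` before applying a significance threshold.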

      Minor:

      (1) Please review the manuscript for spelling and grammatical errors.

      Sure.

    1. eLife Assessment

      This study presents an important application of high-content image-based morphological profiling to quantitatively and systematically characterize the cell type composition of induced pluripotent stem cell-derived mixed neural cultures. Compelling evidence from rigorous experimental and computational validations supports new potential applications of this cheap and simple assay.

    2. Joint Public Review:

      Automatically identifying single cell types in heterogeneous mixed cell populations holds great promise to characterize mixed cell populations and to discover new rules of spatial organization and cell-cell communication. Although the current manuscript focuses on the application of quality control of iPSC cultures, the same approach can be extended to a wealth of other applications including in depth study of the spatial context. The simple and high-content assay democratizes use and enables adoption by other labs.

      The authors also propose a new nucleocentric phenotyping pipeline, where a convolutional neural network is trained on the nucleus and some margins around it. This nucleocentric approach improves classification performance at high densities because nuclear segmentation is less prone to errors in dense cultures.

      The manuscript is supported by comprehensive experimental and computational validations that raise the bar beyond the current state of the art in the field of high-content phenotyping and make this manuscript especially compelling. These include (i) explicitly assessing replication biases (batch effects); (ii) direct comparison of feature-based (à la cell profiling) versus deep-learning-based classification (which is not trivial/obvious for the application of cell profiling); (iii) systematic assessment of the contribution of each fluorescent channel; (iv) evaluation of cell-density dependency; (v) explicit examination of mistakes in classification; (vi) evaluating the performance of different spatial contexts around the cell/nucleus; (vii) generalization of models trained on cultures containing a single cell type (mono-cultures) to mixed co-cultures; (viii) application to multiple classification tasks.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Review: 

      Summary: 

      The authors present a new application of the high-content image-based morphological profiling assay Cell Painting (CP) to single cell type classification in heterogeneous induced pluripotent stem cell-derived mixed neural cultures. Machine learning models were trained to classify single cell types according to either "engineered" features derived from the image or the raw CP multiplexed image. The authors systematically evaluated experimental (e.g., cell density, cell types, fluorescent channels) and computational (e.g., different models, different cell regions) parameters and convincingly demonstrated that the nucleus and its surroundings contain sufficient information for robust and accurate cell type classification. Models that were trained on mono-cultures (i.e., containing a single cell type) could generalize to cell type prediction in mixed co-cultures and could describe intermediate states of the maturation process from iPSC-derived neural progenitors to differentiated neurons.

      Strengths:

      Automatically identifying single cell types in heterogeneous mixed cell populations holds great promise to characterize mixed cell populations and to discover new rules of spatial organization and cell-cell communication. Although the current manuscript focuses on the application of quality control of iPSC cultures, the same approach can be extended to a wealth of other applications including in depth study of the spatial context. The simple and high-content assay democratizes use and enables adoption by other labs.

      The manuscript is supported by comprehensive experimental and computational validations that raise the bar beyond the current state of the art in the field of high-content phenotyping and make this manuscript especially compelling. These include (i) explicitly assessing replication biases (batch effects); (ii) direct comparison of feature-based (à la cell profiling) versus deep-learning-based classification (which is not trivial/obvious for the application of cell profiling); (iii) systematic assessment of the contribution of each fluorescent channel; (iv) evaluation of cell-density dependency; (v) explicit examination of mistakes in classification; (vi) evaluating the performance of different spatial contexts around the cell/nucleus; (vii) generalization of models trained on cultures containing a single cell type (mono-cultures) to mixed co-cultures; (viii) application to multiple classification tasks.

      Comments on latest version:

      I have consulted with Reviewer #3 and both of us were impressed by the revised manuscript, especially by the clear and convincing evidence regarding the nucleocentric model's use of the nuclear periphery and its benefit for the case of dense cultures. However, there are two issues that are incompletely addressed (see below). Until these are resolved, the "strength of evidence" has been elevated to "compelling".

      First, the analysis of the patch size does not clearly indicate that the 12-18um range is a critical factor (Fig. 4E). On the contrary, the performance seems not to be very sensitive to the patch size, which is actually a desired property for a method. Still, Fig. 4B convincingly shows that the nucleocentric model is not sensitive to the culture density, while the other models are. Thus, the authors can adjust their text to say that the nucleocentric approach is not sensitive to the patch size and that the patch size is selected to capture the nucleus and some margins around it, making it less prone to segmentation errors in dense cultures.

      We agree that there is a significant tolerance to different patch sizes, and have therefore reformulated the conclusion as suggested in the results and the discussion sections (page 10 and 16). As very large patch sizes (>40µm) do increase the variability of the predictions and the imbalance between recall and precision, we have left this observation in the results section, as it also motivates the use of smaller patch sizes.

      Second, the GitHub does not contain sufficient information to reproduce the analysis. Its current state is sparse with documentation that would make reproducing the work difficult. What versions of the software were used? Where should data be downloaded? The README contains references to many different argparse CLI arguments, but sparse details on what these arguments actually are, and which parameters the authors used to perform their analyses. Links to images are broken. Ideally, all of these details would be present, and the authors would include a step-by-step tutorial on how to reproduce their work. Fixing this will lead to an "exceptional" strength of evidence.

      We have added additional information to the GitHub to increase the reproducibility of the analysis.  

      • The README now contains additional documentation and more extensive explanations. A flowchart has been added, making the dataflow and order of analyses clearer.

      • The accompanying dataset is 20GB in size and can be downloaded as a .zip-file from https://figshare.com/articles/dataset/Nucleocentric-Profiling/27141441?file=49522557. This file contains 2x480 raw images and a layout file.  

      • The software versions used are listed in Table 4 of the manuscript. To increase reproducibility, a Conda environment file (.yaml) has been added to the GitHub. Installing this environment provides the correct package versions.

      • For each script and each of its arguments, the README now gives a short description of its meaning, whether it is required or optional, and its default setting.

      • A step-by-step tutorial on the use of the test dataset has been included. This tutorial includes the arguments used to run the code from the command line terminal.

      Recommendations for the authors:

      There are no references in the text to Fig. 2D or Fig. 3C.

      This has been changed. The text referring to Fig. 2D has been added to the manuscript on page 6, and the reference to Fig. 3C has been included on page 8.

    1. eLife Assessment

      This work presents a valuable exploration of AI-assisted protein engineering, particularly in designing a VHH antibody with enhanced resistance and stability to extreme environments. However, the approach is weakened by incomplete support, with computational methods and experimental design appearing somewhat arbitrary and lacking clear justification. Further justification of the chosen methods and clearer exposition would strengthen the study's support and conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the model's capacity to capture epistatic interactions through multi-point mutations and its success in finding the global optimum within the protein fitness landscape highlight the strength of deep learning methods over traditional approaches.

      Strengths:

      It is impressive that the authors used AI combined with limited experimental validation to achieve such significant enhancements in protein performance. Besides, the successful application of the designed antibody in industrial settings demonstrates the practical and economic relevance of the study. Overall, this work has broad implications for future AI-guided protein engineering efforts.

      Weaknesses:

      However, the authors should conduct a more thorough computational analysis to complement their manuscript. While the identification of improved multi-point mutants is commendable, the manuscript lacks a detailed investigation into the mechanisms by which these mutations enhance protein properties. The authors briefly mention that some physicochemical characteristics of the mutants are unusual, but they do not delve into why these mutations result in improved performance. Could computational techniques, such as molecular dynamics simulations, be employed to explore the effects of these mutations? Additionally, the authors claim that their method is efficient. However, the selected VHH is relatively short (<150 AA), resulting in lower computational costs. It remains unclear whether the computational cost of this approach would still be acceptable when designing larger proteins (>1000 AA). Besides, the design process involves a large number of prediction tasks, including the properties of both single-site saturation and multi-point mutants. The computational load is closely tied to the protein length and the number of mutation sites. Could the authors analyze the model's capability boundaries in this regard and discuss how scalable their approach is when dealing with larger proteins or more complex mutation tasks?

    3. Reviewer #2 (Public review):

      In this paper, the authors aim to explore whether an AI model trained on natural protein data can aid in designing proteins that are resistant to extreme environments. While this is an interesting attempt, the study's computational contributions are weak, and the design of the computational experiments appears arbitrary.

      (1) The writing throughout the paper is poor. This leaves the reader confused.

      (2) The main technical issue the authors address is whether AI can identify protein mutations that adapt to extreme environments based solely on natural protein data. However, the introduction could be more concise and focused on the key points to better clarify the significance of this question.

      (3) The authors did not develop a new model but instead used their previously developed Pro-PRIME model. This significantly weakens the novelty and contribution of this work.

      (4) The computational experiments are not well-justified. For instance, the authors used a zero-shot setting for single-point mutation experiments but opted for fine-tuning in multiple-point mutation experiments. There is no clear explanation for this discrepancy. How does the model perform in zero-shot settings for multiple-point mutations? How would fine-tuning affect single-point mutation results? The choice of these strategies seems arbitrary and lacks sufficient discussion.

    4. Author response:

      Reviewer #1:

      Weaknesses:

      However, the authors should conduct a more thorough computational analysis to complement their manuscript. While the identification of improved multi-point mutants is commendable, the manuscript lacks a detailed investigation into the mechanisms by which these mutations enhance protein properties. The authors briefly mention that some physicochemical characteristics of the mutants are unusual, but they do not delve into why these mutations result in improved performance. Could computational techniques, such as molecular dynamics simulations, be employed to explore the effects of these mutations?  Additionally, the authors claim that their method is efficient. However, the selected VHH is relatively short (<150 AA), resulting in lower computational costs. It remains unclear whether the computational cost of this approach would still be acceptable when designing larger proteins (>1000 AA). Besides, the design process involves a large number of prediction tasks, including the properties of both single-site saturation and multi-point mutants. The computational load is closely tied to the protein length and the number of mutation sites. Could the authors analyze the model's capability boundaries in this regard and discuss how scalable their approach is when dealing with larger proteins or more complex mutation tasks?

      We agree that further analysis of the mechanisms by which the identified mutations enhance protein performance would strengthen our study. In the revised manuscript, we plan to conduct molecular dynamics simulations to explore the physicochemical effects of these mutations in more detail. This analysis will help elucidate how the observed structural and dynamic changes contribute to the improved resistance and stability of the designed VHH antibody.

      We acknowledge the need to assess the scalability of our method to larger proteins. To address this, we will include an analysis of the method’s performance when applied to longer proteins, including an estimation of computational cost and potential bottlenecks.

      Reviewer #2:

      (1) The writing throughout the paper is poor. This leaves the reader confused.

      (2) The main technical issue the authors address is whether AI can identify protein mutations that adapt to extreme environments based solely on natural protein data. However, the introduction could be more concise and focused on the key points to better clarify the significance of this question.

      (3) The authors did not develop a new model but instead used their previously developed Pro-PRIME model. This significantly weakens the novelty and contribution of this work.

      (4) The computational experiments are not well-justified. For instance, the authors used a zero-shot setting for single-point mutation experiments but opted for fine-tuning in multiple-point mutation experiments. There is no clear explanation for this discrepancy. How does the model perform in zero-shot settings for multiple-point mutations? How would fine-tuning affect single-point mutation results? The choice of these strategies seems arbitrary and lacks sufficient discussion.

      (1&2) We will revise the manuscript to improve the overall clarity and readability. Specifically, we will restructure the introduction to focus more concisely on the key scientific questions and contributions of our study.

      (3) While the Pro-PRIME model was previously developed, this work focuses on designing proteins with properties that do not naturally exist and are scarce in the natural world. To address the concern about novelty, we will expand the discussion to highlight this unique contribution and its implications for advancing protein design.

      (4) We appreciate the comment regarding the discrepancy between the zero-shot and fine-tuning strategies. In the revised manuscript, we will provide a detailed explanation for the choice of these settings, including an analysis of the trade-offs between zero-shot and fine-tuning approaches in multi-point mutation tasks. We will also explore the model’s performance in zero-shot settings for multi-point mutations and report these results in the supplementary materials to ensure completeness.

    1. eLife Assessment

      This study follows up on Arimura et al.'s powerful new method MagIC-Cryo-EM for imaging native complexes at high resolution. Using a clever design that embeds protein spacers between the antibody and the purified nucleosomes, thereby minimizing interference from the beads, the authors concentrate nucleosomes containing the linker histone variant H1.8. From these samples, the authors obtain convincing atomic structures of the H1.8-bound chromatosome purified from interphase and metaphase cells, and find that an NPM2 chaperone-bound form exists as well. Caveats include the use of formaldehyde crosslinking and tagged H1.8, which might affect the structures obtained; the NPM2 work could also be better incorporated into the main findings. Overall, this is an important new tool in the arsenal of single molecule biologists, permitting a deep dive into the structure of native complexes. This work will be of high interest to a broad swathe of scientists studying native macromolecules present at low concentrations in cells.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Arimura et al. describe MagIC-Cryo-EM, an innovative method for immuno-selective concentration of native molecules and macromolecular complexes for Cryo-EM imaging and single-particle analysis. Typically, Cryo-EM imaging requires much larger concentrations of biomolecules than are feasible to achieve by conventional biochemical fractionation. Overall, this manuscript is meticulously and clearly written and may become a great asset to other electron microscopists and chromatin researchers.

      Strengths:

      Previously, Arimura et al. (Mol. Cell 2021) isolated from Xenopus extract and resolved by Cryo-EM a sub-class of native nucleosomes containing histone H1.8 at the on-dyad position, similar to that previously observed by other researchers with reconstituted nucleosomes. Here they sought to analyze immuno-selected nucleosomes, aiming to observe specific modes of H1.8 positioning (e.g. on-dyad and off-dyad) and potentially reveal structural motifs responsible for the decreased affinity of H1.8 for interphase chromatin compared to metaphase chromosomes. The main strength of this work is a clever and novel methodological design, in particular the engineered protein spacers that separate captured nucleosomes from streptavidin beads for clear imaging. The authors provide a detailed step-by-step description of the MagIC-Cryo-EM procedure, including nucleosome isolation, preparation of GFP nanobody-attached magnetic beads, optimization of the spacer length, concentration of the nucleosomes on graphene grids, and data collection and analysis, including their new DuSTER method to filter out low-signal particles. This tour de force methodology should facilitate adoption of MagIC-Cryo-EM by other electron microscopists, especially for analysis of native nucleosome complexes.

      In pursuit of biologically important new structures, the immuno-selected H1.8-containing nucleosomes were solved at about 4 Å resolution; their structure appears to be very similar to the previously determined structure of H1.8-reconstituted nucleosomes. There were no apparent differences between the metaphase and interphase complexes, suggesting that on-dyad versus off-dyad positioning does not explain the differences in H1.8-nucleosome binding. However, they were able to identify and solve complexes of H1.8-GFP with the histone chaperone NPM2 in a closed and an open conformation, providing mechanistic insights into H1-NPM2 binding and the reduced affinity of H1.8 for interphase chromatin as compared to metaphase chromosomes.

      Weaknesses:

      Still, I feel that there are certain limitations and potential artifacts resulting from formaldehyde fixation, use of bacterially expressed recombinant H1.8-GFP, and potential effects of magnetic beads and/or spacer on protein structure, that should be more explicitly discussed. Also, the GFP-pulled down H1.8 nucleosomes should be better characterized biochemically to determine the actual linker DNA lengths (which are known to have a strong effect on linker histone affinity) and the presence or absence of other factors such as HMG proteins that may compete with linker histones and cause the multiplicity of nucleosome structural classes (such as shown in Fig. 3F) for which the association with H1.8 is uncertain.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a straightforward and convincing demonstration of a reagent and workflow that they collectively term "MagIC-cryo-EM", in which magnetic nanobeads combined with affinity linkers are used to specifically immobilize and locally concentrate complexes that contain a protein-of-interest. As a proof of concept, they localize, image, and reconstruct H1.8-bound nucleosomes reconstructed from frog egg extracts. The authors additionally devised an image-processing workflow termed "DuSTER", which increases the true positive detections of the partially ordered NPM2 complex. The analysis of the NPM2 complex ± H1.8 was challenging because only ~60 kDa of protein mass was ordered. Overall, single-particle cryo-EM practitioners should find this study useful.

      Strengths:

      The rationale is very logical and the data are convincing.

      Weaknesses:

      I have seen an earlier version of this study at a conference. The conference presentation was much easier to follow than the current manuscript. It is as if this manuscript had undergone review at another journal and includes additional experiments to satisfy previous reviewers. Specifically, the NPM2 results don't seem to add much to the main story (MagIC-cryo-EM), and read more like an addendum. The authors could probably publish the NPM2 results separately, which would make the core MagIC results (sans DuSTER) easier to read.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, Arimura et al. report a new method, termed MagIC-Cryo-EM, in which magnetic beads are used to capture specific proteins out of a lysate via immunoprecipitation, followed by deposition on EM grids. The proteins enriched in this way can then be analyzed structurally. Importantly, the nanoparticles are further functionalized with protein-based spacers, to avoid a distorted halo around the particles. This is a very elegant approach and allows the resolution of the structure of small amounts of native proteins at atomistic resolution.

      Here, the authors apply this method to study chromatosome formation from nucleosomes and the oocyte-specific linker histone H1.8. This allows them to resolve H1.8-containing chromatosomes from oocyte extract in both interphase and metaphase conditions at 4.3 Å resolution, revealing a common structure with H1 placed right at the dyad and contacting both entry and exit linker DNA.

      They then investigate the origin of H1.8 loss during interphase. They identify a non-nucleosomal H1.8-containing complex from interphase preparations. To resolve its structure, the authors develop a protocol (DuSTER) to exclude particles with an ambiguous center, revealing particles with five-fold symmetry, which matches the chaperone NPM2. MS and WB confirm that the protein is present in interphase samples but not metaphase. The authors further separate two isoforms, an open and a closed form, that coexist. Additional densities in the open form suggest that this might be bound H1.8.

      Strengths:

      Together this is an important addition to the suite of cryoEM methods, with broad applications. The authors demonstrate the method using interesting applications, showing that the methods work and they can get high resolution structures from nucleosomes in complex with H1 from native environments.

      Weaknesses:

      The structures of the NPM2 chaperone are less well resolved, and some of the interpretation in this part seems only weakly justified.

    1. eLife Assessment

      This interesting study presents valuable information on how human cytomegalovirus (HCMV) infection disrupts the activity of the TEAD1 transcription factor, leading to widespread chromatin alterations. However, the precise mechanisms underlying this disruption and the extent to which these chromatin changes influence HCMV replication remain unclear. The study is supported by solid evidence, which would be made stronger by including functional analyses. This work will be of interest to virology, chromosome biology and transcriptional co-regulation fields.

    2. Reviewer #1 (Public review):

      The manuscript by Sayeed et al. uses a comprehensive series of multi-omics approaches to demonstrate that late-stage human cytomegalovirus (HCMV) infection leads to a marked disruption of TEAD1 activity, a concomitant loss of TEAD1-DNA interactions, and extensive chromatin remodeling. The data are thoroughly presented and provide evidence for the role of TEAD1 in the cellular response to HCMV infection. However, a key question remains unresolved: is the observed disruption of TEAD1 activity a direct consequence of HCMV infection, or could it be secondary to the broader innate antiviral response? In this respect, the study would benefit from experiments that assess the effect of TEAD1 overexpression or knockdown/deletion on HCMV replication dynamics. Such functional assays could help delineate whether TEAD1 perturbation directly influences viral replication or is part of a downstream/indirect cellular response, providing deeper mechanistic insights.

    3. Reviewer #2 (Public review):

      Summary:

      This work uses genomic and biochemical approaches for HCMV infection in human fibroblasts and retinal epithelial cell lines, followed by comparisons and some validations using strategies such as immunoblots. Based on these analyses, they propose several mechanisms that could contribute to the HCMV-induced diseases, including closing of TEAD1-occupying domains and reduced TEAD1 transcript and protein levels, decreased YAP1 and phospho-YAP1 levels, and exclusion of TEAD1 exon 6.

      Strengths:

      The genomics experiments were done in duplicate and the data analyses show good technical reproducibility. Analyses were performed to show changes at the transcript and chromatin levels, followed by some Western blot validations.

      Weaknesses:

      This work, at the current stage, is quite correlative, since no functional studies were done to show any causal links. For readers outside the field, some aspects of the system and design need to be clarified.

    1. eLife Assessment

      This study examines the impact of DNA methylation on CTCF binding in two cancer cell lines. Increased CTCF binding sites are enriched in gene bodies, and associate with nuclear speckles, indicating a potential role in increased transcription. However, the association with nuclear speckles needs to be more diligently demonstrated. Thus the strength of the evidence is considered incomplete. This work would be made more valuable to the community if these claims were buttressed by additional evidence and a deeper discussion of new findings in the light of previous relevant literature. This work will be of interest to the chromosome biology/epigenetics field.

    2. Reviewer #1 (Public review):

      Summary

      Roseman et al. use a new inhibitor of the maintenance DNA methyltransferase DNMT1 to probe the role of methylation in the binding of the CTCF protein, which is known to be involved in chromatin loop formation. As previously reported, and as expected based on our knowledge that CTCF binding is methylation-sensitive, the authors find that loss of methylation leads to additional CTCF binding sites and increased loop formation. By comparing novel loops with the binding of the pre-mRNA splicing factor SON, which localizes to the nuclear speckle compartment, they propose that these reactivated loops localize near speckles. This behavior is dependent on CTCF, whereas degradation of two speckle proteins does not affect CTCF binding or loop formation. The authors propose a model in which DNA methylation controls the association of genome regions with speckles via CTCF-mediated insulation.

      Strengths

      The strengths of the study are 1) the use of a new, specific DNMT1 inhibitor and 2) the observation, in line with the authors' general model, that genes whose expression is sensitive to DNMT1 inhibition and dependent on CTCF (cluster 2) show higher association with SON than genes that are sensitive to DNMT1 inhibition but CTCF-insensitive.

      Weaknesses

      There are a number of significant weaknesses that as a whole undermine many of the key conclusions, including the overall mechanistic model of a direct regulatory role of DNA methylation on CTCF-mediated speckle association of chromatin loops.

      (1) The authors frequently make quasi-quantitative statements but do not provide the quantitative data, all of which they have in hand. To give a few examples: "reactivated CTCF sites were largely methylated" (p. 4/5), "many CTCF binding motifs enriched..." (p. 5), "a large subset of reactivated peaks..." (p. 5), "increase in strength upon DNMT1 inhibition" (p. 5), "a greater total number....." (p. 7). These statements are all based on actual numbers, and the authors should mention the numbers in the text to give an impression of the extent of these changes (see below) and to clarify what qualitative terms like "largely", "many", "large", and "increase" mean. This is an issue throughout the manuscript and not limited to the above examples.

      Related to this issue, many of the comparisons that the authors interpret as showing differences in behavior seem quite minor. For example, visual inspection suggests that the difference in loop strength shown in Figure 1E is something like from 0 to 0.1 for K562 cells and a little less for HCT116 cells. What is a positive control here to give a sense of whether these minor changes are relevant? Another example is on p. 7, where the authors claim that CTCF partners of reactivated peaks tend to engage in a "greater number" of looping partners, but inspection of Figure 2A shows a very minor difference, from maybe 7 to 7.5 partners. While a Mann-Whitney test may call this difference significant and give a significant P value, likely due to the high sample number, it is questionable whether this is a biologically relevant difference.

      (2) The data supporting the central claim of localization of reactivated loops to speckles are not overly convincing. The overlap with SON Cut&Tag (Figure 2F) is partial at best, and although it is better with the publicly available TSA-seq data, the latter is less sensitive than Cut&Tag and more difficult to interpret. It would be helpful to validate these data with FISH experiments to directly demonstrate and measure the association of loops with speckles (see below).

      (3) It is not clear that the authors have indeed removed speckles from cells by degrading SON and SRRM2. Speckles contain a large number of proteins and, considering their phase-separated nature, stronger evidence for their complete removal is needed. Note that the data published in ref 58 suffer from the same caveat.

      (4) The authors ascribe a direct regulatory role to DNA methylation in controlling the association of some CTCF-mediated loops with speckles (p. 20). However, an active regulatory role of speckle association has not been demonstrated, and the observed data are equally explainable by a more parsimonious model in which DNA methylation regulates gene expression via looping and the association with speckles is merely an indirect bystander effect of the activated genes, because we know that active genes are generally associated with speckles. The proposed mechanism of a regulatory role of DNA methylation in controlling speckle association is not convincingly demonstrated by the data. As a consequence, the title of the paper is also misleading.

      (5) As a minor point, the authors imply on p. 15 that ablation of speckles leads to misregulation of genes by altering transcription. This is not shown as the authors only measure RNA abundance, which may be affected by depletion of constitutive splicing factors, but not transcription. The authors would need to show direct effects on transcription.

    3. Reviewer #2 (Public review):

      Summary:

      CTCF is one of the most well-characterized regulators of chromatin architecture in mammals. Given that CTCF is an essential protein, understanding how its binding is regulated is a very active area of research. It has been known for decades that CTCF is sensitive to 5-cytosine DNA methylation (5meC) in certain contexts. Moreover, at genomic imprints and in certain oncogenes, 5meC-mediated CTCF antagonism has very important gene regulatory implications. A number of labs (eg, Schubeler and Stamatoyannopoulos) have assessed the impact of DNA methylation on CTCF binding, but it is important to also interrogate the effect on chromatin organization (ie, looping). Here, Roseman and colleagues used a DNMT1 inhibitor in two established human cancer lines (HCT116 [colon] and K562 [leukemia]), and performed CTCF ChIPseq and HiChIP. They showed that "reactivated" CTCF sites (that is, sites bound in the absence of 5meC) are enriched in gene bodies, participate in many looping events, and, intriguingly, appear associated with nuclear speckles. This last aspect suggests that these reactivated loops might play an important role in increased gene transcription. They showed that a number of genes that are upregulated in the DNA hypomethylated state actually require CTCF binding, which is an important result.

      Strengths:

      Overall, I found the paper to be succinctly written and the data presented clearly. The relationship between CTCF binding in gene bodies and association with nuclear speckles is an interesting result. Another strong point of the paper was combining DNMT1 inhibition with CTCF degradation.

      Weaknesses:

      The most problematic aspect of this paper, in my view, is the insufficient evidence for the association of "reactivated" CTCF binding sites with nuclear speckles, which needs to be more diligently demonstrated (see Major Comment). One unfortunate aspect was that this paper neglected to discuss findings from our recent paper, wherein we also performed CTCF HiChIP in a DNA methylation mutant (Monteagudo-Sanchez et al., 2024 PMID: 39180406). It is true that this is a relatively recent publication, although the BioRxiv version has been available since fall 2023. I do not wish to accuse the authors of actively disregarding our study, but I do insist that they refer to it in a revised version. Moreover, there are a number of differences between the studies, such that I find them complementary rather than overlapping. To wit, the species (mouse vs human), the cell type (pluripotent vs human cancer), the use of a CTCF degron, and the conclusions of the paper (we did not make a link with nuclear speckles). Furthermore, we used a constitutive DNMT knockout, which is not viable in most cell types (HCT116 cells being an exception), and in the discussion mentioned the advantage of using degron technology:

      "With high-resolution techniques, such as HiChIP or Micro-C (119-121), a degron system can be coupled with an assessment of the cis-regulatory interactome (118). Such techniques could be adapted for DNA methylation degrons (eg, DNMT1) in differentiated cell types in order to gauge the impact of 5meC on the 3D genome."

      The authors here used a DNMT1 inhibitor, which, for all intents and purposes, is akin to a DNMT1 degron; thus I was happy to see a study employ such a technique. A comparison between the findings from the two studies would strengthen the current manuscript, in addition to being more ethically responsible.

    1. eLife Assessment

      This is an important study that reports the mechanism by which Ankle2 (LEM4 in humans) interacts with and recruits PP2A and the ER protein Vap33 to promote BAF dephosphorylation and mediate nuclear membrane reformation, using Drosophila as their model. Using Ankle2 mutants, they find that the ER protein Vap33 is key for the normal interphase localisation of Ankle2/LEM4 and also impacts on the function of Ankle2/LEM4 during mitosis. The authors use a variety of complementary techniques and provide convincing evidence to support the claims. The conclusions about the subcellular localization of Ankle2 might be incomplete since they are drawn from overexpression experiments.

    2. Reviewer #1 (Public review):

      Summary:

      In organisms with open mitosis, nuclear envelope breakdown at mitotic entry and re-assembly of the nuclear envelope at the end of mitosis are important, highly regulated processes. One key regulator of nuclear envelope re-assembly is the BAF (Barrier-to-Autointegration) protein, which contributes to cross-linking of chromosomes to the nuclear envelope. Crucially, BAF has to be in a dephosphorylated form to carry out this function, and PP2A has been shown to be the phosphatase that dephosphorylates BAF. The Ankle2/LEM4 protein has previously been identified as an important regulator of PP2A in the dephosphorylation of BAF but its precise function is not fully understood, and Li and colleagues set out to investigate the function of Ankle2/LEM4 in both Drosophila flies and Drosophila cell lines.

      Strengths:

      The authors use a combination of biochemical and imaging techniques to understand the biology of Ankle2/LEM4. On the whole, the experiments are well conducted and the results look convincing. A particular strength of this manuscript is that the authors are able to study both cellular phenotypes and organismal effects of their mutants by studying both Drosophila D-mel cells and whole flies.

      The work presented in this manuscript significantly enhances our understanding of how Ankle2/LEM4 supports BAF dephosphorylation at the end of mitosis. Particularly interesting is the finding that Ankle2/LEM4 appears to be a bona fide PP2A regulatory protein in Drosophila, as well as the localisation of Ankle2/LEM4 and how this is influenced by the interaction between Ankle2 and the ER protein Vap33. It would be interesting to see, though, whether these insights are conserved in mammalian cells, e.g. does mammalian Vap33 also interact with LEM4? Is LEM4 also a part of the PP2A holoenzyme complex in mammalian cells?

      Weaknesses:

      This work is certainly impactful but more discussion and comparison of the Drosophila versus mammalian cell system would be helpful. Also, to attract the largest possible readership, the Ankle2 protein should be referred to as Ankle2/LEM4 throughout the paper to make it clear that this is the same molecule.

      A schematic model at the end of the final figure would be very useful to summarise the findings.

    3. Reviewer #2 (Public review):

      The authors first identify Ankle2 as a regulatory subunit and direct interactor of PP2A, showing that they interact both in vitro and in vivo to promote BAF dephosphorylation. The Ankyrin domain of Ankle2 is important for the interaction with PP2A. They then show that Ankle2 also interacts with the ER protein Vap33 through FFAT motifs and that the two proteins co-localize, particularly during mitosis. The recruitment of Ankle2 by Vap33 is essential for its localization to the ER and nuclear envelope membrane in telophase, whereas earlier in mitosis, its recruitment to the nuclear membrane and spindle envelope relies on the C terminus but not the FFAT motifs. The molecular determinants and receptors are currently not known. The authors check the function of PP2A recruitment to Ankle2/Vap33 in the context of embryos and show that this recruitment pathway is functionally important. While the Ankle2/Vap33 interaction is dispensable in adult flies (as judged by wing development), the PP2A/Ankle2 interaction is essential for correct wing and fly development. Overall, this is a very complete paper that reveals the molecular mechanism of PP2A recruitment to Ankle2 and studies both the cellular and the physiological effects of this interaction in the context of fly development.

      Strengths:

      The paper is well written and the narrative is well-developed. The figures are of high quality, well-controlled, clearly labelled, and easy to understand. They support the claims made by the authors.

      Weaknesses:

      The study would benefit from being discussed in the context of what is already known about Ankle2 biology in C. elegans and human cells. It is important to highlight that the structures shown in the paper are AlphaFold models rather than validated structures.

    4. Reviewer #3 (Public review):

      Summary:

      The authors were interested in how Ankle2 regulates nuclear envelope reformation after cell division. Other published manuscripts, including those from the authors, show without a doubt that Ankle2 plays a role in this critical process. However, the mechanism by which Ankle2 functions was unclear. Previous work using worms and humans (Asencio et al., 2012) established that human ANKLE2 could bind endogenous PP2A subunits. The binding was direct and was mediated through a region before and including the first ankyrin repeat in human ANKLE2. In addition to its interaction with PP2A, Asencio et al., 2012 also show that ANKLE2 regulates VRK1 kinase activity. Together, PP2A and VRK1 regulate BAF phosphorylation for proper nuclear envelope reformation. Here, the authors provide more evidence for interaction with PP2A by also mapping the domain of interaction to the ankyrin repeat in Drosophila. In addition, the ankyrin repeat is essential for nuclear envelope reformation after division. They show that Ankle2 can bind in a PP2A complex without other known regulatory subunits of PP2A. The authors also identify a novel interaction with the ER protein Vap33, but functional relevance for this interaction in nuclear envelope reformation is not provided in the manuscript, as the authors explicitly state. This manuscript does not comment on the activity of Ballchen/VRK1 in relation to Ankle2 loss and BAF phosphorylation or nuclear envelope reformation, even though links were previously shown by multiple studies (Asencio et al., Link et al., Apridita Sebastian et al.). Nuclear envelope defects were rescued by the reduction of VRK1 in two of these manuscripts. It is possible that both PP2A inactivity and VRK1 overactivity due to the loss of Ankle2 contribute to the BAF phosphorylation phenotypes.

      Strengths:

      This manuscript presents a useful finding linking Ankle2 function during nuclear envelope reformation to the PP2A complex. The authors present solid data showing that Ankle2 can form a complex with PP2A-29B and Mts, and they generate a phosphoproteomic resource that is fundamentally important to understanding Ankle2 biology.

      Weaknesses:

      However, the main findings/conclusions about subcellular localization might be incomplete since they are drawn from overexpression experiments. In addition, throughout the text, some conclusions are overstated or are not supported by data.

    1. eLife Assessment

      This valuable study reports the first characterization of the CG14545 gene in Drosophila melanogaster, which the authors name "Sakura." Acting during germline stem cell fate and differentiation, Sakura is required for both oogenesis and female fertility. The evidence supporting the claims of the authors is solid, but the manuscript would be strengthened by a more in-depth investigation into the cause-and-effect relationships for the different defects observed.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Azlan et al. identified a novel maternal factor called Sakura that is required for proper oogenesis in Drosophila. They showed that Sakura is specifically expressed in the female germline cells. Consistent with its expression pattern, Sakura functioned autonomously in germline cells to ensure proper oogenesis. In Sakura KO flies, germline cells were lost during early oogenesis and often became tumorous before degenerating by apoptosis. In these tumorous germ cells, piRNA production was defective and many transposons were derepressed. Interestingly, Smad signaling, a critical signaling pathway for GSC maintenance, was abolished in sakura KO germline stem cells, resulting in ectopic expression of Bam in whole germline cells in the tumorous germline. A recent study reported that Bam acts together with the deubiquitinase Otu to stabilize Cyc A. In the absence of sakura, Cyc A was upregulated in tumorous germline cells in the germarium. Furthermore, the authors showed that Sakura co-immunoprecipitated Otu in ovarian extracts. A series of in vitro assays suggested that the Otu (1-339 aa) and Sakura (1-49 aa) are sufficient for their direct interaction. Finally, the authors demonstrated that the loss of otu phenocopies the loss of sakura, supporting their idea that Sakura plays a role in germ cell maintenance and differentiation through interaction with Otu during oogenesis.

      Strengths:

      To my knowledge, this is the first characterization of the role of the CG14545 gene. Each experiment seems to be well-designed and adequately controlled.

      Weaknesses:

      However, the conclusions from each experiment are somewhat separate, and the functional relationships between Sakura's functions are not well established. In other words, although the loss of Sakura in the germline causes pleiotropic effects, the cause-and-effect relationships between the individual defects remain unclear.

    3. Reviewer #2 (Public review):

      In this study, the authors identified CG14545 (and named it Sakura) as a key gene essential for Drosophila oogenesis. Genetic analyses revealed that Sakura is vital for both oogenesis progression and ultimate female fertility, playing a central role in the renewal and differentiation of germline stem cells (GSCs).

      The absence of Sakura disrupts the Dpp/BMP signaling pathway, resulting in abnormal bam gene expression, which impairs GSC differentiation and leads to GSC loss. Additionally, Sakura is critical for maintaining normal levels of piRNAs. Also, the authors convincingly demonstrate that Sakura physically interacts with Otu, identifying the specific domains necessary for this interaction, suggesting a cooperative role in germline regulation. Importantly, the loss of otu produces similar defects to those observed in Sakura mutants, highlighting their functional collaboration.

      The authors provide compelling evidence that Sakura is a critical regulator of germ cell fate, maintenance, and differentiation in Drosophila. This regulatory role is mediated through the modulation of pMad and Bam expression. However, the phenotypes observed in the germarium appear to stem from reduced pMad levels, which subsequently trigger premature and ectopic expression of Bam. This aberrant Bam expression could lead to increased CycA levels and altered transcriptional regulation, impacting piRNA expression. Given Sakura's role in pMad expression, it would be insightful to investigate whether overexpression of Mad or pMad could mitigate these phenotypic defects (UAS-Mad line is available at Bloomington Drosophila Stock Center).

      A major concern is the overstated role of Sakura in regulating Orb. The data do not reveal mislocalized Orb; rather, they show a mislocalized oocyte and cytoskeletal breakdown, which may be secondary consequences of defects in oocyte polarity and structure rather than direct misregulation of Orb. The conclusion that Sakura is necessary for Orb localization is not supported by the data. Orb still localizes to the oocyte until about stage 6. At later stages, it looks like the cytoskeleton is broken down and the oocyte is not positioned properly; however, Orb still localizes to the oocyte in the ~stage 8 egg chamber. This phenotype points towards a defect in the transport of Orb, and possibly of all other factors that need to localize to the oocyte, due to cytoskeletal breakdown, not towards direct Orb regulation. While this result is very interesting, the underlying mechanism needs further evaluation. For example, a decrease in E-cadherin levels leads to a similar phenotype, and Bam is known to regulate E-cadherin expression. Is Bam expressed in these later knockdowns?

      The manuscript would benefit from a more balanced interpretation of the data concerning Sakura's role in Orb regulation. Furthermore, a more expanded discussion of Sakura's potential role in pMad regulation is needed. For example, since Otu and Bam are involved in translational regulation, do the authors think that Mad is not translated and that this is the reason for the reduced pMad? Currently, the discussion presents just a summary of the results rather than extending to possible interpretations in the context of the present literature.

    4. Reviewer #3 (Public review):

      In this very thorough study, the authors characterize the function of a novel Drosophila gene, which they name Sakura. They start with the observation that sakura expression is predicted to be highly enriched in the ovary and they generate an anti-sakura antibody, a line with a GFP-tagged sakura transgene, and a sakura null allele to investigate sakura localization and function directly. They confirm the prediction that it is primarily expressed in the ovary and, specifically, that it is expressed in germ cells, and find that about 2/3 of the mutants lack germ cells completely and the remaining have tumorous ovaries. Further investigation reveals that Sakura is required for piRNA-mediated repression of transposons in germ cells. They also find evidence that sakura is important for germ cell specification during development and germline stem cell maintenance during adulthood. However, despite the role of sakura in maintaining germline stem cells, they find that sakura mutant germ cells also fail to differentiate properly such that mutant germline stem cell clones have an increased number of "GSC-like" cells. They attribute this phenotype to a failure in the repression of Bam by dpp signaling. Lastly, they demonstrate that sakura physically interacts with otu and that sakura and otu mutants have similar germ cell phenotypes. Overall, this study helps to advance the field by providing a characterization of a novel gene that is required for oogenesis. The data are generally high-quality and the new lines and reagents they generated will be useful for the field. However, there are some weaknesses and I would recommend that they address the comments in the Recommendations for the authors section below.

    1. eLife Assessment

      This useful study presents findings on how some antibiotics, which inhibit protein synthesis in bacteria, affect translation by mitochondrial ribosomes. The authors provide solid evidence that most tested antibiotics act similarly on bacterial and mitochondrial translation. Additionally, this work shows that alternative translation initiation events might exist in two specific mt-mRNAs (MT-ND1 and MT-ND5). The conclusions of this manuscript are of broad interest to the antibiotic and the mitochondrial fields.

    2. Reviewer #1 (Public review):

      Summary:

      This study aimed to determine whether bacterial translation inhibitors affect mitochondria through the same mechanisms. Using mitoribosome profiling, the authors found that most antibiotics, except telithromycin, act similarly in both systems. These insights could help in the development of antibiotics with reduced mitochondrial toxicity.

      They also identified potential novel mitochondrial translation events, proposing new initiation sites for MT-ND1 and MT-ND5. These insights not only challenge existing annotations but also open new avenues for research on mitochondrial function.

      Strengths:

      Ribosome profiling is a state-of-the-art method for monitoring the translatome at very high resolution. Using mitoribosome profiling, the authors convincingly demonstrate that most of the analyzed antibiotics act in the same way on both bacterial and mitochondrial ribosomes, except for telithromycin. Additionally, the authors report possible alternative translation events, raising new questions about the mechanisms behind mitochondrial initiation and start codon recognition in mammals.

      Weaknesses:

      The main weaknesses of this study are:

      - While the authors highlight an interesting difference in the inhibitory mechanism of telithromycin on bacterial and mitochondrial ribosomes, mechanistic explanations or hypotheses are lacking.
      - The assignment of alternative start codons in MT-ND1 and MT-ND5 is very interesting but does not seem to fully align with structural data.
      - The newly proposed translation events in the ncRNAs are preliminary and should be further substantiated with additional evidence or interpreted with more caution.

    3. Reviewer #2 (Public review):

      In this study, the authors set out to explore how antibiotics known to inhibit bacterial protein synthesis also affect mitoribosomes in HEK cells. They achieved this through mitoribosome profiling, where RNase I and MNase were used to generate mitoribosome-protected fragments, followed by sequencing to map the regions where translation arrest occurs. This profiling identified the codon-specific impact of antibiotics on mitochondrial translation.

      The study finds that most antibiotics tested inhibit mitochondrial translation similarly to their bacterial counterparts, except telithromycin, which exhibited distinct stalling patterns. Specifically, chloramphenicol and linezolid selectively inhibited translation when certain amino acids were in the penultimate position of the nascent peptide, which aligns with their known bacterial mechanism. Telithromycin stalls translation at an R/K-X-R/K motif in bacteria, and the study demonstrated a preference for arresting at an R/K/A-X-K motif in mitochondria. Additionally, alternative translation initiation sites were identified in MT-ND1 and MT-ND5, with non-canonical start codons. Overall, the paper presents a comprehensive analysis of antibiotics in the context of mitochondrial translation toxicity, and the identification of alternative translation initiation sites will provide valuable insights for researchers in the mitochondrial translation field.

      From my perspective as a structural biologist working on the human mitoribosome, I appreciate the use of mitoribosome profiling to explore off-target antibiotic effects and the discovery of alternative mitochondrial translation initiation sites. However, the description is somewhat limited by a focus on this single methodology. The authors could strengthen their discussion by incorporating structural approaches, which have contributed significantly to the field. For example, antibiotics such as paromomycin and linezolid have been modeled in the human mitoribosome (PMID: 25838379), while streptomycin has been resolved (10.7554/eLife.77460), and erythromycin was previously discussed (PMID: 24675956). The reason we can now describe off-target effects more meaningfully is due to the availability of fully modified human mitoribosome structures, including mitochondria-specific modifications and their roles in stabilizing the decoding center and binding ligands, mRNA, and tRNAs (10.1038/s41467-024-48163-x).

      These and other relevant studies should be acknowledged throughout the paper to provide additional context.

    4. Reviewer #3 (Public review):

      Summary:

      Recently, the off-target activity of antibiotics on the human mitoribosome has received more attention in the mitochondrial field. Hafner et al. applied mitoribosome profiling to study the effect of antibiotics on protein translation in mitochondria, as there are similarities between the bacterial ribosome and the mitoribosome. The authors conclude that some antibiotics act on mitochondrial translation initiation by the same mechanism as in bacteria. On the other hand, the authors show that chloramphenicol, linezolid and telithromycin trap mitochondrial translation in a context-dependent manner. More interestingly, through in-depth analysis of the 5' ends of ORFs, the authors report alternative start codons for the ND1 and ND5 proteins in place of the previously known ones. This is a novel finding in the field, and it also provides another application of the technique for further study of mitochondrial translation.

      Strengths:

      This is the first study to apply the mitoribosome profiling method to cells treated with multiple antibiotics.

      The mitoribosome profiling method has been carefully optimized and is suggested as a novel method for studying translation events in mitochondria. The manuscript is constructive and well written.

      Weaknesses:

      This is a novel and interesting study; however, most of the conclusions come from mitoribosome profiling analysis. As a result, the manuscript lacks cellular biochemical data that would provide more evidence and support the findings.

    1. eLife Assessment

      The authors studied the relationship between structural and functional lateralization in the planum temporale region of the brain, whilst also considering the morphological presentation of a single or duplicated Heschl's gyrus. The analyses are compelling due to a large sample size, inter-rater reliability, and corrections for multiple comparisons. The associations in this important work might serve as a reference for future targeted studies on brain lateralization.

    2. Reviewer #1 (Public review):

      Summary:

      Qin and colleagues analysed data from the Human Connectome Project on four right-handed subgroups with different gyrification patterns in Heschl's gyrus. Based on these groups, the authors highlight the structure-function relationship of planum temporale asymmetry in lateralised language processing at the group level and next at the individual level. In particular, the authors propose that especially microstructural asymmetries are related to functional auditory language asymmetries in the planum temporale.

      Strengths:

      The study is interesting because of an ongoing and long-standing debate about the relationship between structural and functional brain asymmetries, and in particular whether structural brain asymmetries can be seen as markers of functional language brain lateralisation.

      Within this debate, the relationship between Heschl's gyrus asymmetry and planum temporale asymmetry has rarely been examined, which makes the present analysis valuable. A large sample size and inter-rater reliability support the findings.

      Weaknesses:

      The authors highlight the microstructural results, but could also emphasise their interesting macrostructural results.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Qin and colleagues analysed data from the Human Connectome Project on four right-handed subgroups with different gyrification patterns in Heschl's gyrus. Based on these groups, the authors highlight the structure-function relationship of planum temporale asymmetry in lateralised language processing at the group level and next at the individual level. In particular, the authors propose that especially microstructural asymmetries are related to functional auditory language asymmetries in the planum temporale.

      Strengths:

      The study is interesting because of an ongoing and long-standing debate about the relationship between structural and functional brain asymmetries, and in particular whether structural brain asymmetries can be seen as markers of functional language brain lateralisation.

      Within this debate, the relationship between Heschl's gyrus asymmetry and planum temporale asymmetry has rarely been examined, which makes the present analysis valuable. A large sample size and inter-rater reliability support the findings.

      Weaknesses:

      In this case of multiple brain measures, it would be important to provide the reader with some sort of effect size (e.g. Cohen's d) to help interpret the results.

      Thank you for pointing this out. In the revised version, the effect size, i.e., Cohen's d, has been incorporated into the results (page 8, line 159-160; page 9, line 181-186, supplementary page 14, Table S14).
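
      For readers unfamiliar with the measure, the following is a minimal sketch of Cohen's d for two independent groups. It assumes the pooled-standard-deviation form; the function name `cohens_d` and the toy values are illustrative only and are not taken from the manuscript, which may have used a different variant.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups (pooled-SD form, an assumption here)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    # Standardized mean difference between the two groups.
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Toy values, purely illustrative: means 4 and 3, pooled SD 2.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

      By the usual rule of thumb, values near 0.2, 0.5 and 0.8 are read as small, medium and large effects, respectively.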

      In addition, the authors highlight the microstructural results in spite of the macrostructural results. However, the macrostructural surface results are also strong. I would suggest either reducing the emphasis on micro vs macrostructural results or adding information to justify the microstructural importance.

      In the original manuscript, we highlighted the results of the microstructural measures because the correlations between PT microstructural and functional measures were more pronounced, both within the hemispheres and in terms of asymmetry, than the significant results for surface area. Following your comments, we have now toned down the emphasis on the microstructural results (page 2, line 40; page 14, line 267) and added relevant discussion of the macrostructural results in the revised version (page 18, line 363-370; as copied below):

      “As for macrostructural measures, the asymmetric PT surface area was also associated with speech comprehension AI. Given that the within-hemispheric coupling tendency between surface and speech comprehension existed only in the left PT, it was possible that the larger surface area of the left PT led to a less recruitment of its right homologous, and therefore the lateralization of functional activity would be more pronounced. Additionally, an opposite tendency was found between the correlation of speech perception and comprehension with surface area, potentially implying the segregation of the different speech processing in the PT area.”

      Recommendations for the authors:

      I have only some comments that I wish to be addressed by the authors:

      (1) Please always specify "structural" or "functional" asymmetry or lateralisation, as the reader may be confused.

      This has been done in relevant places.

      (2) Please state that the scale is not the same between the results in Figure 3.

      This has been specified, as suggested (see below).

      “Notably, we did not standardize these structural measures, so the scales differed between indicators.”

      (3) It may be of interest to the reader to learn more about interpretations of how Heschl's gyrus and planum temporale asymmetries are related.

      Thank you for this comment. Given that the asymmetry of Heschl's gyrus was not analyzed in the present study, we do not have direct data or results to support such an interpretation. We also reviewed the literature but found no relevant results on how Heschl's gyrus and planum temporale asymmetries are related. A specific investigation targeting this topic is needed to address the question. This has now been added to the discussion (page 20, line 415-417).

      (4) As this manuscript builds somewhat on the Science Advances article by Ocklenburg et al. (2018), it would be important to discuss how this more liberal planum temporale definition might (or might not) affect the results compared to the more conservative planum temporale definition described here.

      Yes, the definition of the planum temporale varies across studies. Our current manual definition is more conservative than that of Ocklenburg et al. (2018), in which the planum temporale was automatically derived from the Destrieux atlas. We believe that the definition of the planum temporale likely has a non-trivial impact on the results, and that our manual definition, which takes HG duplication into account, should be more reliable and accurate and is therefore preferable to the alternatives. This has been briefly discussed in the revision (page 15-16, line 300-304).

      (5) I would like the authors to briefly but critically discuss what exactly the MRI NODDI model measures and how this is interpreted as measuring microstructural properties of tissue.

      We have now provided relevant information regarding the NODDI measures (page 26, line 552-558; as copied below).

      “NODDI is a highly effective method for detecting key features of neurite morphology, which employs a tissue model that detects three microstructural environments: the intracellular, extracellular and cerebrospinal fluid compartments (Zhang et al., 2012). In the grey matter of the cerebral cortex, the neurite density index (NDI) is an estimated volume fraction of the intracellular microstructural environment, with higher NDIs indicating greater neurite density (Jespersen et al., 2010; Zhang et al., 2012). The orientation dispersion index (ODI) is a measure of the alignment or dispersion of neurite, with higher ODIs indicating more dispersed neurite and lower ODIs indicating more aligned neurite (Jespersen et al., 2012; Zhang et al., 2012).”

      (6) While not mandatory, I would be interested to read the authors' thoughts on the evolution of such a functional/(micro)structural lateralisation link of the planum temporale, in light of the literature on planum temporale asymmetries in (newborn) non-human primate species.

      Thank you for this inspiring suggestion. We have incorporated relevant discussion into the revised version (page 15, line 281-288; as copied below).

      “Moreover, there exist evolutionary evidence supporting the role of the PT as an anatomical substrate for language lateralization. For example, the leftward structural asymmetry of the PT have been observed in multiple non-human primates, including chimpanzees, macaques, and baboons (Becker et al., 2024; Gannon et al., 1998; Xia et al., 2019). Particularly, recent studies on baboons further demonstrated that PT structural leftward asymmetry in newborn baboons could predict future development of communicative gestures, implying a key role of PT structural asymmetry in the lateralized communication system for human and non-human brain evolution (Becker et al., 2024, 2021).”

      Reference

      Becker Y, Phelipon R, Marie D, Bouziane S, Marchetti R, Sein J, Velly L, Renaud L, Cermolacce A, Anton J-L, Nazarian B, Coulon O, Meguerditchian A. 2024. Planum temporale asymmetry in newborn monkeys predicts the future development of gestural communication’s handedness. Nat Commun 15:4791. doi:10.1038/s41467-024-47277-6

      Becker Y, Sein J, Velly L, Giacomino L, Renaud L, Lacoste R, Anton J-L, Nazarian B, Berne C, Meguerditchian A. 2021. Early Left-Planum Temporale Asymmetry in newborn monkeys (Papio anubis): A longitudinal structural MRI study at two stages of development. NeuroImage 227:117575. doi:10.1016/j.neuroimage.2020.117575

      Gannon PJ, Holloway RL, Broadfield DC, Braun AR. 1998. Asymmetry of Chimpanzee Planum Temporale: Humanlike Pattern of Wernicke’s Brain Language Area Homolog. Science 279:220–222. doi:10.1126/science.279.5348.220

      Jespersen SN, Bjarkam CR, Nyengaard JR, Chakravarty MM, Hansen B, Vosegaard T, Østergaard L, Yablonskiy D, Nielsen NChr, Vestergaard-Poulsen P. 2010. Neurite density from magnetic resonance diffusion measurements at ultrahigh field: Comparison with light microscopy and electron microscopy. NeuroImage 49:205–216. doi:10.1016/j.neuroimage.2009.08.053

      Jespersen SN, Leigland LA, Cornea A, Kroenke CD. 2012. Determination of Axonal and Dendritic Orientation Distributions Within the Developing Cerebral Cortex by Diffusion Tensor Imaging. IEEE Trans Med Imaging 31:16–32. doi:10.1109/TMI.2011.2162099

      Xia J, Wang F, Wu Z, Wang L, Zhang C, Shen D, Li G. 2019. Mapping hemispheric asymmetries of the macaque cerebral cortex during early brain development. Hum Brain Mapp. doi:10.1002/hbm.24789

      Zhang H, Schneider T, Wheeler-Kingshott CA, Alexander DC. 2012. NODDI: Practical in vivo neurite orientation dispersion and density imaging of the human brain. NeuroImage 61:1000–1016. doi:10.1016/j.neuroimage.2012.03.072

      Reviewer #2 (Public Review):

      Summary:

      The authors assessed the link between structural and functional lateralization in area PT, one of the brain areas that is known to exhibit strong structural lateralization, and which is known to be implicated in speech processing. Importantly, they included the sulcal configuration of Heschl's gyrus (HG), presenting either as a single or duplicated HG, in their analysis. They found several significant associations between microstructural indices and task-based functional lateralization, some of which depended on the sulcal configuration.

      Strengths:

      A clear strength is the large sample size (n=907), an openly available database, and the fact that HG morphology was manually classified in each individual. This allows for robust statistical testing of the effects across morphological categories, which is not often seen in the literature.

      Weaknesses:

      - Unfortunately, no left-handers were included in the study. It would have been a valuable addition to the literature, to study the effect of handedness on the observed associations, as many previous studies on this topic were not adequately powered. The fact that only right-handers were studied should be pointed out clearly in the introduction or even the abstract.

      Thank you for pointing this out. We have explicitly specified this in the Abstract and Introduction.

      - The tasks to quantify functional lateralization were not specifically designed to pick up lateralization. In the interest of the sample size, it is understandable that the authors used the available HCP-task-battery results, however, it would have been feasible to access another dataset for validation. A targeted subset of results, concerning for example the relationship between sulcal morphology and task-based functional lateralization, could be re-assessed using other open-access fMRI datasets.

      Yes, the fMRI task was not specifically designed to evaluate PT functional lateralization, which has been acknowledged in the discussion (page 17, line 330-342). Given the small effect size of the structural-functional relationship we observed, reproducing similar results with other datasets would require a cohort with a large sample size. This would be quite labor-intensive given our current manual protocol for outlining the PT and HG in every individual. The lack of validation with an independent dataset has been discussed as a limitation in the revised version. We will try to conduct such a validation in future work, likely after developing an automatic pipeline for accurately extracting the PT and HG in individual space (similar to the manual outlining protocol).

      - The study is mainly descriptive and the general discussion of the findings in the larger context of brain lateralization comes a bit short. For example, are the observed effects in line with what we know from other 'language-relevant' areas? What could be the putative mechanisms that give rise to functional lateralization based on the microstructural markers observed? And which mechanisms might be underlying the formation of a duplicated HG?

      Thank you for these insightful comments. As suggested, we strengthened the discussion as below:

      “Another possible explanation could be that higher myelin content and larger surface area in left PT potentially indicated more white matter connection with other language-related regions such as Broca’s area, and therefore is more involved in language tasks than its right homolog (Allendorfer et al., 2016; Catani et al., 2005; Giampiccolo and Duffau, 2022).

      The distinct roles of left and right PT in speech processing have been well-documented. A number of studies substantiated that PT of the left hemisphere responded more strongly to lexical-semantic and syntactic aspects of sentence processing, whereas the right hemisphere demonstrated a greater involvement in the speech melody (Albouy et al., 2020; Meyer et al., 2002).

      These findings are consistent with those reported for the arcuate fasciculus (AF). The left AF has been identified as a crucial structure for language function (Giampiccolo and Duffau, 2022; Zhang et al., 2021). Disruption to this pathway has been linked to multimodal phonological and semantic deficits (Agosta et al., 2010), while injuries in the right AF did not affect language function (Zeineh et al., 2015).”

      Regarding the mechanism underlying the formation of a duplicated HG, we could not identify a well-supported explanation after a careful literature review. We also feel that this is somewhat beyond the scope of the present study and therefore did not add further discussion on this topic.

      Recommendations for the authors:

      (1) The data availability statement makes no explicit mention of the manual labels of HG configuration. Would the authors consider making available a list of HCP-subject-ID with a morphological group (L1/R1, L1/R2, etc.) for replicability and for re-use by other researchers?

      The list of HCP subject IDs with their morphological group (L1/R1, L1/R2, etc.) is now available in Supplementary Material 2. We have specified this in the revised version.

      (2) It would be helpful to state again the statistical tests associated with the p-value in the figure/table caption, e.g. Table 2.

      As suggested, we now specified the statistical method in the figure/table caption.

      (3) Sometimes, the y-axis labels are missing or not clear, for example in Figure S2.

      Sorry about these. We double-checked all the figures, and corrected the missing or unclear labels for Figure S2 and S3 in the revised version.

      (4) In a few instances the font sizes vary within a figure caption.

      This has been corrected in the revision.

      Reference

      Agosta F, Henry RG, Migliaccio R, Neuhaus J, Miller BL, Dronkers NF, Brambati SM, Filippi M, Ogar JM, Wilson SM, Gorno-Tempini ML. 2010. Language networks in semantic dementia. Brain J Neurol 133:286–299. doi:10.1093/brain/awp233

      Albouy P, Benjamin L, Morillon B, Zatorre RJ. 2020. Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody. Science 367:1043–1047. doi:10.1126/science.aaz3468

      Allendorfer JB, Hernando KA, Hossain S, Nenert R, Holland SK, Szaflarski JP. 2016. Arcuate fasciculus asymmetry has a hand in language function but not handedness. Hum Brain Mapp 37:3297–3309. doi:10.1002/hbm.23241

      Catani M, Jones DK, Ffytche DH. 2005. Perisylvian language networks of the human brain. Ann Neurol 57:8–16. doi:10.1002/ana.20319

      Giampiccolo D, Duffau H. 2022. Controversy over the temporal cortical terminations of the left arcuate fasciculus: a reappraisal. Brain J Neurol 145:1242–1256. doi:10.1093/brain/awac057

      Meyer M, Alter K, Friederici AD, Lohmann G, von Cramon DY. 2002. FMRI reveals brain regions mediating slow prosodic modulations in spoken sentences. Hum Brain Mapp 17:73–88. doi:10.1002/hbm.10042

      Zeineh MM, Kang J, Atlas SW, Raman MM, Reiss AL, Norris JL, Valencia I, Montoya JG. 2015. Right arcuate fasciculus abnormality in chronic fatigue syndrome. Radiology 274:517–526. doi:10.1148/radiol.14141079

      Zhang H, Schneider T, Wheeler-Kingshott CA, Alexander DC. 2012. NODDI: Practical in vivo neurite orientation dispersion and density imaging of the human brain. NeuroImage 61:1000–1016. doi:10.1016/j.neuroimage.2012.03.072

      Zhang J, Zhong S, Zhou L, Yu Yamei, Tan X, Wu M, Sun P, Zhang W, Li J, Cheng R, Wu Y, Yu Yanmei, Ye X, Luo B. 2021. Correlations between Dual-Pathway White Matter Alterations and Language Impairment in Patients with Aphasia: A Systematic Review and Meta-analysis. Neuropsychol Rev 31:402–418. doi:10.1007/s11065-021-09482-8

      Reviewing Editor:

      I encourage the authors to incorporate the suggestions of the reviewers, such as:

      (1) to provide more in-depth interpretations about how and why structural and functional lateralization relate,

      Done.

      (2) to provide statistical effect sizes,

      Done.

      (3) to make their sulcal-morphology classification openly available,

      Done.

      (4) to provide statistical effect sizes,

      Done.

      (5) to discuss the possible impact of diverging PT definitions with regard to previous studies,

      Done.

      (6) to provide more in-depth interpretations about how and why structural and functional lateralization relate.

      Done.

      Detailed comments:

      In an impressive cohort of 907 human participants, the present paper presents a very interesting set of data on PT asymmetries not only at the macro-structural but also at the microstructural levels in order to investigate their potential correlates with PT functional asymmetry in relation to perceptual acoustic language tasks.

      I believe this is a key paper for the following reasons:

      (1) it provides critical data and results for addressing a controversial but important question: the relevance of measures of anatomical asymmetry for inferring its language-related functional hemispheric specialization;

      (2) to do so, the authors made a very impressive effort to manually trace the anatomical delineation of the planum temporale at different levels in every participant, the best (but crazy time-consuming) approach so far to document interindividual variability of the PT and to address such a question;

      (3) the contribution is particularly relevant regarding the statistical power of the study, the study and measures having been done in 907 participants!

      (4) I also found the study well designed and well written with great relevance of the findings for the field.

      In terms of results, the authors reported microstructural asymmetries (including intracortical myelin content, neurite density, and neurite orientation) as well as macrostructural asymmetries in relation to functional lateralization for language.

      Comments:

      I have only 2 additional minor comments of my own:

      (1) In agreement with reviewer 2, I don't understand why the authors seem to downplay the links they found between gross PT asymmetry and functional lateralization. I recommend that the authors highlight and discuss this important result, just as they do the microstructural PT asymmetries and their functional links.

      This has been done (page 18, line 363-370).

      (2) PT structural asymmetry (both micro & macro) has been well documented in nonhuman primates (and their functional link with manual lateralization for gestural communication). Without detailing this literature, I recommend the authors at least mention this literature as a comparative perspective in the introduction and/or discussion in order to make the question of PT asymmetry less anthropocentric.

      This has been done (page 15, line 281-288).

    1. eLife Assessment

      This study investigates the molecular mechanisms underlying chronic pain-related memory impairment by focusing on S1P/S1PR1 signaling in the dentate gyrus (DG) of the hippocampus. Through behavioral tests (Y-maze and Morris water maze) and RNA-seq analysis, the researchers discovered that S1P/S1PR1 signaling is crucial for determining susceptibility to memory impairment, with decreased S1PR1 expression linked to structural plasticity changes and memory deficits. This work has important significance and a convincing level of evidence, thus offering new insights into the mechanisms underlying chronic pain-related memory impairment.

    2. Reviewer #1 (Public review):

      This work from Cui, Pan, Fan et al explores memory impairment in chronic pain mouse models, a topic of great interest for the neurobiology field. In particular, the work starts from a very interesting observation, that WT mice can be divided into susceptible and unsusceptible to memory impairment upon modelling chronic pain with CCI. This observation represents the basis of the work where the authors identify the sphingosine receptor S1PR1 as down-regulated in the dentate gyrus of susceptible animals and demonstrate through an elegant range of experiments involving AAV-mediated knockdown or overexpression of S1PR1 that this receptor is involved in the memory impairment observed with chronic pain. Importantly for translational purposes, they also show that activation of S1PR1 through a pharmacological paradigm is able to rescue the memory impairment phenotype.

      The authors also link reduced dendritic branching and a reduced number of mature excitatory synapses in the DG to the memory phenotype.

      They then proceed to explore possible mechanisms downstream of S1PR1 that could explain this reduction in dendritic spines. They identify integrin α2 as an interactor of S1PR1 and show a reduction in several proteins involved in actin dynamics, which are crucial for dendritic spine formation and plasticity.

      They thus hypothesize that the interaction between S1PR1 and Integrin α2 is fundamental for the activation of Rac1 and Cdc42 and consequently for the polymerisation of actin; a reduction in this pathway upon chronic pain would thus lead to impaired actin polymerisation, synapse formation and thus impaired memory.

      The work is of great interest and the experiments are of very good quality with results of great importance.

      Comments on revisions:

      The authors have replied satisfactorily to my previous concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The study investigates the molecular mechanisms underlying chronic pain-related memory impairment by focusing on S1P/S1PR1 signaling in the dentate gyrus (DG) of the hippocampus. Through behavioural tests (Y-maze and Morris water maze) and RNA-seq analysis, the researchers segregated chronic pain mice into memory impairment-susceptible and -unsusceptible subpopulations. They discovered that S1P/S1PR1 signaling is crucial for determining susceptibility to memory impairment, with decreased S1PR1 expression linked to structural plasticity changes and memory deficits.

      Knockdown of S1PR1 in the DG induced a susceptible phenotype, while overexpression or pharmacological activation of S1PR1 promoted resistance to memory impairment and restored normal synaptic structure. The study identifies actin cytoskeleton-related pathways, including ITGA2 and its downstream Rac1/Cdc42 signaling, as key mediators of S1PR1's effects, offering new insights and potential therapeutic targets for chronic pain-related cognitive dysfunction.

      This manuscript consists of a comprehensive investigation and significant findings. The study provides novel insights into the molecular mechanisms of chronic pain-related memory impairment, highlighting the critical role of S1P/S1PR1 signaling in the hippocampal dentate gyrus. The clear identification of S1P/S1PR1 as a potential therapeutic target offers promising avenues for future research and treatment strategies. The manuscript is well-structured, methodologically sound, and presents valuable contributions to the field.

      Strengths:

      (1) The manuscript is well-structured and written in clear, concise language. The flow of information is logical and easy to follow.

      (2) The segregation of mice into memory impairment-susceptible and -unsusceptible subpopulations is innovative and well-justified. The statistical analyses are robust and appropriate for the data.

      (3) The detailed examination of S1PR1 expression and its impact on synaptic plasticity and actin cytoskeleton reorganization is impressive. The findings are significant and contribute to the understanding of chronic pain-related memory impairment.

      Comments on revisions:

      The authors have satisfactorily addressed all the issues raised.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This work from Cui, Pan, Fan, et al explores memory impairment in chronic pain mouse models, a topic of great interest in the neurobiology field. In particular, the work starts from a very interesting observation, that WT mice can be divided into susceptible and unsusceptible to memory impairment upon modelling chronic pain with CCI. This observation represents the basis of the work where the authors identify the sphingosine receptor S1PR1 as down-regulated in the dentate gyrus of susceptible animals and demonstrate through an elegant range of experiments involving AAV-mediated knockdown or overexpression of S1PR1 that this receptor is involved in the memory impairment observed with chronic pain. Importantly for translational purposes, they also show that activation of S1PR1 through a pharmacological paradigm is able to rescue the memory impairment phenotype.

      The authors also link reduced dendritic branching and a reduced number of mature excitatory synapses in the DG to the memory phenotype.

      They then proceed to explore possible mechanisms downstream of S1PR1 that could explain this reduction in dendritic spines. They identify integrin α2 as an interactor of S1PR1 and show a reduction in several proteins involved in actin dynamics, which are crucial for dendritic spine formation and plasticity.

      They thus hypothesize that the interaction between S1PR1 and Integrin α2 is fundamental for the activation of Rac1 and Cdc42 and consequently for the polymerisation of actin; a reduction in this pathway upon chronic pain would thus lead to impaired actin polymerisation, synapse formation, and thus impaired memory.

      The work is of great interest and the experiments are of very good quality with results of great importance. I have however some concerns. The main concern I have relates to the last part of the work, namely Figures 8 and 9, which I feel are not at the same level as the results presented in the previous 7 Figures, which are instead outstanding.

      In particular:

      - In Figure 8, given the reduction in all the proteins tested, the authors need to check some additional proteins as controls. One good candidate could be RhoA, considering the authors say it is activated by S1PR2 and not by S1PR1;

      Thanks for your suggestion. We tested the expression level of RhoA in mice 7 days and 21 days post CCI as negative controls (Supplemental Figure 9).

      - In addition to the previous point, could the authors also show that the number of neurons is not grossly different between susceptible and unsusceptible mice? This could be done by simply staining for NeuN or performing a western blot for a neuronal-specific protein (e.g. Map2 or beta3-tubulin);

      As suggested, we performed immunofluorescence using NeuN antibody to detect the number of neurons in susceptible and unsusceptible mice. The number is not significantly different between the two populations (Supplementary Figure 7).

      - In Figure 8, the authors should also evaluate the levels of activated RAC1 and activated Cdc42, which are much more important than just basal levels of the proteins to infer an effect on actin dynamics. This is possible through kits that use specific adaptors to pulldown GTP-Rac1 and GTP-Cdc42;

      Thanks for your constructive suggestion. An elevated level and hyperactivation of Rac1 protein are both associated with actin dynamics and dendritic development [1]. We agree that showing the levels of activated RAC1 would be a better way to infer its effect on actin dynamics. In Figure 8, however, the purpose of the experiment is to show that the levels of actin organization-related proteins change with the expression level of S1PR1, and thus to conclude that actin organization was disrupted, rather than to specifically claim that S1PR1 activated these proteins. We apologize for the confusion, but we think the current data are sufficient to support the conclusion.

      Thanks again for your advice. Your understanding is greatly appreciated.

      - In Figure 9C, the experiment is performed in an immortalised cell line. I feel this needs to be performed at least in primary hippocampal neurons;

      Thanks for your suggestion. As suggested, we performed the experiment in primary hippocampal neurons. Knockdown of S1pr1 in primary hippocampal neurons induced reduction in the number of branches and filamentous actin. Please refer to the updated Figure 9C.

      - In Figure 9D, the authors use a Yeast two-hybrid system to demonstrate the interaction between S1PR1 and Integrin α2. However, as the yeast two-hybrid system is based on the proximity of the GAL4 activating domain and the GAL4 binding domain, which are used to activate the transcription of reporter genes, the system is not often used when probing the interaction between transmembrane proteins. Could the authors use other transmembrane proteins as negative controls?;

      Thanks for your question. We apologize for the unclear description in the methods section. The traditional yeast two-hybrid system can only detect protein interactions that occur in the nucleus and cannot detect interactions between membrane proteins. Here, we utilized the split-ubiquitin membrane-based yeast two-hybrid system. Briefly, in this system, ubiquitin, a protein composed of 76 amino acid residues that can mediate the ubiquitination and proteasomal degradation of target proteins, is split into two domains, namely Cub at the C-terminus and NubG at the N-terminus, which are fused to and expressed with the bait protein “Bait” and the prey protein “Prey”, respectively. Cub is additionally fused to a transcription factor. If the Bait and Prey proteins bind, Cub and NubG are brought together and a complete ubiquitin is formed, which is recognized by the proteasome; the fused transcription factor is then cut off and enters the cell nucleus to activate expression of the reporter gene. We then determine whether the Bait and Prey proteins interact with each other through the growth of the yeast.

      Thanks again for pointing this out. We reworded the method in M&M (Line 678-696).

      - In Figure 9E, the immunoblot is very unconvincing. The bands in the inputs are very weak for both ITGA2 and S1PR1, the authors do not show the enrichment of S1PR1 upon its immunoprecipitation and the band for ITGA2 in the IP fraction has a weird appearance. Were these experiments performed on DG lysates only? If so, I suggest the authors repeat the experiment using the whole brain (or at least the whole hippocampus) so as to have more starting material. Alternatively, if this doesn't work, or in addition, they could also perform the immunoprecipitation in heterologous cells overexpressing the two proteins;

      Thanks for the question and suggestion. We used lysates from both dentate gyri of a single mouse as the starting material. We updated the result, which now shows clearer bands (Figure 9E).

      - About the point above, even if the results were convincing, the authors can't say that they demonstrate an interaction in vivo. In co-IP experiments, the interaction is much more likely to occur in the lysate during the incubation period rather than being conserved from the in vivo state. These co-IPs demonstrate the ability of proteins to interact, not necessarily that they do it in vivo. If the authors wanted to demonstrate this, they could perform a Proximity ligation assay in primary hippocampal neurons, using antibodies against S1PR1 and ITGA2.

      Thanks for your concern. Co-immunoprecipitation (Co-IP) is the gold standard to identify protein-protein interactions [2], and it is one of the most efficient techniques to study these protein-protein interactions in vivo [3]. We repeated the experiment and followed the experimental procedure exactly to avoid the protein interaction due to over-incubation. Over-incubation, particularly at room temperature, may result in non-specific binding and therefore high background, thus we performed Co-IPs at 4°C to preserve protein interactions. We agree that Proximity ligation assay is better suited for studies of endogenously expressed proteins in primary cells [4]. Since we optimized the experiment procedure to avoid non-specific binding and particularly, Co-IP utilized proteins from DG lysates which could validate the specificity of the protein interaction in native tissue, we prefer to keep the Co-IP result in Figure 9E.

      Thanks again for your suggestion. We appreciate your understanding on this matter.

      - In Figure 9H, could the authors increase the N to see if shItga2 causes further KD in the CCI?

      As suggested, we repeated the experiment and increased the N to 6. As shown in the following picture, shItga2 did not cause further KD in the CCI.

      Author response image 1.

      - To conclusively demonstrate that S1PR1 and ITGA2 participate in the same pathway, they could show that knocking down the two proteins at the same time does not have additive effects on behavioral tests compared to the knockdown of each one of them in isolation.

      Thanks for your suggestion. As suggested, we knocked down the two proteins at the same time and did not observe additive effects on behavioral tests compared to the knockdown of either one in isolation. Please refer to Figure 9L-O.

      Other major concerns:

      - Supplementary Figure 5: the image showing colocalisation between S1PR1 and CamKII is not very convincing. Is the S1PR1 antibody validated on Knockout or knockdown in immunostaining?;

      S1PR1 is a membrane receptor and the S1P1 antibody (PA1-1040, Invitrogen) shows membranous staining with diffuse dot-like signals (Please refer to the image “A” provided by ThermoFisher Scientific). Here, we utilized the antibody to detect the expression of S1PR1 in DG granule cells. We can see the diffuse dot-like signals aggregated in each single granule cell. CaMKII shows intense staining around the border of the granule cell soma (Image “B”) [5]. According to the images shown in Supplementary Figure 5B, we concluded that S1PR1 is expressed in CaMKII+ cells.

      Besides, as suggested, we validated the S1PR1 antibody on knockdown in immunostaining (Image “C” and “D”). The expression of S1PR1 is significantly decreased compared with the control.

      Author response image 2.

      - It would be interesting to check S1PR2 levels as a control in CCI-chronic animals;

      As suggested, we quantified the S1PR2 levels in Sham and CCI animals, and there is no significant difference between groups (Supplementary Figure 9).

      - Figure 1: I am a bit concerned about the Ns in these experiments. In the chronic pain experiments, the N for Sham is around 8 whereas is around 20 for CCI animals. Although I understand higher numbers are necessary to see the susceptible and unsusceptible populations, I feel that then the same number of Sham animals should be used;

      Thanks for your concern. In the preliminary experiment, we noticed that the ratio of susceptible and unsusceptible populations is around 1:1. After the behavioral tests, we need to further take samples to investigate molecular and cellular changes of each group. Thus, we set sham around 8 and CCI around 20 to ensure that after characterization into susceptible and unsusceptible groups, each group has relatively equal numbers for further investigations.

      - Figures 1E and 1G have much higher Ns than the other panels. Why is that? If they have performed this high number of animals why not show them in all panels?;

      Thanks for your concern. For Figure 1B, C, D and F, we showed the data from a single batch of experiments, while for Figure 1E and 1G, we used data collected from all batches. By showing data from a single batch, we aim to demonstrate that the ratio of susceptible to unsusceptible animals is relatively stable and does not depend solely on a large sample size.

      - In the experiments where viral injection is performed, the authors should show a zoomed-out image of the brain to show the precision of the injection and how spread the expression of the different viruses was;

      As suggested, we showed the zoomed-out image in Supplementary Figure 6. The viruses are mainly expressed in the hippocampal DG.

      - The authors should check if there is brain inflammation in CCI chronic animals. This would be interesting to explain if this could be the trigger for the effects seen in neurons. In particular, the authors should check astrocytes and microglia. This is of interest also because the pathways altered in Figure 8A are related to viral infection.

      - If the previous point shows increased brain inflammation, it would be interesting for the authors to check whether a prolonged anti-inflammatory treatment in CCI animals administered before the insurgence of memory impairment could stop it from happening;

      - In addition, the authors should speculate on what could be the signal that can induce these molecular changes starting from the site of injury;

      - Also, as the animals are all WT, the authors should speculate on what could render some animals prone to have memory impairments and others resistant.

      Thanks for the above four suggestions. We have observed inflammation, including T cell infiltration and microglial activation, in the hippocampal DG of CCI chronic animals, and we have also used an S1PR1 modulator that acts against lymphocyte-mediated inflammation to prevent the onset of memory impairment. We also examined changes in the numbers of peripheral T-lymphocyte subsets and in the serum levels of cytokines. Furthermore, we found a neuron-microglia dialogue in the DG which may promote resilience to memory impairment in CCI animals. Since these are unpublished results, we apologize that we cannot provide more detailed information publicly at the current stage. We will publish these data as soon as possible. Thanks for your understanding.

      Reviewer #2 (Public Review):

      Summary:

      The study investigates the molecular mechanisms underlying chronic pain-related memory impairment by focusing on S1P/S1PR1 signaling in the dentate gyrus (DG) of the hippocampus. Through behavioural tests (Y-maze and Morris water maze) and RNA-seq analysis, the researchers segregated chronic pain mice into memory impairment-susceptible and -unsusceptible subpopulations. They discovered that S1P/S1PR1 signaling is crucial for determining susceptibility to memory impairment, with decreased S1PR1 expression linked to structural plasticity changes and memory deficits.

      Knockdown of S1PR1 in the DG induced a susceptible phenotype, while overexpression or pharmacological activation of S1PR1 promoted resistance to memory impairment and restored normal synaptic structure. The study identifies actin cytoskeleton-related pathways, including ITGA2 and its downstream Rac1/Cdc42 signaling, as key mediators of S1PR1's effects, offering new insights and potential therapeutic targets for chronic pain-related cognitive dysfunction.

      This manuscript consists of a comprehensive investigation and significant findings. The study provides novel insights into the molecular mechanisms of chronic pain-related memory impairment, highlighting the critical role of S1P/S1PR1 signaling in the hippocampal dentate gyrus. The clear identification of S1P/S1PR1 as a potential therapeutic target offers promising avenues for future research and treatment strategies. The manuscript is well-structured, methodologically sound, and presents valuable contributions to the field.

      Strengths:

      (1) The manuscript is well-structured and written in clear, concise language. The flow of information is logical and easy to follow.

      (2) The segregation of mice into memory impairment-susceptible and -unsusceptible subpopulations is innovative and well-justified. The statistical analyses are robust and appropriate for the data.

      (3) The detailed examination of S1PR1 expression and its impact on synaptic plasticity and actin cytoskeleton reorganization is impressive. The findings are significant and contribute to the understanding of chronic pain-related memory impairment.

      Weaknesses:

      (1) Results: While the results are comprehensive, some sections are data-heavy and could be more reader-friendly with summarized key points before diving into detailed data.

      Thanks for the suggestion. We now begin each section or paragraph with a sentence summarizing what the following experiments investigate, to make the Results more reader-friendly. These sentences are marked in blue in the main text.

      (2) Discussion: There is a need for a more balanced discussion regarding the limitations of the study. For example, addressing potential biases in the animal model or limitations in the generalizability of the findings to humans would strengthen the discussion. Also, providing specific suggestions for follow-up studies would be beneficial.

      As suggested, we discussed more on the limitations of this study and outlined some directions for future research (Line 481-498).

      (3) Conclusion: The conclusion, while concise, could better highlight the study's broader impact on the field and potential clinical implications.

      Thanks. We reworded the conclusion to better highlight the impacts of this study (Line 501-505).

      Reviewer #3 (Public Review):

      Summary of the Authors' Objectives:

      The authors aimed to delineate the role of S1P/S1PR1 signaling in the dentate gyrus in the context of memory impairment associated with chronic pain. They sought to understand the molecular mechanisms contributing to the variability in memory impairment susceptibility and to identify potential therapeutic targets.

      Major Strengths and Weaknesses of the Study:

      The study is methodologically robust, employing a combination of RNA-seq analysis, viral-mediated gene manipulation, and pharmacological interventions to investigate the S1P/S1PR1 pathway. The use of both knockdown and overexpression approaches to modulate S1PR1 levels provides compelling evidence for its role in memory impairment. The research also benefits from a comprehensive assessment of behavioral changes associated with chronic pain.

      However, the study has some weaknesses. The categorization of mice into 'susceptible' and 'unsusceptible' groups based on memory performance requires further validation. Additionally, the reliance on a single animal model may limit the generalizability of the findings. The study could also benefit from a more detailed exploration of the impact of different types of pain on memory impairment.

      Assessment of the Authors' Achievements:

      The authors successfully identified S1P/S1PR1 signaling as a key factor in chronic pain-related memory impairment and demonstrated its potential as a therapeutic target. The findings are supported by rigorous experimental evidence, including biochemical, histological, and behavioral data. However, the study's impact could be enhanced by further exploration of the molecular pathways downstream of S1PR1 and by assessing the long-term effects of S1PR1 manipulation.

      Impact on the Field and Utility to the Community:

      This study is likely to have a significant impact on pain research by providing a novel perspective on the mechanisms underlying memory impairment in chronic pain conditions. The identification of the S1P/S1PR1 pathway as a potential therapeutic target could guide the development of new treatments.

      Additional Context for Readers:

      The study's approach to categorizing susceptibility to memory impairment could inspire new methods for stratifying patient populations in clinical settings.

      Recommendations:

      (1) A more detailed explanation of the k-means clustering algorithm and its application in categorizing mice should be provided.

      As suggested, we explained the k-means clustering algorithm in detail (Line 697-711).
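
      A minimal sketch of how k-means with k = 2 can separate animals into two clusters is given here for orientation only. It is not the authors' implementation: the two features (a Y-maze score and a Morris water maze score), the toy values, the deterministic initialization, and the rule that names the lower-scoring cluster "susceptible" are all assumptions made for illustration.

```python
import math


def kmeans(points, k=2, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on equal-length tuples; requires k >= 2."""
    ordered = sorted(points)
    # Deterministic initialization (an illustrative choice): spread the
    # initial centroids evenly across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical (Y-maze score, MWM score) pairs for six animals; not real data.
scores = [(0.70, 0.45), (0.68, 0.50), (0.72, 0.48),
          (0.40, 0.20), (0.42, 0.25), (0.38, 0.22)]
labels, centroids = kmeans(scores, k=2)

# Assumed naming rule: the cluster with the lower Y-maze centroid is "susceptible".
low_cluster = min(range(2), key=lambda c: centroids[c][0])
groups = ["susceptible" if lab == low_cluster else "unsusceptible" for lab in labels]
print(groups)
```

      In practice, features measured on different scales would usually be standardized before clustering, because k-means relies on Euclidean distance.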

      (2) The discussion on the potential influence of different pain types or sensitivities on memory impairment should be expanded.

      Thanks for your suggestion. We discussed this point in the limitations of this study (Line 484-491).

      (3) The protocol for behavioral testing should be clarified and the potential for learning or stress effects should be addressed.

      Thanks for your suggestion. We clarified the order of the battery of behavioral tests in this study (Line 537-542). We started with the least stressful test (Y-maze) and left the most stressful (Morris water maze) for last [6]. In addition, we conducted behavioral assays to show that a one-day rest is enough to reduce carryover effects from the prior test (Y-maze). We examined stress-related behaviors one day after the Y-maze (23 d post CCI) using the open field test (OFT) and elevated plus maze (EPM). As shown in Author response image 3, the tests did not indicate that the mice were under stress. Thus, the order in which the tests were performed is appropriate for this study.

      Author response image 3.

      (4) Conduct additional behavioral assays for other molecular targets implicated in the study.

      We agree that other molecular targets on susceptibility to memory impairment would be interesting to know. Our study was designed to focus specifically on ITGA2 this time and we'd like to keep the focus intact, but we have included your point as a consideration for future study (Lines 496-498). Thank you for the suggestion.

      (5) The effective drug thresholds and potential non-specific effects of pharmacological interventions should be discussed in more detail.

      As suggested, we emphasized this point of drug SEW2871 in Line 242-245.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns:

      - In Figure 6E the lines of the different groups are not visible. Showing the errors as error bars for each point would probably be better;

      We apologize for the mistake of using mean±SD here instead of mean±SEM. After changing to mean±SEM, the lines of Figure 6E, Figure 7E and 7L become much clearer. Showing error bars at every point looks cluttered because there are numerous points, so we prefer to keep the line style.
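
      For readers comparing the two conventions, the following sketch shows why a mean±SEM band is narrower than a mean±SD band: the SEM is the sample SD divided by the square root of the sample size. The values are an arbitrary toy sample, not data from the study, and the function name is introduced here only for illustration.

```python
import math
import statistics


def sd_and_sem(values):
    """Return the sample standard deviation and the standard error of the mean."""
    sd = statistics.stdev(values)        # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(len(values))    # SEM shrinks as the sample size grows
    return sd, sem


# Arbitrary toy sample (not data from the study).
toy_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sd, sem = sd_and_sem(toy_values)
print(f"mean = {statistics.mean(toy_values):.2f}, SD = {sd:.3f}, SEM = {sem:.3f}")
```

      Because SEM describes the precision of the mean while SD describes the spread of individual animals, which one is plotted should always be stated in the figure legend.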

      - Do the authors have any speculation on why the % time in the quadrant is not further affected in the KD Itga2 in CCI animals (Figure 9K)?;

      In CCI animals, the level of S1PR1 expression is decreased. ITGA2 may participate in the same pathway as S1PR1. Thus, knocking down ITGA2 in CCI animals does not further affect behavior. This is supported by the finding that knocking down the two proteins at the same time produced no additive effects on behavioral tests compared to the knockdown of either one in isolation (Figure 9L-O).

      - In the methods, it's unclear if in the multiple infusion, the animals were anaesthetised or kept awake;

      We have clarified this point in the Methods. Mice were deeply anesthetized with 1% pentobarbital sodium (40 mg/kg, i.p.) (Line 649-650).

      - As the DG is quite small, could the authors clarify if, when performing western blots, they used the two DGs from one animal for each sample or if they pulled together the DGs of several animals?;

      We used the two DGs from one animal for each sample. The amount of protein extracted from each sample is enough for 20-30 Western blot assays. We have now added this to the Methods for clarity (Line 612).

      - Is it possible to check the correlation between performance in the YM and MWM with S1PR1 levels?;

      We would also be interested in this point. However, the data we have cannot reveal this, because it is difficult to manipulate S1PR1 levels using KD and overexpression viruses.

      - EM images have a poor resolution in the figures, could the authors show higher-resolution images?;

      We have inserted 300 DPI images for high resolution output.

      - In line 268 there is a mention of an "ShLamb1"?

      We apologize for the mistake and it was revised.

      Reviewer #3 (Recommendations For The Authors):

      This study explored the role of S1P/S1PR1 signaling within the dentate gyrus (DG) in chronic pain-related memory impairment using a murine model. The authors identified decreased expression of S1PR1 in the DG of mice susceptible to memory deficits. They demonstrated that S1PR1 knockdown increased susceptibility to memory deficits, whereas its overexpression or pharmacological activation mitigated these effects. Further biochemical and immunofluorescence analyses indicated that disruptions in S1P/S1PR1 signaling were related to disruptions in actin cytoskeleton dynamics, influenced by molecular pathways involving ITGA2, Rac1/Cdc42 signaling, and the Arp2/3 complex. These findings offer intriguing insights and suggest a potential therapeutic target for treating memory impairment in chronic pain.

      Major Concerns:

      The following five major concerns are the same as the five recommendations from Reviewer 3 on Page 9-10. Please refer to the answers above.

      (1) The division of subjects into 'susceptible' and 'unsusceptible' categories requires further clarification regarding the methodologies and rationale employed, particularly concerning the use of the k-means clustering algorithm in data analysis. This explanation will strengthen the scientific grounding of the categorization process.

      (2) The categorization of 'susceptible' and 'unsusceptible' groups might also benefit from a more detailed analysis or discussion concerning the influence of different pain sensitivities or types of pain assessments. Although the study mentions that memory impairment stands independent of pain thresholds, a more nuanced exploration could provide deeper insights.

      (3) The article could benefit from more clarity on the protocol of behavioral testing, especially regarding the potential effects of repeated testing on performance outcomes due to learning or stress.

      (4) While the connection between S1P/S1PR1 signaling and the molecular pathways highlighted (ITGA2, Rac1/Cdc42, Arp2/3) is intriguing, only ITGA2 underwent further behavioral validation in vivo. Conducting additional behavioral assays for one or more of the molecular targets could substantially strengthen these findings.

      (5) Discussions regarding effective drug thresholds and the potential for non-specific effects are essential to fully evaluate the implications of pharmacological interventions utilized in the study.

      Minor Concerns:

      (1) Clarification of evidence of the specific infusion sites in pharmacological experiments would enhance the transparency and replicability of these methods.

      For the infusion of S1PR1 agonist, guide cannula (internal diameter 0.34 mm, RWD) was unilaterally implanted into DG of hippocampus (-1.3 A/P, -1.95 M/L, and -2.02 D/V) as evidenced by Figure 5B.

      (2) It would be beneficial if the manuscript provided details regarding the efficiency and reach of viral transfection within the neuronal population. This information would help in assessing the impact of genetic manipulations.

      S1PR1 immunostaining showed that the efficiency is quite high and the reach of viral transfection is sufficient.

      Author response image 4.

      (3) The manuscript should make explicit the normalization techniques used in quantitative assessments such as Western blotting, including the housekeeping genes or proteins used for this purpose.

      Here, we used housekeeping protein normalization for normalizing Western blot data. GAPDH was used as the internal control. First, the stained blot is imaged, a rectangle is drawn around the target protein in each lane, and the signal intensity inside the rectangle is measured by using ImageJ. The signal intensity obtained can then be normalized by being divided by the signal intensity of the loading internal control (GAPDH) detected on the same blot. The average of the ratios from the control group is calculated, and all individual ratios are divided by this average to obtain a new set of values, which represent the normalized values (Line 619-625).
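
      The two-step normalization described above can be sketched as follows. This is an illustrative reading of the text, not the authors' analysis script: the function name, the group labels, and the intensity values are assumptions, and the intensities stand in for the ImageJ measurements.

```python
def normalize_blot(target, gapdh, groups, control="Sham"):
    """Normalize band intensities to GAPDH, then to the control-group mean.

    target, gapdh: per-lane signal intensities measured on the same blot.
    groups: per-lane group labels; `control` names the reference group.
    """
    # Step 1: divide each target band by the GAPDH band from the same lane.
    ratios = [t / g for t, g in zip(target, gapdh)]
    # Step 2: divide every ratio by the mean ratio of the control group,
    # so that the control group averages 1.0.
    control_ratios = [r for r, grp in zip(ratios, groups) if grp == control]
    control_mean = sum(control_ratios) / len(control_ratios)
    return [r / control_mean for r in ratios]


# Hypothetical intensities for four lanes (not data from the study).
target = [100.0, 120.0, 50.0, 60.0]
gapdh = [200.0, 200.0, 200.0, 200.0]
groups = ["Sham", "Sham", "CCI", "CCI"]

normalized = normalize_blot(target, gapdh, groups)
print([round(v, 3) for v in normalized])
```

      Normalizing to the control-group mean rather than to a single control lane keeps the variability of the control group visible in the final values.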

      (4) Details on whether the control groups in behavioral assessments were subjected to handling and experimental conditions comparable to those of the chronic pain groups, barring nerve injury, are crucial for maintaining the integrity of the comparative analysis.

      We agree that the control group and the experimental group should be identical in all respects except for one difference: nerve injury. We have added this point to the Methods (Line 520-522).

      Minor Recommendations:

      The following four minor recommendations are the same as the four minor concerns from Reviewer 3 on Page 12-13. Please refer to the answers above.

      (1) Clarify the specifics of infusion site verification in pharmacological experiments.

      (2) Provide details on the efficiency and neuronal reach of viral transfections.

      (3) Explicitly describe the normalization techniques used in quantitative assessments.

      (4) Ensure that control groups in behavioral assessments undergo comparable handling to maintain analysis integrity.

      References

      (1) Gualdoni, S., et al., Normal levels of Rac1 are important for dendritic but not axonal development in hippocampal neurons. Biology of the Cell, 2007. 99(8): p. 455-464.

      (2) Alam, M.S., Proximity Ligation Assay (PLA). Curr Protoc Immunol, 2018. 123(1): p. e58.

      (3) Song, P., S. Zhang, and J. Li, Co-immunoprecipitation Assays to Detect In Vivo Association of Phytochromes with Their Interacting Partners. Methods Mol Biol, 2021. 2297: p. 75-82.

      (4) Krieger, C.C., et al., Proximity ligation assay to study TSH receptor homodimerization and crosstalk with IGF-1 receptors in human thyroid cells. Frontiers in Endocrinology, 2022. 13.

      (5) Arruda-Carvalho, M., et al., Conditional Deletion of α-CaMKII Impairs Integration of Adult-Generated Granule Cells into Dentate Gyrus Circuits and Hippocampus-Dependent Learning. The Journal of Neuroscience, 2014. 34(36): p. 11919-11928.

      (6) Wolf, A., et al., A Comprehensive Behavioral Test Battery to Assess Learning and Memory in 129S6/Tg2576 Mice. PLoS One, 2016. 11(1): p. e0147733.

    1. eLife Assessment

      This important work investigates how orientation signals detected in higher brain areas may be transformed into motor responses in behaving animals. The authors characterize two types of descending neurons (DNs) that connect the brain to motor units and are involved in different aspects of turning control. They further show that orientation signals act by preferentially increasing relative stimulation onto left- or right-turn-inducing DNs. These convincing results, together with the independent work that they have inspired, represent significant progress in our understanding of mechanisms of animal navigation.

    2. Reviewer #1 (Public review):

      Summary:

      The paper addresses the knowledge gap between the representation of goal direction in the central complex and how motor systems stabilize movement toward that goal. The authors focused on two descending neurons, DNa01 and DNa02, and showed that they play different roles in steering the fly toward a goal. They also explored the connectome data to propose a model to explain how these DNs could mediate responses to lateralized sensory inputs. Finally, they used lateralized optogenetic activation/inactivation experiments to test the roles of these neurons in mediating turns in freely walking flies.

      Strengths:

      The experiments are well-designed and controlled. The experiment in Figure 4 is elegant, and the authors put a lot of effort into ensuring that ATP puffs do not accidentally activate the DNs. They also have explained complex experiments well. I only have minor comments for the authors.

      Weaknesses:

      (1) I do not fully understand how the authors extracted the correlation functions from the population data in Figure 1. Since the ipsilateral DNs are anti-correlated with the contralateral ones, I expected that the average will drop to zero when they are pooled together (e.g., 1E-G). Of course, this will not be the case if all the data in Figure 1 are collected from the same brain hemisphere. It would be helpful if the authors could explain this.

      (2) What constitutes the goal directions in Figures 1-3 and 8, as the authors could not use EPG activity as a proxy for goal directions? If these experiments were done in the dark, without landmarks, one would expect the fly's heading to drift randomly at times, and they would not engage the DNa01/02 for turning. Do the walking trajectories in these experiments qualify as menotactic bouts?

      (3) In Figure 2B, the authors mentioned that DNa02 overpredicts and 01 underpredicts rapid turning and provided single examples. It would be nice to see more population-level quantification to support this claim.

    3. Reviewer #2 (Public review):

      The data is largely electrophysiological recordings coupled with behavioral measurements (technically impressive) and some gain-of-function experiments in freely walking flies. Loss-of-function was tested but had minimal effect, which is not surprising in a system with partially redundant control mechanisms. The data is also consistent with/complementary to subsequent manuscripts (Yang 2023, Feng 2024, and Ros 2024) showing additional descending neurons with contributions to steering in walking and flying.

      The experiments are well executed, the results interesting, and the description clear. Some hypotheses based on connectome anatomy are tested: the insights on the pre-synaptic side - how sensory and central complex heading circuits converge onto these DNs are stronger than the suggestions about biomechanical mechanisms for how turning happens on the motor side.

      Of particular interest is the idea that different sensory cues can converge on a common motor program. The turn-toward or turn-away mechanism is initiated by valence rather than whether the stimulus was odor or temperature or memory of heading. The idea that animals choose a direction based on external sensory information and then maintain that direction as a heading through a more internal, goal-based memory mechanism, is interesting but it is hard to separate conclusively.

      The "see-saw", where left-right symmetry is broken to allow a turn, presumably by excitation on one side and inhibition of the other leg motor modules, is interesting but not well explained here. How hyperpolarization affects motor outputs is not clear.

      The statement near Figure 5B that "DNa02 activity was higher on the side ipsilateral to the attractive stimulus, but contralateral to the aversive stimulus" is really important - and only possible to see because of the dual recordings.

    4. Reviewer #3 (Public review):

      Summary:

      Rayshubskiy et al. performed whole-cell recordings from descending neurons (DNs) of fruit flies to characterize their role in steering. Two DNs implicated in "walking control" and "steering control" by previous studies (Namiki et al., 2018, Cande et al., 2018, Chen et al., 2018) were chosen by the authors for further characterization. In-vivo whole-cell recordings from DNa01 and DNa02 showed that their activity predicts spontaneous ipsilateral turning events. The recordings also showed that while DNa02 predicts transient turns DNa01 predicts slow sustained turns. However, optogenetic activation or inactivation showed relatively subtle phenotypes for both neurons (consistent with data in other recent preprints, Yang et al 2023 and Feng et al 2024). The authors also further characterized DNa02 with respect to its inputs and showed a functional connection with olfactory and thermosensory inputs as well as with the head-direction system. DNa01 is not characterized to this extent.

      Strengths:

      (1) In-vivo recordings and especially dual recordings are extremely challenging in Drosophila and provide a much higher resolution DN characterization than other recent studies that have relied on behavior or calcium imaging. Especially impressive are the simultaneous recordings from bilateral DNs (Figure 3). These bilateral recordings show clearly that DNa02 cells not only fire more during ipsilateral turning events but that they get inhibited during contralateral turns. In line with this observation, the difference between left and right DNa02 neuronal activity is a much better predictor of turning events compared to individual DNa02 activity.

      (2) Another technical feat in this work is driving local excitation in the head-direction neuronal ensemble (PEN-1 neurons), while simultaneously imaging its activity and performing whole-cell recordings from DNa02 (Figure 4). This impressive approach provided a way to causally relate changes in the head-direction system to DNa02 activity. Indeed, DNa02 activity could predict the rate at which an artificially triggered bump in the PEN-1 ring attractor returns to its previous stable point.

      (3) The authors also support the above observations with connectomics analysis and provide circuit motifs that can explain how the head direction system (as well as external olfactory/thermal stimuli) communicated with DNa02. All these results unequivocally put DNa02 as an essential DN in steering control, both during exploratory navigation as well as stimulus-directed turns.
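
      One simple way to see why a left-minus-right signal, as in point (1) above, can predict turning better than a single cell is sketched below. It is a toy model resting on an assumption that is not taken from the manuscript: each cell's rate is a shared (common-mode) component plus or minus a turn-related component, so subtraction cancels the shared part. All values are synthetic and all names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Synthetic turning signal (positive = leftward) and a shared drive that
# affects both cells equally; arbitrary units, not recorded data.
turn = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5, 1.5, -1.5]
common = [5.0, 5.0, 3.0, 3.0, 6.0, 6.0, 2.0, 2.0]

# Assumed model: the left cell fires more during left turns, the right cell less.
left = [c + t for c, t in zip(common, turn)]
right = [c - t for c, t in zip(common, turn)]
difference = [l - r for l, r in zip(left, right)]

r_single = pearson(left, turn)        # diluted by the shared component
r_difference = pearson(difference, turn)  # shared component cancels out
print(f"single cell r = {r_single:.3f}, left-right difference r = {r_difference:.3f}")
```

      In real recordings the shared component would not cancel perfectly, so the improvement from taking the difference would be smaller than in this idealized example.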

      Weaknesses:

      (1) I understand that the first version of this preprint was already on bioRxiv in 2020, and some of the "weaknesses" I list are likely a reflection of the fact that I'm tasked to review this manuscript in late 2024 (more than 4 years later). But given this is a 2024 updated version, it suffers from not laying out the results in contemporary terms. For instance, the manuscript lacks any reference to the DNp09 circuit implicated in object-directed turning and upstream to DNa02 even though the authors cite one of the papers where this was analyzed (Braun et al, 2024). More importantly, these studies (both Braun et al 2024 and Sapkal et al 2024) along with recent work from the authors' lab (Yang et al 2023) and other labs (Feng et al 2024) provide a view that the entire suite of leg kinematics changes required for turning is orchestrated by populations of heterogeneous interconnected DNs. Moreover, these studies also show that this DN-DN network has some degree of hierarchy with some DNs being upstream to other DNs. In this contemporary view of steering control, DNa02 (like DNg13 from Yang et al 2023) is a downstream DN that is recruited by hierarchically upstream DNs like DNa03, DNp09, etc. In this view, DNa02 is likely to be involved in most turning events, but by itself unable to drive all the motor outputs required for the said events. This reasoning could be used while discussing the lack of major phenotypes with DNa02 activation or inactivation observed in the current study, which is in stark contrast to strong phenotypes observed in the case of hierarchically upstream DNs like DNp09 or DNa03. In the section "Contributions of single descending neuron types to steering behavior", the authors start off by asking if individual DNs can make measurable contributions to steering behavior. Once more, any citations to DNp09 or DNa03 - two DNs that are clearly shown to drive strong turning upon activation (Bidaye et al, 2020, Feng et al 2024) - are lacking. Besides misleading the reader, such statements also digress the results away from contemporary knowledge in the field. I appreciate that the brief discussion in the section titled "Ensemble codes for steering" tries to cover these recent updates. However, I think this would serve a better purpose in the introduction and help guide the results.

      (2) The second major weakness is the lack of any immunohistochemistry (IHC) images quantifying the expression of the genetic tools used in these studies. Even though the main split-Gal4 tools for DNa01 and DNa02 were previously reported by Namiki et al, 2018, it is important to document the expression with the effectors used in this work and explicitly mention the expression in any ectopic neurons. Similarly, for any experiments where drivers were combined together (double recordings, functional connectivity) or modified for stochastic expression (Figure 8), IHC images are absolutely necessary. Without this evidence, it is difficult to trust many of the results (especially in the case of behavioral experiments in Figure 8). For example, the DNa01 genetic driver used by the authors is also expressed in some neurons in the nerve cord (as shown on the Flylight webpage of Janelia Research Campus). One wonders if all or part of the results described in Figure 8 are due to DNa01 manipulation or manipulation of the nerve cord neurons. The same applies for optic lobe neurons in the DNa02 driver.

      (3) The paper starts off with a comparative analysis of the roles of DNa01 and DNa02 during steering. Unfortunately, after this initial analysis, DNa01 is largely ignored for further characterization (e.g. with respect to inputs, connectomics, etc.), only to return in the final figure for behavioral characterization where DNa01 seems to have a stronger silencing phenotype compared to DNa02. I couldn't find an explanation for this imbalance in the characterization of DNa01 versus DNa02. Is this due to technical reasons? Or was it an informed decision due to some results? In addition to being a biased characterization, this also results in the manuscript lacking a coherent thread, which in turn makes it a bit inaccessible to the non-specialist.

      (4) There seems to be a discrepancy with regard to what is emphasized in the main text and what is shown in Figures S3/S4 in relation to the role of these DNs in backward walking. There are only two sentences in the main text where these figures are cited.
      a) "DNa01 and DNa02 firing rate increases were not consistently followed by large changes in forward velocity (Figs. 1G and S3)."
      b) "We found that rotational velocity was consistently related to the difference in right-left firing rates (Fig. 3B). This relationship was essentially linear through its entire dynamic range, and was consistent across paired recordings (Fig. 3C). It was also consistent during backward walking, as well as forward walking (Fig. S4)."
      These main text sentences imply the role of the difference between left and right DNa02 in turning. However, the actual plots in Figures S3 and S4 and their respective legends seem to imply a role in "backward walking". For instance, see this sentence from the legend of Figure S3: "When (ΔvoltageDNa02>>ΔvoltageDNa01), the fly is typically moving backward. When (firing rateDNa02>>firing rateDNa01), the fly is also often moving backward, but forward movement is still more common overall, and so the net effect is that forward velocity is small but still positive when (firing rateDNa02>>firing rateDNa01). Note that when we condition our analysis on behavior rather than neural activity, we do see that backward walking is associated with a large firing rate differential (Fig. S4)." This sort of discrepancy in what is emphasized in the text, versus what is emphasized in the figures, ends up confusing the reader. More importantly, I do not agree with any of these conclusions regarding the implication of backward walking. Both Figures S3 and S4 are riddled with caveats, misinterpretations, and small sample sizes. As a result, I actually support the authors' decision to not infer too much from these figures in the "main text". In fact, I would recommend going one step further and removing/modifying these figures to focus on the role of "rotational velocity". Please find my concerns about these two figures below:
      a) In Figures S3 and S4, every heat map has a different scale for the same parameter: forward velocity. S3A is -10 to +10 mm/s, S3B is -6 to +6, S4B (left) is -12 to +12, and S4B (right) is -4 to +4. Since the authors are trying to depict results based on the color-coding, this is highly problematic.
      b) Figure S3A legend: "When (ΔvoltageDNa02>>ΔvoltageDNa01), the fly is typically moving backward." There are also several instances when ΔvoltageDNa02 = ΔvoltageDNa01 and both are low (lower left quadrant) when the fly is typically moving backwards. So in my opinion, this figure in fact suggests DNa02 has no role in backward velocity control.
      c) Based on the example traces in S4A, every time the fly walks backwards it is also turning. Based on this, it is important to show absolute rotational velocity in Figure S4C. It could be that the fly is turning around the backward peak, which would change the interpretation of Figure S4C. Also, it is important to note that the backward velocities in S4A are unprecedentedly high. No previous reports show flies walking backwards at such high velocities (for example, see Chen et al 2018, Nat Comm. for backward walking velocities on a similar setup).
      d) In my opinion, Figure S4D, showing that right-left DNa02 correlates with rotational velocity regardless of whether the fly is in a forward or backward walking state, is the only important and conclusive result in Figures S3/S4. These figures should be rearranged to only emphasize this panel.

      (5) Figure 3 shows a really nice analysis of the bilateral DNa02 recordings data. While Figure S5 shows that authors have a similar dataset for DNa01, a similar level analysis (Figures 3D, E) is not done for DNa01 data. Is there a reason why this is not done?

      (6) In Figure 4 since the authors have trials where bump-jump led to turning in the opposite direction to the DNa02 being recorded, I wonder if the authors could quantify hyperpolarization in DNa02 as is predicted from connectomics data in Figure 7.

      (7) Figure 6 suggests that DNa02 contains information about latent steering drives. This is really interesting. However, in order to unequivocally claim this, a higher-resolution postural analysis might be needed. Especially given that DNa02 activation does not reliably evoke ipsilateral turning, these "latent" steering events could actually contain significant postural changes driven by DNa02 (making them "not latent"). Without this information, at least the authors need to explicitly mention this caveat.

      (8) Figure 7 would really benefit from connectome data with synapse numbers (or weighted arrows) and a corresponding analysis of DNa01.

      (9) In Figure 8E, the most obvious neuronal silencing phenotype is decreased sideways velocity in the case of DNa01 optogenetic silencing. In Figure S2, the inverse filter for sideways velocity for DNa01 had a higher amplitude than the rotational velocity filter. Taken together, does this point at some role for DNa01 in sideways velocity specifically?

      (10) In Figure 8G, the effect on inner hind leg stance prolongation is very weak, and given the huge sample size, hard to interpret. Also, it is not clear how this fits with the role of DNa01 in slow sustained turning based on recordings.

    5. Author response:

      We thank the reviewers for their feedback. We are currently revising the manuscript to address their questions and concerns. Here we briefly summarize our planned revisions.

      Reviewer 1 requested clarification on three points. We will clarify all these points with text edits. One point is brief enough to be addressed here: in cases when we pooled data from the left and right hemispheres, the reviewer wants to know how this was done. Simply put, we defined the “ipsi” side of the body as the side where the recorded DN resided, and we defined “contra” as the other side.

      Reviewer 2 requested clarification on two minor points. We will clarify these points with text edits and with an additional analysis.

      Reviewer 3 had a number of substantive concerns. Briefly:

      (1) The reviewer asks us to improve our discussion of some relevant literature. We will provide updated information on the DN steering network, and in particular, we will cite Bidaye et al. 2020 and Sapkal et al. 2024. We apologize for the oversight.

      (2) The reviewer asks us for immunofluorescent images documenting the expression patterns of our effector transgenes. With regard to GtACR1::eYFP expression, we will include these images in our resubmission. With regard to ReachR expression, we expressed this reagent stochastically under hs-FLP control, and so different brains had different expression patterns; however, we carefully documented the number of DNa02 cells that expressed ReachR in each brain. With regard to GFP expression, these expression patterns are available online from the FlyLight documentation associated with Namiki et al. eLife 2018 (https://splitgal4.janelia.org/precomputed/Descending%20Neurons%202018.html). The UAS-GFP transgene used by Namiki et al. 2018 (pJFRC200-10XUAS-IVS-myr::smGFP-HA in attP18) is different from the UAS-GFP transgene we used (10XUAS-IVS-mCD8::GFP (su(Hw)attP8)), and so there may be minor differences in expression pattern. However, it should be noted that we only used GFP expression to target somata for patch clamp recording, and DNa01 and DNa02 somata have a distinctive location and a distinctive size; when we performed these recordings, we only targeted a soma in this location, and we verified that there were no “distractor” somata in this vicinity with similar size and appearance. The same applies to patch clamp recordings targeted via Halo7 expression (SiR110-HaloTag fluorescence). In paired recordings from both DNa02 and DNa01, we verified the identity of each cell as described in Fig. S1.

      (3) The reviewer asks why we focused on DNa02 in the latter part of the manuscript, rather than DNa01. We made this decision because DNa02 is more highly predictive of steering behavior, as compared to DNa01 (Fig. 1H). Also, an impulse of DNa02 activity is followed by a relatively large turning maneuver, on average, whereas an impulse of DNa01 activity is followed by a relatively small turning maneuver (Fig. 1E-F). Moreover, DNa02 has many more synaptic inputs in the brain (Fig. 7A), and it has many more direct synaptic connections onto motor neurons (Fig. 1B).

      (4) The reviewer highlights difficulties in interpreting DN activity during backward movement (Figs. S3/S4). We included this material in the spirit of completeness, but we agree with the reviewer that it is difficult to interpret. In our revision, we will omit Fig. S3C and Fig. S4A-B, and we will revise these legends to improve clarity.

      (5) The reviewer asks why we did not do a systematic analysis of paired DNa01 recordings, as we did for DNa02. It is difficult to get paired right/left recordings from two DNs of the same type in the same fly, while the fly is walking vigorously, and we were only able to get two such paired recordings from DNa01. We did not feel this was a sufficiently large sample size to support a systematic analysis. We chose not to invest more time in getting more paired DNa01 recordings because we thought that DNa02 was more important, for the reasons noted above.

      (6) The reviewer asks for an analysis of trials where bump-jump led to turning in the opposite direction to the DNa02 being recorded. We will provide this analysis in the revision.

      (7) The reviewer points out that “latent” steering drives might not be latent, as they might produce small postural changes we are not capturing. This is a fair point, and we will note this in our revision.

      (8) The reviewer asks for a systematic analysis of DNa01 inputs in Figure 7, similar to our analysis of DNa02 inputs. Here we would prefer to focus on DNa02, for three reasons. First, we think DNa02 is likely more important, for the reasons noted above. Second, there has been some uncertainty as to the identity of DNa01 in connectome data; indeed, in the hemibrain data set, the cell recently identified as DNa01 was annotated as VES006 (Schlegel et al. Nature 634: 139-152). Third, the cell now identified as DNa01 does not receive direct input from either the central complex or the mushroom body, and for this reason, we felt that the inputs to DNa01 might be less interesting to a general audience.

      (9) The reviewer wonders whether DNa01 is more involved in sideways movement, rather than rotational movement. Our data do not support this conclusion: rather, our data show that DNa01 is only weakly correlated with sideways movement. Thus, the forward filter (Fig. 1F) shows that an impulse of DNa01 activity is (on average) followed by a relatively small amount of sideways movement. Conversely, the reverse filter (in Fig. S2I) shows that an impulse of sideways movement is (on average) preceded by a relatively large amount of DNa01 activity.

      (10) The reviewer points out that the phenotype associated with optogenetic suppression in Fig. 8G is weak. We will highlight this point and discuss potential reasons for this weak phenotype in the revision.

    1. eLife Assessment

      This study presents an important finding on sperm flagellum and HTCA stabilization. The evidence supporting the authors' claims is convincing. The work will be of broad interest to cell and reproductive biologists working on cilium and sperm biology.

    2. Reviewer #1 (Public review):

      In this paper, Wu et al. investigated the physiological roles of CCDC113 in sperm flagellum and HTCA stabilization by using CRISPR/Cas knockout mouse models, co-IP and single sperm imaging. They find that CCDC113 localizes in the linker region among radial spokes (RS), the nexin-dynein regulatory complex (N-DRC), and doublet microtubules (DMTs) and interacts with axoneme-associated proteins CFAP57 and CFAP91, acting as an adaptor protein that facilitates the linkage between RS, N-DRC and DMTs within the sperm axoneme. They show that the disruption of CCDC113 produced spermatozoa with disorganized sperm flagella, and that CFAP91 and DRC2 could not colocalize with DMTs in Ccdc113-/- spermatozoa. Interestingly, the data also indicate that CCDC113 could localize on the HTCA region, and interact with HTCA-associated proteins. The knockout of Ccdc113 could also produce acephalic spermatozoa. By using Sun5 and Centlein knockout mouse models, the authors further find SUN5 and CENTLEIN are indispensable for the docking of CCDC113 to the implantation site on the sperm head. Overall, the experiments were designed properly and performed well to support the authors' observation in each part. Furthermore, the study's findings offer valuable insights into the physiological and developmental roles of CCDC113 in the male germ line, which can provide insight into impaired sperm development and male infertility. The conclusions of this paper are mostly well supported by data, but some points need to be clarified and discussed.

      (1) In Fig. 1, a sperm flagellum protein, which is far away from CCDC113, should be selected as a negative control to exclude artificial effects in co-IP experiments.
      (2) Whether the detachment of sperm head and tail in Ccdc113-/- mice is a secondary effect of the sperm flagellum defects? The author should discuss this point.
      (3) Given that some cytoplasm materials could be observed in Ccdc113-/- spermatozoa (Fig. 5A), whether CCDC113 is also essential for cytoplasmic removal?
      (4) Although CCDC113 could not bind to PMFBP1, the localization of CCDC113 in Pmfbp1-/- spermatozoa should be also detected to clarify the relationship between CCDC113 and SUN5-CENTLEIN-PMFBP1.

      Comments on revisions:

      The authors addressed all my concerns. The manuscript was greatly improved.

    3. Reviewer #2 (Public review):

      Summary:

      In the present study, the authors selected the coiled-coil protein CCDC113 and revealed its expression in the stages of spermatogenesis in the testis as well as in the different steps of spermiogenesis with expression also mapped in the different parts of the epididymis. Gene deletion led to male infertility in CRISPR-Cas9 KO mice and PAS staining showed defects mapped in the different stages of the seminiferous cycle and through the different steps of spermiogenesis. EM and IF with several markers of testis germ cells and spermatozoa in the epididymis indicated defects in flagella and head-to-tail coupling for flagella as well as acephaly. The authors' co-IP experiments of expressed CCDC113 in HEK293T cells indicated an association with CFAP91 and DRC2 as well as SUN5 and CENTLEIN.

      The authors propose that CCDC113 connects CFAP91 and DRC2 to doublet microtubules of the axoneme and that CCDC113's association with SUN5 and CENTLEIN stabilizes the sperm flagellum head-to-tail coupling apparatus. Extensive experiments mapping CCDC113 during postnatal development are reported as well as negative co-IP experiments and studies with SUN5 KO mice as well as CENTLEIN KO mice.

      Strengths:

      The authors provide compelling observations to indicate the relevance of CCDC113 to flagellum formation with potential protein partners. The data are relevant to sperm flagella formation and its coupling to the sperm head.

      Weaknesses:

      The authors' observations are consistent with the model proposed but the authors' conclusions for the mechanism may require direct demonstration in sperm flagella. The Walton et al paper shows human CCDC96/113 in cilia of human respiratory epithelia. An application of such methodology to the proteins indicated by Wu et al for the sperm axoneme and head-tail coupling apparatus is eagerly awaited as a follow-up study.

    4. Author response:

      The following is the authors’ response to the original reviews.

      This study presents a valuable finding on sperm flagellum and HTCA stabilization. The evidence supporting the authors' claims is incomplete. The work will be of broad interest to cell and reproductive biologists working on cilium and sperm biology.

      We thank the Editor and the two reviewers for their time and thorough evaluation of our manuscript. We greatly appreciate their valuable guidance on improving our study. In the revised manuscript, we have conducted additional experiments and provided quantitative data in response to the reviewers' comments. Furthermore, we have refined the manuscript and added further context to elucidate the significance of our findings for the readers.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, Wu et al. investigated the physiological roles of CCDC113 in sperm flagellum and HTCA stabilization by using CRISPR/Cas knockout mouse models, co-IP, and single sperm imaging. They find that CCDC113 localizes in the linker region among radial spokes (RS), the nexin-dynein regulatory complex (N-DRC), and doublet microtubules (DMTs) and interacts with axoneme-associated proteins CFAP57 and CFAP91, acting as an adaptor protein that facilitates the linkage between RS, N-DRC, and DMTs within the sperm axoneme. They show that the disruption of CCDC113 produced spermatozoa with disorganized sperm flagella, and that CFAP91 and DRC2 could not colocalize with DMTs in Ccdc113-/- spermatozoa. Interestingly, the data also indicate that CCDC113 could localize on the HTCA region, and interact with HTCA-associated proteins. The knockout of Ccdc113 could also produce acephalic spermatozoa. By using Sun5 and Centlein knockout mouse models, the authors further find SUN5 and CENTLEIN are indispensable for the docking of CCDC113 to the implantation site on the sperm head. Overall, the experiments were designed properly and performed well to support the authors' observation in each part. Furthermore, the study's findings offer valuable insights into the physiological and developmental roles of CCDC113 in the male germ line, which can provide insight into impaired sperm development and male infertility. The conclusions of this paper are mostly well supported by data, but some points need to be clarified and discussed.

      We thank Reviewer #1 for his or her critical reading and the positive assessment.

      (1) In Figure 1, a sperm flagellum protein, which is far away from CCDC113, should be selected as a negative control to exclude artificial effects in co-IP experiments.

      We greatly appreciate Reviewer #1’s insightful suggestion. In response, we selected two sperm outer dense fiber proteins, ODF1 and ODF2, which are located distant from the sperm axoneme, as negative controls in the co-IP experiments. As shown in Figure 1- figure supplement 1A and B, neither ODF1 nor ODF2 bound to CCDC113, indicating the interaction observed in Figure 1 is not an artifact.

      (2) Whether the detachment of sperm head and tail in Ccdc113-/- mice is a secondary effect of the sperm flagellum defects? The author should discuss this point.

      Good question. Considering that CCDC113 is localized in the sperm neck region and interacts with SUN5 and CENTLEIN, it may play a direct role in connecting the sperm head and tail. Indeed, PAS staining revealed that Ccdc113–/– sperm heads exhibit abnormal orientation in stages V–VIII of the seminiferous epithelia (Figure 6C-D). Furthermore, transmission electron microscopy (TEM) analysis indicated that the absence of CCDC113 caused detachment of the damaged coupling apparatus from the sperm head in step 9–11 spermatids (Figure 6E). These results suggest that the detachment of the sperm head and tail in Ccdc113–/– mice may not be a secondary effect of sperm flagellum defects. We have discussed this point further below:

      “CCDC113 can interact with SUN5 and CENTLEIN, but not PMFBP1 (Figure 7A-C), and is left on the tip of the decapitated tail in Sun5–/– and Centlein–/– spermatozoa (Figure 7K and L). Furthermore, CCDC113 colocalizes with SUN5 in the HTCA region, and immunofluorescence staining in spermatozoa shows that SUN5 is positioned closer to the sperm nucleus than CCDC113 (Figure 7G and H). Therefore, SUN5 and CENTLEIN may be closer to the sperm nucleus than CCDC113. PAS staining revealed that Ccdc113–/– sperm heads are abnormally oriented in stages V–VIII seminiferous epithelia (Figure 6C and D), and TEM analysis further demonstrated that the disruption of CCDC113 causes the detachment of the destroyed coupling apparatus from the sperm head in step 9–11 spermatids (Figure 6E). All these results suggest that the detachment of sperm head and tail in Ccdc113–/– mice may not be a secondary effect of sperm flagellum defects.”

      (3) Given that some cytoplasm materials could be observed in Ccdc113-/- spermatozoa (Fig. 5A), whether CCDC113 is also essential for cytoplasmic removal?

      Good question. Unremoved cytoplasm could be detected in spermatozoa by using transmission electron microscopy (TEM) analysis, including disrupted mitochondria, damaged axonemes, and large vacuoles. These observations indicate defects in cytoplasmic removal in Ccdc113–/– mice. We have discussed this point as below:

      “Moreover, TEM analysis detected excess residual cytoplasm in spermatozoa, including disrupted mitochondria, damaged axonemes, and large vacuoles, indicating defects in cytoplasmic removal in Ccdc113–/– mice (Figure 5A).”

      (4) Although CCDC113 could not bind to PMFBP1, the localization of CCDC113 in Pmfbp1-/- spermatozoa should be also detected to clarify the relationship between CCDC113 and SUN5-CENTLEIN-PMFBP1.

      We appreciate Reviewer #1’s suggestion. We have analyzed the localization of CCDC113 in Pmfbp1-/- spermatozoa and found that CCDC113 was located at the tip of the decapitated tail in Pmfbp1-/- spermatozoa (Figure 7K and L). This finding has been incorporated into the revised manuscript as below:

      “To further elucidate the functional relationships among CCDC113, SUN5, CENTLEIN, and PMFBP1 at the sperm HTCA, we examined the localization of CCDC113 in Sun5-/-, Centlein–/–, and Pmfbp1–/– spermatozoa. Compared to the control group, CCDC113 was predominantly localized on the decapitated flagellum in Sun5-/-, Centlein–/–, and Pmfbp1–/– spermatozoa (Figure 7K and L), indicating SUN5, CENTLEIN, and PMFBP1 are crucial for the proper docking of CCDC113 to the implantation site on the sperm head. Taken together, these data demonstrate that CCDC113 cooperates with SUN5 and CENTLEIN to stabilize the sperm HTCA and anchor the sperm head to the tail.”

      Reviewer #2 (Public Review):

      Summary:

      In the present study, the authors selected the coiled-coil protein CCDC113 and revealed its expression in the stages of spermatogenesis in the testis as well as in the different steps of spermiogenesis with expression also mapped in the different parts of the epididymis. Gene deletion led to male infertility in CRISPR-Cas9 KO mice and PAS staining showed defects mapped in the different stages of the seminiferous cycle and through the different steps of spermiogenesis. EM and IF with several markers of testis germ cells and spermatozoa in the epididymis indicated defects in flagella and head-to-tail coupling for flagella as well as acephaly. The authors' co-IP experiments of expressed CCDC113 in HEK293T cells indicated an association with CFAP91 and DRC2 as well as SUN5 and CENTLEIN.

      The authors propose that CCDC113 connects CFAP91 and DRC2 to doublet microtubules of the axoneme and that CCDC113's association with SUN5 and CENTLEIN stabilizes the sperm flagellum head-to-tail coupling apparatus. Extensive experiments mapping CCDC113 during postnatal development are reported as well as negative co-IP experiments and studies with SUN5 KO mice as well as CENTLEIN KO mice.

      Strengths:

      The authors provide compelling observations to indicate the relevance of CCDC113 to flagellum formation with potential protein partners. The data are relevant to sperm flagella formation and its coupling to the sperm head.

      We are grateful to Reviewer #2 for his or her recognition of the strength of this study.

      Weaknesses:

      The authors' observations are consistent with the model proposed but the authors' conclusions for the mechanism may require direct demonstration in sperm flagella. The Walton et al paper shows human CCDC96/113 in cilia of human respiratory epithelia. An application of such methodology to the proteins indicated by Wu et al for the sperm axoneme and head-tail coupling apparatus is eagerly awaited as a follow-up study.

      We thank Reviewer 2 for his/her kind help in improving the manuscript. We now understand that directly detecting the precise localization of CCDC113 in the sperm axoneme and head-tail coupling apparatus (HTCA) using cryo-electron microscopy (cryo-EM) could powerfully strengthen our model. Recent advances in cryo-EM have indeed advanced our understanding of axonemal structures and determined the structures of native axonemal DMTs from mouse, bovine, and human sperm (Leung et al., 2023; Zhou et al., 2023). However, high-resolution structures of sperm axoneme and HTCA regions, including those involving CCDC113, have yet to be fully characterized. Thus, we would like to discuss this point and consider it a valuable direction for future research.

      “Given that cryo-EM of the sperm axoneme and HTCA could powerfully strengthen the role of CCDC113 in stabilizing the sperm axoneme and head-tail coupling apparatus, it is a valuable direction for future research.”

      References:

      Bazan, R., Schröfel, A., Joachimiak, E., Poprzeczko, M., Pigino, G., & Wloga, D. (2021). Ccdc113/Ccdc96 complex, a novel regulator of ciliary beating that connects radial spoke 3 to dynein g and the nexin link. PLoS Genet, 17(3), e1009388.

      Ghanaeian, A., Majhi, S., McCafferty, C. L., Nami, B., Black, C. S., Yang, S. K., Legal, T., Papoulas, O., Janowska, M., Valente-Paterno, M., Marcotte, E. M., Wloga, D., & Bui, K. H. (2023). Integrated modeling of the Nexin-dynein regulatory complex reveals its regulatory mechanism. Nat Commun, 14(1), 5741.

      Leung, M. R., Zeng, J., Wang, X., Roelofs, M. C., Huang, W., Zenezini Chiozzi, R., Hevler, J. F., Heck, A. J. R., Dutcher, S. K., Brown, A., Zhang, R., & Zeev-Ben-Mordehai, T. (2023). Structural specializations of the sperm tail. Cell, 186(13), 2880-2896.e2817.

      Walton, T., Gui, M., Velkova, S., Fassad, M. R., Hirst, R. A., Haarman, E., O'Callaghan, C., Bottier, M., Burgoyne, T., Mitchison, H. M., & Brown, A. (2023). Axonemal structures reveal mechanoregulatory and disease mechanisms. Nature, 618(7965), 625-633.

      Zhou, L., Liu, H., Liu, S., Yang, X., Dong, Y., Pan, Y., Xiao, Z., Zheng, B., Sun, Y., Huang, P., Zhang, X., Hu, J., Sun, R., Feng, S., Zhu, Y., Liu, M., Gui, M., & Wu, J. (2023). Structures of sperm flagellar doublet microtubules expand the genetic spectrum of male infertility. Cell, 186(13), 2897-2910.e2819.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please provide full gel for the Figure 2C experiment (could be as a supplementary file).

      Thanks for your insightful suggestions. We have replaced Figure 2C and provided the full gel in Figure 2-figure supplement 1A.

      (2) The authors write on Line 163 "In contrast, the flagellum staining appeared reduced in Ccdc113-/- seminiferous tubules (Fig. 2J, red asterisk)." However, the magnification of the pictures is not sufficient to distinguish anything in the panel mentioned, please provide others.

      Many thanks for pointing this out. We have provided a representative figure to show the flagella defect in seminiferous tubules.

      (3) Please add statistical p-values for figures.

      Thanks for your valuable advice. We have added statistical p-values to the figures in the revised manuscript.

      (4) Line 128: Should "speculate" be "speculated"?

      Thank you for pointing out this problem. We have corrected it in the revised manuscript, as shown below:

      “Given that CFAP91 has been reported to stabilize RS on the DMTs (Bicka et al., 2022; Dymek et al., 2011; Gui et al., 2021) and cryo-EM analysis shows that CCDC113 is close to DMTs, we speculated that CCDC113 may connect RS to DMTs by binding to CFAP91 and microtubules.”

      (5) In lines 384-385, more "-" is typed.

      Thank you for pointing out this problem. We have corrected it in the revised manuscript, as shown below:

      “Furthermore, CCDC113 colocalizes with SUN5 in the HTCA region, and immunofluorescence staining in spermatozoa shows that SUN5 is closer to the sperm nucleus than CCDC113 (Figure 7G and H). Therefore, SUN5 and CENTLEIN may be closer to the sperm nucleus than CCDC113.”

      (6) In general, the article has many typos and should be professionally proofread.

      Many thanks for pointing this out. We have thoroughly revised the manuscript with the assistance of professional proofreading.

      Reviewer #2 (Recommendations For The Authors):

      Can the authors indicate in the Materials and Methods if n=3 biological replicates were done for all co-IP, EM, LM, and IF studies? The statistical analysis section indicates this but quantification is missing for most figures including co-IP, most IF, PAS staining, EM, etc.

      We thank Reviewer 2 for the insightful comments and guidance to improve our data quality. All the experiments in this study were repeated at least three times to ensure reproducibility. We have quantified the co-IP experiments in Figures 1C-H and 7A-F, the IF data in Figures 2K, 5C, and 5D, as well as the PAS staining in Figure 6C. Since electron microscopy samples require very little testicular tissue and the sections obtained are very thin, the likelihood of capturing sections specifically at the sperm head-tail junction is considerably low. This challenge makes it difficult to perform quantitative analysis and statistical evaluation in the TEM experiment. To address this limitation, we have quantified the percentage of Ccdc113-/- sperm heads with abnormal orientation in stages V–VIII of the seminiferous epithelium to indicate impaired head-to-tail anchorage.

      Figure S2 is compelling and might be indicated as a major figure instead of a supplementary figure.

      We appreciate the positive comment. We have included it as a major figure in Figure 3F.

      Figure 4A may be incomplete. Data sets for RNA expression suggest high expression in the ovary and other organs in males and females including the brain and are not indicated by the authors. Figure 4A may be considered for removal with a more complete study for another paper.

      Thank you for pointing out this issue. We reviewed RNA expression data from various tissues using RNA-Seq data from Mouse ENCODE (https://www.ncbi.nlm.nih.gov/gene/244608) and found that CCDC113 is highly expressed in the testis, but not significantly in the ovary and brain (Figure 4- figure supplement 1A). Additionally, we re-evaluated CCDC113 protein levels in the spleen, lung, kidney, testis, intestine, stomach, brain, and ovary, confirming that it is highly expressed in the testes, with negligible expression in the ovary and brain (Figure 4- figure supplement 1B). In line with Reviewer 2's suggestion, we have removed Figure 4A in the revised manuscript.

      There are grammatical errors throughout the manuscript and Figure 7 is truncated.

      Thank you for pointing out this problem. We have thoroughly revised the manuscript with the assistance of professional proofreading.

      The Introduction and Discussion parts of the paper may need some clarification for the general reader. The material in the "Additional Context " section of the critique below may be a helpful place to introduce what a stage is, and the steps in germ cell development in the testis with the latter of course where and when the flagellum develops.

      We appreciate your valuable suggestions. We have referred to the material in the “Additional Context” section to introduce the stages of spermatogenesis and the steps in germ cell development in the testis in the introduction and results.

      “Male fertility relies on the continuous production of spermatozoa through a complex developmental process known as spermatogenesis. Spermatogenesis involves three primary stages: spermatogonia mitosis, spermatocyte meiosis, and spermiogenesis. During spermiogenesis, spermatids undergo complex differentiation processes to develop into spermatozoa, which includes nuclear elongation, chromatin remodeling, acrosome formation, cytoplasm elimination, and flagellum development (Hermo et al., 2010).”

      Hermo, L., Pelletier, R. M., Cyr, D. G., & Smith, C. E. (2010). Surfing the wave, cycle, life history, and genes/proteins expressed by testicular germ cells. Part 1: background to spermatogenesis, spermatogonia, and spermatocytes. Microscopy research and technique, 73(4), 241–278. https://doi.org/10.1002/jemt.20783

      “Pioneering work in the mid-1950s used the PAS stain in histologic sections of mouse testis to visualize glycoproteins of the acrosome and Golgi in seminiferous tubules (Oakberg, 1956). The pioneers discovered in cross-sectioned seminiferous tubules the association of differentiating germ cells with successive layers to define different stages that in mice are twelve, indicated as Roman numerals (XII). For each stage, different associations of maturing germ cells were always the same with early cells in differentiation at the periphery and more mature cells near the lumen. In this way, progressive differentiation from stem cells to mitotic, meiotic, acrosome-forming, and post-acrosome maturing spermatocytes was mapped to define spermatogenesis with the XII stages in mice representing the seminiferous cycle. The maturation process from acrosome-forming cells to mature spermatocytes is defined as spermiogenesis with 16 different steps that are morphologically distinct spermatids (O'Donnell L, 2015).”

      Oakberg, E. F. (1956). A description of spermiogenesis in the mouse and its use in analysis of the cycle of the seminiferous epithelium and germ cell renewal. The American journal of anatomy, 99(3), 391-413. https://doi.org/10.1002/aja.1000990303

      O'Donnell L. (2015). Mechanisms of spermiogenesis and spermiation and how they are disturbed. Spermatogenesis, 4(2), e979623. https://doi.org/10.4161/21565562.2014.979623

      For the Discussion, the authors indicate that the function of CCDC113 in mammals is unknown yet the authors point to the work of Walton et al on human respiratory epithelia that points to a function for CCDC96/113. The work in the manuscript here does indicate a role in sperm flagella and the head-to-tail coupling apparatus but remains descriptive until the methodology of Walton et al is applied. Hopefully, the authors will consider it for a follow-up study.

      Thank you for pointing out this problem. We have revised this part and highlighted the work of Walton et al. in the Discussion.

      “CCDC113 is a highly evolutionarily conserved component of motile cilia/flagella. Studies in the model organism Tetrahymena thermophila have revealed that CCDC113 connects RS3 to dynein g and the N-DRC, which plays an essential role in cilia motility (Bazan et al., 2021; Ghanaeian et al., 2023). Recent studies have also identified CCDC113 within the 96-nm repeat structure of the human respiratory epithelial axoneme, where it localizes to the linker region among RS, N-DRC and DMTs (Walton et al., 2023). In this study, we reveal that CCDC113 is indispensable for male fertility, as Ccdc113 knockout mice produce spermatozoa with flagellar defects and head-tail linkage detachment (Figure 3D).”

      “Overall, we identified CCDC113 as a structural component of both the flagellar axoneme and the HTCA, where it performs dual roles in stabilizing the sperm axonemal structure and maintaining the structural integrity of HTCA. Given that cryo-EM of the sperm axoneme and HTCA could substantially strengthen the evidence for the role of CCDC113 in stabilizing the sperm axoneme and head-tail coupling apparatus, it is a valuable direction for future research.”

      The Discussion may be focused on the key aspects of CCDC113 related to sperm flagella and the head-to-tail coupling apparatus that represent a genuine advance. The more speculative parts of the Discussion that have not been addressed by experimentation in the Results section may be considered for removal in the Discussion section.

      Thank you for pointing this out. We have removed the speculative parts of the Discussion that have not been addressed by experimentation in the Results section.

      Additional Context to help readers understand the significance of the work:

      Pioneering work in the mid-1950s used the periodic acid Schiff (PAS) stain in histologic sections of rodent testis to visualize glycoproteins of the acrosome and Golgi in seminiferous tubules. The pioneers discovered in cross-sectioned seminiferous tubules the association of differentiating germ cells with successive layers to define different stages that in mice are twelve, indicated as Roman numerals (XII). For each stage, different associations of maturing germ cells were always the same with early cells in differentiation at the periphery and more mature cells near the lumen. In this way, progressive differentiation from stem cells to mitotic, meiotic, acrosome-forming, and post-acrosome maturing spermatocytes was mapped to define spermatogenesis with the XII stages in mice representing the seminiferous cycle. The maturation process from acrosome-forming cells to mature spermatocytes is defined as spermiogenesis with 19 different steps that are morphologically distinct spermatids. It is from steps 8-19 of spermiogenesis that the formation of the flagellum takes place. Final maturation occurs in the epididymis as sperm move through the caput, corpus, and cauda of the organ with motile spermatozoa generated.

      Thank you very much!

    1. eLife Assessment

      This valuable study investigates the oscillatory activity of gonadotropin-releasing hormone (GnRH) neurones in mice using GCaMP fiber photometry. It demonstrates three distinct patterns of oscillatory activity that occur in GnRH neurons comprising low-level rapid baseline activity, abrupt short-duration oscillations that drive pulsatile gonadotropin secretion, and, in females, a gradual and prolonged oscillating increase in activity responsible for the relatively short-lived preovulatory LH surge. The evidence presented in the study is solid, offering theoretical implications for understanding the behaviour of GnRH neurones in the context of reproductive physiology, and will be of interest to researchers in neuroendocrinology and reproductive biology.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate the oscillatory activity of GnRH neurones in freely behaving mice. By utilising GCaMP fiber photometry, they sought to record real-time neuronal activity to understand the patterns and dynamics of GnRH neuron firing and their implications for reproductive physiology.

      Strengths:

      - The use of GCaMP fiber photometry allows for high temporal resolution recordings of neuronal activity, providing real-time data on the dynamics of GnRH neurones.
      - Recording in freely behaving animals ensures that the findings are physiologically relevant and not artifacts of a controlled laboratory environment.
      - The authors used statistical methods to characterise the oscillatory patterns, ensuring the reliability of their findings.

      Weaknesses:

      - While the study identifies distinct oscillatory patterns in GnRH neurones' calcium dynamics, it falls short in exploring the functional implications of these patterns for GnRH pulsatility and overall reproductive physiology.
      - The study lacks broader discussion to include comparisons with existing studies on GnRH neurone activity and pulsatility and highlight how the findings of this study align with or differ from previous research and what novel contributions are made.
      - The authors aimed to characterise the oscillatory activity of GnRH neurons and successfully identified distinct oscillatory patterns. The results support the conclusion that GnRH neurons exhibit complex oscillatory behaviours, which are critical for understanding their role in reproductive physiology. However, it has not been made clear what exactly do the authors mean by "multi-dimensional oscillatory patterns" and how has this been shown.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors report GCaMP fiber-photometry recordings from the GnRH neuron distal projections in the ventral arcuate nucleus. The recordings are taken from intact, male and female, freely behaving mice. They report three patterns of neuronal activity:

      1) abrupt increases in the Ca2+ signals that are perfectly correlated with LH pulses.

      2) a gradual, yet fluctuating (with a slow ultradian frequency), increase in activity, which is associated with the onset of the LH surge in female animals.

      3) clustered (high frequency) baseline activity in both female and male animals.

      Strengths:

      The GCaMP fiber-photometry recordings reported here are the first direct recordings from GnRH neurones in freely behaving mice. These recordings suggest a rich repertoire of activity, including the integration of distinct "surge" and "pulse" generation signals, and an ultradian rhythm during the onset of the surge.

      Weaknesses:

      The data analysis methods used for the characterisation of the oscillatory behaviour could be complemented with more advanced wavelet methods to quantify and analyse how the frequency content of the observed Ca2+ signal changes over the cycle.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to investigate the oscillatory activity of GnRH neurones in freely behaving mice. By utilising GCaMP fiber photometry, they sought to record real-time neuronal activity to understand the patterns and dynamics of GnRH neuron firing and their implications for reproductive physiology.

      Strengths:

      (1) The use of GCaMP fiber photometry allows for high temporal resolution recordings of neuronal activity, providing real-time data on the dynamics of GnRH neurones.

      (2) Recording in freely behaving animals ensures that the findings are physiologically relevant and not artifacts of a controlled laboratory environment.

      (3) The authors used statistical methods to characterise the oscillatory patterns, ensuring the reliability of their findings.

      Weaknesses:

      (1) While the study identifies distinct oscillatory patterns in GnRH neurones' calcium dynamics, it falls short in exploring the functional implications of these patterns for GnRH pulsatility and overall reproductive physiology.

      The functional roles of pulsatile and surge patterns of GnRH release are extremely well established. We have found perfect correlations between GnRH neuron dendron GCaMP activity and LH pulses as well as the LH surge clearly indicating the function of these activity patterns. We do not know the functional role of the clustered high-frequency basal activity that we have discovered and, as noted in the Discussion, are unsure of its physiological importance. Although it may be minor, it will require future investigation.

      (2) The study lacks a broader discussion to include comparisons with existing studies on GnRH neurone activity and pulsatility and highlight how the findings of this study align with or differ from previous research and what novel contributions are made.

      The Reviewer fails to recognise that these are the first recordings of GnRH neurons in vivo. There are no prior studies for comparison. We have noted the only other in vivo study (undertaken by ourselves) many years ago in anaesthetized mice. It was never expected that electrophysiological recordings of GnRH neurons in acute brain slices (by ourselves and others) would reflect their activity in vivo. Now that we know this to be the case, it would be churlish to point this out explicitly. We have made some modifications to the Discussion by comparing the present data more thoroughly with other in vivo GnRH secretion and kisspeptin neuron activity studies.

      (3) The authors aimed to characterise the oscillatory activity of GnRH neurons and successfully identified distinct oscillatory patterns. The results support the conclusion that GnRH neurons exhibit complex oscillatory behaviours, which are critical for understanding their role in reproductive physiology. However, it has not been made clear what exactly the authors mean by "multi-dimensional oscillatory patterns" and how has this been shown.

      The study shows three types of GnRH neuron activity; two of which would be classified as oscillatory in nature and these show different temporal dimensions.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors report GCaMP fiber-photometry recordings from the GnRH neuron distal projections in the ventral arcuate nucleus. The recordings are taken from intact, male and female, freely behaving mice. They report three patterns of neuronal activity:

      (1) Abrupt increases in the Ca2+ signals that are perfectly correlated with LH pulses.

      (2) A gradual, yet fluctuating (with a slow ultradian frequency), increase in activity, which is associated with the onset of the LH surge in female animals.

      (3) Clustered (high frequency) baseline activity in both female and male animals.

      Strengths:

      The GCaMP fiber-photometry recordings reported here are the first direct recordings from GnRH neurones in vivo. These recordings have uncovered a rich repertoire of activity suggesting the integration of distinct "surge" and "pulse" generation signals, and an ultradian rhythm during the onset of the surge.

      Weaknesses:

      The data analysis method used for the characterisation of the ultradian rhythm observed during the onset of the surge is not detailed enough. Hence, I'm left wondering whether this rhythm is in any way correlated with the clusters of activity observed during the rest of the cycle and which have similar duration.

      We have provided further information on the characterisation of the ultradian rhythm observed at the time of the surge. Whether this is related to the clustered basal activity is an interesting point but very difficult to resolve. We note that the “basal” and “surge” ultradian oscillations have very different durations of ~30 and ~80 min, suggesting that they may be independent phenomena. However, the only way to really exclude a similar genesis will be to establish the origin of each type of oscillatory activity. Preliminary data in the lab show that the RP3V kisspeptin neurons exhibit an identical pattern of ultradian oscillation at the time of the surge, leading us to suspect that the surge oscillation is driven by this input. As noted in the Discussion, it is presently difficult to determine where the high basal activity originates.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Evidence of Multi-Dimensional Oscillatory Patterns: The manuscript presents data showing the oscillatory activity of GnRH neurones with distinct frequency and amplitude characteristics. The analysis includes statistical tests that illustrate the variability in neuronal firing patterns. However, the multi-dimensional nature of this activity has not been demonstrated. It is not clear what is meant by "dimension" with regard to the calcium recordings (oscillatory activity). If the authors refer to the frequency content of the calcium signal, then a proper Fourier or Wavelet analysis should be carried out to characterise the multiple frequencies present in the calcium dynamics in male mice and during various stages of the cycle in female mice.

      The study shows three types of GnRH neuron activity; two of which would be classified as oscillatory in nature. One occurs for ~10 min every hour or so and the other occurs for ~ 12 hours once every 4-5 days. This does not require any analysis to distinguish between the two or claim that they are different i.e. multidimensional. 

      (2) Data Interpretation: Expand the discussion on the physiological relevance of the identified oscillatory patterns. Specifically, explore how these patterns might influence GnRH pulsatility, hormone secretion dynamics, and reproductive cycles.

      The functional roles of pulsatile and surge patterns of GnRH release are extremely well established. We have found perfect correlations between GnRH neuron dendron GCaMP activity and LH pulses as well as the LH surge clearly indicating the function of these activity patterns. We do not know the functional role of the clustered high-frequency basal activity that we have discovered and, as noted in the Discussion, are unsure of its physiological importance. Although it may be minor, it will require future investigation.

      (3) Literature Contextualisation: Broaden the discussion to include comparisons with existing studies on GnRH neuron activity and pulsatility. Highlight how the findings of this study align with or differ from previous research and what novel contributions are made.

      The Reviewer fails to recognise that these are the first recordings of GnRH neurons in vivo. There are no prior studies for comparison. We have noted the only other in vivo study (undertaken by ourselves) many years ago in anaesthetized mice. It would be naive to expect that electrophysiological recordings of GnRH neurons in acute brain slices (by ourselves and others) would reflect their activity in vivo. Now that we know this to be the case, it would be churlish to point this out explicitly. We have made some modifications to the Discussion by comparing the present data more thoroughly with other in vivo GnRH secretion and kisspeptin neuron activity studies.

      (4) Future Directions: Suggest potential follow-up experiments to explore the regulatory mechanisms underlying the observed oscillatory patterns. This could include investigating the role of neurotransmitters, hormonal feedback mechanisms, and other factors that might influence GnRH neuron activity.

      By addressing these recommendations, the authors can further strengthen their manuscript and enhance its impact on the field.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions:

      (1) The authors might want to analyse their inter-peak interval data by fitting them to a simple parametric statistical model (the gamma distribution would be a good choice to capture the skewness of these data). This way they would be able to describe the observed variability and, if the fits are not good, back up their claim that "The dSEs occurred on average ... and showed no clear modal distribution pattern (Fig. 2D)".

      Thank you for the suggestion. We have carried out a Shapiro-Wilk test on the male inter-peak interval distribution and found a W value of 0.87 and a P value <0.0001 (****), providing strong evidence that the data are not normally distributed. Skewness and kurtosis values are 1.39 and 1.81, respectively, indicating a right-skewed, platykurtic distribution; that is, the data are less peaked and more spread out than a normal distribution (which has a kurtosis of 3). This has now been added to the manuscript.
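
      The skewness and kurtosis figures quoted above can be illustrated with a short sketch. It uses plain population moments and the Pearson (non-excess) kurtosis convention, in which a normal distribution scores 3; that convention is assumed here from the wording of the response, and the authors' software may apply a different estimator. The function name `skewness_and_kurtosis` and the interval values are hypothetical and are not the study's data.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, Pearson kurtosis) from population central moments.

    Assumption: non-excess kurtosis, so a normal distribution gives 3.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis


# Hypothetical inter-peak intervals (minutes), invented for illustration only.
intervals = [20, 22, 25, 25, 30, 35, 40, 60, 95]
skew, kurt = skewness_and_kurtosis(intervals)
print(f"skewness={skew:.2f}, kurtosis={kurt:.2f}")
```

      A positive skewness indicates a longer right tail, and a kurtosis below 3 under this convention indicates a platykurtic distribution.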

      (2) If I understand correctly, in Figure 3D, inter-peak intervals from all 4 stages of the estrus cycle are pooled together. It would also be interesting if the authors gave the interval histograms for the different stages of the cycle separately.

      We have now plotted the inter-peak interval distribution histograms for each individual cycle next to the example traces in Figure 3. The descriptions of the distribution pattern are also updated in the figure legends.

      (3) In Figure 3C, one can see the mean interval for different animals (as open circles), is that right? Is the statistical test run on these animals mean, or is the entire dSEs dataset used? In any case, it's not clear to the reader how variable intervals are in individual recordings from each animal. Could the authors add this information (could be easily added in the figure caption)?

      The reviewer is correct that each open circle is the mean interval for one animal. The statistical test was run on the per-animal means. This information has now been added to the figure legend.

      (4) The authors should explain how they identify the regions (clusters) of high-frequency baseline activity, which they present in Figure 4.

      The relevant information is now added to the methods section under the heading ‘GCaMP6 fiber photometry and blood sampling’.

      (5) The authors should detail how to identify and characterise the ultradian rhythm they observe at the onset of the surge.

      The relevant information is now added to the methods section under the heading ‘GCaMP6 fiber photometry and blood sampling’.

      (6) The authors could perform some kind of wavelet-type analysis to quantify and analyse how the frequency content of the observed Ca2+ signal changes over the cycle. From their current analysis, I am not sure whether the ultradian oscillations they observe during the surge are related to the low-activity cluster events they observe during the other stages of the cycle.

      This is an interesting point but very difficult to resolve. We note that the “basal” and “surge” ultradian oscillations have very different durations of ~30 and ~80 min, suggesting that they may be independent phenomena. However, the only way to really exclude a similar genesis will be to establish the origin of each type of oscillatory activity. Preliminary data in the lab show that the RP3V kisspeptin neurons exhibit an identical pattern of ultradian oscillation at the time of the surge, leading us to suspect that the surge oscillation is driven by this input. As noted in the Discussion, it is presently difficult to determine where the high basal activity originates.
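
      As a rough illustration of the frequency-content analysis the reviewers have in mind, the sketch below estimates the dominant period of a signal with a plain discrete Fourier transform. It is a simpler stand-in for a wavelet analysis and is not the authors' method. The synthetic sine waves, the 1-min sampling step, the 480-min recording length, and the function name `dominant_period` are all assumptions made for the example.

```python
import cmath
import math


def dominant_period(signal, dt_min=1.0):
    """Return the period (in minutes) of the strongest Fourier component.

    Assumption: evenly sampled signal with sampling step dt_min minutes.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        # Naive DFT coefficient at frequency k / (n * dt_min).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n * dt_min / best_k


# Synthetic traces standing in for ~80-min and ~30-min oscillations.
n_samples = 480
slow = [math.sin(2 * math.pi * t / 80) for t in range(n_samples)]
fast = [math.sin(2 * math.pi * t / 30) for t in range(n_samples)]
print(dominant_period(slow), dominant_period(fast))
```

      A Fourier periodogram of this kind reports which periods are present but not when they occur; following changes across the cycle would need a time-resolved method such as the wavelet transform the reviewers suggest.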

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer’s comments

      We are most grateful for the opportunity to address the reviewer comments. Point-by-point responses are presented below.

      Overall, the paper has several strengths, including leveraging large-scale, multi-modal datasets, using computationally reasonable tools, and having an in-depth discussion of the significant results.

      We thank the reviewer for the very supportive comments.

      Based on the comments and questions, we have grouped the concerns and corresponding responses into three categories.

      (1) The scope and data selection

      The results are somewhat inconclusive or not validated.

      The overall results are carefully designed, but most of the results are descriptive. While the authors are able to find additional evidence either from the literature or explain the results with their existing knowledge, none of the results have been biologically validated. Especially, the last three result sections (signaling pathways, eQTLs, and TF binding) further extended their findings, but the authors did not put the major results into any of the figures in the main text.

      The goal of this manuscript is to provide a list of putative childhood obesity target genes to yield new insights and help drive further experimentation. Moreover, the outputs from signaling pathways, eQTLs, and TF binding, although noteworthy and supportive of our method, were not particularly novel. In our manuscript we placed our focus on the novel findings from the analyses. We did, however, report the part of the eQTLs analysis concerning ADCY3, which brought new insight to the pathology of obesity, in Figure 4C.

      The manuscript would benefit from an explanation regarding the rationale behind the selection of the 57 human cell types analyzed. It is essential to clarify whether these cell types have unique functions or relevance to childhood development and obesity.

      We elected to comprehensively investigate the GWAS-informed cellular underpinnings of childhood development and obesity. By including a diverse range of cell types from different tissues and organs, we sought to capture the multifaceted nature of cellular contributions to obesity-related mechanisms, and open new avenues for targeted therapeutic interventions.

      There are clearly cell types that are already established as being key to the pathogenesis of obesity when dysregulated: adipocytes for energy storage, immune cell types regulating inflammation and metabolic homeostasis, hepatocytes regulating lipid metabolism, pancreatic cell types intricately involved in glucose and lipid metabolism, skeletal muscle for glucose uptake and metabolism, and brain cell types in the regulation of appetite, energy expenditure, and metabolic homeostasis.

      While it is practical to focus on cell types already proven to be associated with or relevant to obesity, this approach has its limitations. It confines our understanding to established knowledge and rules out the potential for discovering novel insights from new cellular mechanisms or pathways that could play significant roles in the pathogenesis of obesity. Therefore, it was essential to reflect known biology against the unexplored cell types to expand our overall understanding and potentially identify innovative targets for treatment or prevention.

      I wonder whether the used epigenome datasets are all from children. Although the authors use literature to support that body weight and obesity remain stable from infancy to adulthood, it remains uncertain whether epigenomic data from other life stages might overlook significant genetic variants that uniquely contribute to childhood obesity.

      The datasets utilized in our study were derived from a combination of sources, both pediatric and adult. We recognize that epigenetic profiles can vary across different life stages but our principal effort was to characterize susceptibility BEFORE disease onset.

      Given that the GTEx tissue samples are derived from adult donors, there appears to be a mismatch with the study's focus on childhood obesity. If possible, identifying alternative validation strategies or datasets more closely related to the pediatric population could strengthen the study's findings.

      We thank the reviewer for raising this important point. We acknowledge that the GTEx tissue samples are derived from adult donors, which might not perfectly align with the study's focus on childhood obesity. The ideal strategy would be a longitudinal design that follows individuals from childhood into adulthood to bridge the gap between pediatric and adult data, offering systematic insights into how early-life epigenetic markers influence obesity later in life. In future work, we aim to carry out such efforts, which will represent a substantial time and financial commitment.

      Along the same lines, the Developmental Genotype-Tissue Expression (dGTEx) Project is a new effort to study development-specific genetic effects on gene expression at 4 developmental windows spanning from infant to post-puberty (0-18 years). Donor recruitment began in August 2023 and remains ongoing. Tissue characterization and data production are underway. We hope that with the establishment of this resource, our future research in the field of pediatric health will be further enhanced.

      Figure 1B: in subplots c and d, the results are either from Hi-C or capture-C. Although the authors use different colors to denote them, I cannot help wondering how much difference between Hi-C and capture-C brings in. Did the authors explore the difference between the Hi-C and capture-C?

      Thank you for your comment. It is not within the scope of our paper to explore the differences between the Hi-C and Capture-C methods. In the context of our study, both methods serve the same purpose of detecting chromatin loops that bring putative enhancers to sometimes genomically distant gene promoters. Consequently, our focus was on utilizing these methods to identify relevant chromatin interactions rather than comparing their technical differences.

      (2) Details on defining different categories of the regions of interest

      Some technical details are missing.

      While the authors described all of their analysis steps, a lot of the time, they did not mention the motivation. Sometimes, the details were also omitted.

      We have added a section to the revision to address the rationale behind different OCRs categories.

      Line 129: should "-1,500/+500bp" be "-500/+500bp"?

      A gene promoter was defined as a region 1,500 bases upstream to 500 bases downstream of the TSS. Most transcription factor binding sites are distributed upstream (5’) of the TSS, and the assembly of transcription machinery occurs up to 1000 bases 5’ of the TSS. Given our interest in SNPs that can potentially disrupt transcription factor binding, this defined promoter length allowed us to capture such SNPs in our analyses.
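
      The -1,500/+500 bp promoter window can be sketched in a few lines. The strand handling below is an assumption (upstream is taken to mean lower coordinates on the plus strand and higher coordinates on the minus strand), and the function names and coordinates are hypothetical; the response does not describe the authors' actual implementation.

```python
def promoter_interval(tss, strand, upstream=1500, downstream=500):
    """Return (start, end) of the promoter window around a TSS.

    Assumption: 'upstream' follows the direction of transcription,
    so the window is mirrored for minus-strand genes.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")


def snp_in_promoter(snp_pos, tss, strand):
    """True when the SNP position falls inside the promoter window."""
    start, end = promoter_interval(tss, strand)
    return start <= snp_pos <= end


# Hypothetical coordinates, for illustration only.
print(promoter_interval(10_000, "+"))
print(promoter_interval(10_000, "-"))
print(snp_in_promoter(8_600, 10_000, "+"))
```

      Widening the upstream side relative to a symmetric -500/+500 window is what lets SNPs in upstream transcription factor binding sites count as promoter variants.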

      How did the authors define a contact region?

      Chromatin contact regions identified by Hi-C or Capture-C assays are always reported as pairs of chromatin regions. The Supplementary eMethods provide details on the method of processing and interaction calling from the Hi-C and Capture-C data.

      The manuscript would benefit from a detailed explanation of the methods used to define cREs, particularly the process of intersecting OCRs with chromatin conformation data. The current description does not fully clarify how the cREs are defined.

      In the result section titled "Consistency and diversity of childhood obesity proxy variants mapped to cREs", the authors introduced the different types of cREs in the context of open chromatin regions and chromatin contact regions, and TSS. Figure 2A is helpful in some way, but more explanation is definitely needed. For example, it seems that the authors introduced three chromatin contacts on purpose, but I did not quite get the overall motivation.

      We apologize for the confusion. Our definition of cREs is consistent throughout the study. Figure 2A is now Figure 1A in the revision in order to aid the reader.

      The 3 representative chromatin loops illustrate different ways the chromatin contact regions (pairs of blue regions under blue arcs) can overlap with OCRs (yellow regions under yellow triangles – ATAC peaks) and gene promoters.

      (1) The first chromatin loop has one contact region that overlaps with OCRs at one end and with the gene promoter at the other. This satisfies the formation of cREs; thus, the area under the yellow ATAC-peak triangle is green.

      (2) The second loop overlaps with an OCR at only one end, and there is no gene promoter nearby, so it does not qualify as a cRE.

      (3) The third chromatin loop has OCR and promoter overlapping at one end. We defined this as a special cRE formation; thus, the area under the yellow ATAC-peak triangle is green.

      To avoid further confusion for the reader, we have eliminated this variation in the new illustration for the revised manuscript.
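
      The first two cases above can be sketched as a simple interval-overlap check. This is a minimal sketch under stated assumptions, not the authors' pipeline: intervals are hypothetical `(chrom, start, end)` tuples with half-open coordinates, the function names are invented for illustration, and the special third case (OCR and promoter at the same end) is deliberately not modeled.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def cre_ocrs(loop, ocrs, promoters):
    """Return the OCRs that qualify as putative cREs for one chromatin loop.

    loop      -- pair of contact regions (anchor_a, anchor_b)
    ocrs      -- list of open chromatin regions (ATAC-seq peak locations)
    promoters -- list of gene promoter intervals

    An OCR qualifies when it overlaps one contact region and the paired
    contact region overlaps a gene promoter (case 1). An OCR whose paired
    contact region has no promoter is rejected (case 2).
    """
    anchor_a, anchor_b = loop
    hits = []
    # Check both orientations: the OCR may sit at either end of the loop.
    for ocr_anchor, other_anchor in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
        promoter_at_other_end = any(overlaps(p, other_anchor) for p in promoters)
        if not promoter_at_other_end:
            continue
        for ocr in ocrs:
            if overlaps(ocr, ocr_anchor) and ocr not in hits:
                hits.append(ocr)
    return hits
```

      Only peak locations enter the check, which mirrors the statement that peak intensity was not considered once a peak passed the significance threshold.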

      Figure 2A: The authors used triangles filled differently to denote different types of cREs but I wonder what the height of the triangles implies. Please specify.

      The triangles are illustrations for ATAC-seq peaks, and the yellow chromatin regions under them are OCRs. The different heights of ATAC-seq peaks are usually quantified as intensity values for OCRs. However, in our study, when an ATAC-seq peak passed the significance threshold from the data pipeline, we only considered their locations, regardless of their intensities. To avoid further confusion for the reader, we have eliminated this variation in the new illustration for the revised manuscript.

      Figure 1B-c. the title should be "OCRs at putative cREs". Similarly in Figure 1B-d.

      cREs are a subset of OCRs.

      - In the section "Cell type specific partitioned heritability", the authors used "4 defined sets of input genomic regions". Do these correspond to the four types of regions in Figure 2A?

      Figure 2A is now Figure 1A in the revision and has been modified to showcase how we define OCRs and cREs.

      It seems that the authors described the 771 proxies in "Genetic loci included in variant-to-genes mapping" (ln 154), and then somehow narrowed down from 771 to 94 (according to ln 199) because they are cREs. It would be great if the authors could describe the selection procedure together, rather than isolated, which made it quite difficult to understand.

      In the Methods section entitled "Genetic loci included in variant-to-genes mapping", we described the process of LD expansion to include 771 proxies from 19 sentinel signals significantly associated with obesity. Not all of these proxies are located within our defined cREs. Figure 2B, now Figure 2A in the revision, illustrates the proportions of these proxies located within different types of regions, reducing the proxy list to the 94 located within our defined cREs.

      Figure 2. What's the difference between the 771 and 758 proxies?

      13 out of 771 proxies did not fall within any defined regions. The remaining 758 were located within contact regions of at least one cell type regardless of chromatin state.

      (3) Typos

      In the paragraph "Childhood obesity GWAS summary statistics", the authors may want to describe the case/control numbers in two stages differently. "in stage 1" and "921 cases" together made me think "1,921" is one number.

      This has been amended in the revision.

      Hi-C technology should be spelled as Hi-C. In many places it is misspelled as "hi-C". In Figure 1, the authors used "hiC" in the legend. Similarly, Capture-C was sometimes spelled as "capture-C" in the manuscript.

      At the end of the fifth row in the second paragraph of the Introduction section: "exisit" should be "exist".

      In Figure 2A: "Within open chromatin contract region" should be "Within open chromatin contact region".

      These typos and terminology inconsistencies have been amended in the revision.

    2. eLife Assessment

      This important study presents genome-wide high-resolution chromatin-based 3D genomic interaction maps for over 50 diverse human cell types and integrates these data with pediatric obesity GWAS. The work provides convincing evidence that multiple pancreatic islet cell types are key effector cell types. The authors also perform variant-to-gene mapping to nominate genes underlying several GWAS hits. Overall, the results will be of interest to both the fields of 3D genome architecture and pediatric obesity.

    3. Joint Public Reviews:

      Summary:

      This paper studies the genetic factors contributing to childhood obesity. Through a comprehensive analysis integrating genome-wide association study (GWAS) data with 3D genomic datasets across 57 human cell types, consisting of Capture-C/Hi-C, ATAC-seq, and RNA-seq, the study identifies significant genetic contributions to obesity using stratified LD score regression, emphasizing the enrichment of genetic signals in pancreatic alpha cells and identification of significant effector genes at obesity-associated loci such as BDNF, ADCY3, TMEM18, and FTO. Additionally, the study implicated ALKAL2, a gene responsive to inflammation in nerve nociceptors, as a novel effector gene at the TMEM18 locus, suggesting a role for inflammatory and neurological pathways in obesity's pathogenesis which was supported through colocalization analysis using eQTL derived from the GTEx dataset. This comprehensive genomic analysis sheds light on the complex genetic architecture of childhood obesity, highlighting the importance of cellular context for future research and the development of more effective strategies.

      Strengths:

      Overall, the paper has several strengths, including leveraging large-scale, multi-modal datasets, using appropriate computational tools, and in-depth discussion of their significant results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Zhang et al. report a genetic screen to identify novel transcriptional regulators that could coordinate mitochondrial biogenesis. They performed an RNAi-based modifier screen wherein they systematically knocked down all known transcription factors in the developing Drosophila eye, which was already sensitised and had decreased mitochondrial DNA content. Through this screen, they identify CG1603 as a potential regulator of mitochondrial content. They show that levels of mitochondrial proteins such as TFAM and SDHA, as well as mtDNA content, are downregulated in CG1603 mutants. RNA-Seq and ChIP-Seq further show that CG1603 binds to the promoter regions of several known nuclear-encoded mitochondrial genes and regulates their expression. Finally, they also identified YL-1 as an upstream regulator of CG1603. Overall, it is a very important study as our understanding of the regulation of mitochondrial biogenesis remains limited across metazoans. Most studies have focused on PGC-1α as a master regulator of mitochondrial biogenesis, which seems to be a context-dependent regulator. Also, PGC-1α mediated regulation could not explain the regulation of 1100 genes that are required for mitochondrial biogenesis. Therefore, identifying a new regulator is crucial for understanding the overall regulation of mitochondrial biogenesis.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors aim to identify the nuclear genome-encoded transcription factors that regulate mtDNA maintenance and mitochondrial biogenesis. They started with an RNAi screening in developing Drosophila eyes with reduced mtDNA content and identified a number of putative candidate genes. Subsequently, using ChIP-seq data, they built a potential regulatory network that could govern mitochondrial biogenesis. Next, they focused on a candidate gene, CG1603, for further characterization. Based on the expression of different markers, such as TFAM and SDHA, in the RNAi and OE clones in the midgut cells, they argue that CG1603 promotes mitochondrial biogenesis and the expression of ETC complex genes. Then, they used a mutant of CG1603 and showed that both mtDNA levels and mitochondrial protein levels were reduced. Using clonal analyses, they further show a reduction in mitochondrial biogenesis and membrane potential upon loss of CG1603. They made a reporter line of CG1603, showed that the protein is localized to the mitochondria, and binds to polytene chromosomes in the salivary gland. Based on the RNA-seq results from the mutants and the ChIP data, the authors argue that the nucleus-encoded mitochondrial genes that are downregulated >2 folds in the CG1603 mutants and that are bound by CG1603 are related to ETC biogenesis. Finally, they show that YL-1, another candidate in the network, is an upstream regulator of CG1603.

      Strengths:

      This is a valuable study, which identifies a potential regulator and a network of nucleus-encoded transcription factors that regulate mitochondrial biogenesis. Through in-vivo and in-vitro experimental evidence, the authors identify the role of CG1603 in this process. The screening strategy was smart, and the follow-up experiments were nicely executed.

      Weaknesses:

      Some additional experiments showing the effects of CG1603 loss on ETC integrity and functionality would strengthen the work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Fig 3F: SDHA levels are severely downregulated in CG1603 RNAi clones. Therefore, estimating mitochondrial volume based on the SDHA reporter might be misleading. I suggest the authors perform this experiment with an independent marker of mitochondria, like mitoTracker Green or other dyes. I also suggest checking for mitochondrial number/quantity/size by electron microscopy.

      Although it was downregulated, the SDHA-mNeon signal in EC clones clearly outlined mitochondria and the overall mitochondrial network, allowing us to quantify the total mitochondrial volume. Examining mitochondrial number/quantity/size by electron microscopy would further strengthen this statement, and we will consider it in future studies.

      (2) The authors might comment on whether there was any decrease in the volume of CG1603i clone cells. And whether this was taken into account while normalising the mitochondrial volume.

      The size/volume of CG1603i clone cells was indeed decreased, and this was taken into account when normalizing the mitochondrial volume. We clarified this point in the Methods section (page 18, lines 511-512; revised version page 18, lines 515-517).

      (3) Line 230-234: Collectively, these results demonstrate that CG1603 promotes the expression of both nuclear and mtDNA-encoded ETC genes and boosts mitochondrial biogenesis. CG1603 RNAi produced very few EC clones, consistent with the notion that mitochondrial respiration is necessary for ISCs differentiation.

      (4) Quantifying the number of EC clone cells observed might help support this statement.

      This is a great point. We quantified the number of EC clone cells, and the data was included in the revised Figure 3—figure supplement.

      (5) Figure 5: The intensity of MTGreen in CG1603 clones seems comparable to that in control cells, at least visually. Since the authors claim a reduction in mitochondrial volume in CG1603 mutants, it is crucial to estimate mitochondrial volume based on MTGreen intensity in mutant and control cells.

      There are two types of clones shown in Figure 5: germ cell clones, which include all 16 germ cells in the same egg chamber, and follicle cell clones. We highlight these two types of clones in the revised Figure 5 to emphasize this point. The total MTGreen intensity in both germ cell and follicle cell CG1603PBac clones was reduced compared to germ cells in adjacent egg chambers and adjacent follicle cells in the same egg chamber, respectively. We included the quantification of MTGreen intensity in the revised Figure 5—figure supplement C. Examining mitochondrial number/quantity/size by electron microscopy would further strengthen this statement, and we will consider it in future studies.

      (6) Figure 8: It would be interesting to know what happens to steady-state mtDNA levels during YL-1 knockdown. If decreased, could overexpressing CG1603 in YL-1 knockdown cells rescue the phenotype?

      YL-1 knockdown reduced steady-state mtDNA levels in eyes, and overexpressing CG1603 restored mtDNA level in YL-1 knockdown cells. These results are included in the revised Figure 8-figure supplement C.

      Minor comments:

      (7) The paper is lucidly written, but there are minor typos in several places. The authors might proofread it to remove these errors.

      We corrected typos and other minor errors in the manuscript.

      (8) Quantification for Figure 8 - Supplementary needs to be included.

      We performed the quantification, and the result is shown in Figure 8—figure supplement B.

      Reviewer #2 (Recommendations For The Authors):

      (1) In lines 275-276 and Figure 6E, the authors mention that more than 800 nuclear-encoded mitochondrial genes were reduced by >2-fold in CG1603 mutants. One gene related to mitochondrial replication and three genes related to mtDNA transcription were among them. Was TFAM one of these candidates? What were the reduction levels of TFAM mRNA in the RNA-seq results? Can the authors confirm it via RT-PCR?

      In RNA-seq analyses, TFAM was differentially expressed with a log2 fold change of -0.74, corresponding to a ~1.6-fold decrease, and hence was not among the candidates that were downregulated more than two-fold in the CG1603 mutant. Per the reviewer’s suggestion, we carried out RT-PCR and found that TFAM was downregulated about 2-fold in the CG1603 mutant. We included this result in the revised Figure 6F and listed all differentially expressed genes in Supplementary file 5a.
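
      For readers who want to check the arithmetic, a log2 fold change converts to a linear fold decrease as 2 raised to its negated value. The short sketch below uses hypothetical helper names and assumes that the two-fold cutoff corresponds to a log2 fold change of -1 or lower; the only number taken from the response is -0.74.

```python
def fold_decrease(log2_fc: float) -> float:
    """Convert a (negative) log2 fold change into a linear fold decrease."""
    return 2.0 ** (-log2_fc)


def passes_two_fold_cutoff(log2_fc: float) -> bool:
    """True if a gene is downregulated at least two-fold (log2 FC <= -1)."""
    return log2_fc <= -1.0


tfam_log2_fc = -0.74  # value quoted in the author response
print(round(fold_decrease(tfam_log2_fc), 2))   # 1.67
print(passes_two_fold_cutoff(tfam_log2_fc))    # False
```

      The computed value of about 1.67 is in line with the approximate figure quoted above and falls short of the two-fold threshold, which is why TFAM is absent from the candidate list.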

      (2) In many places, the authors argued about the role of CG1603 in ETC biogenesis. Also, the RNA-seq data shows that 64 genes related to the ETC complex were reduced by > 2-fold in CG1603 mutant. Therefore, it would be critical to expand a little on this aspect. For example, what are these genes and related to which of the ETC complex? Can the authors show the reduced levels of some of the candidate genes from each complex via RT-PCR?

      We listed all ETC genes that were downregulated more than two-fold in the CG1603 mutant in a separate sheet in Supplementary file 5b. We further validated the reduced expression of ETC genes by RT-PCR on three randomly selected candidate genes from each complex. The result is included in the revised Figure 6F.

      (3) To make their argument solid on the role of CG1603 on ETC biogenesis, it is important to show the assembly/integrity of ETC complexes as well as the functionality/activity of the ETC complexes in CG1603 mutants.

      We purified mitochondria and assayed the assembly/integrity of three ETC complexes (Complex I, II and IV) and their activities, using blue native PAGE analysis and in-gel activity analysis, respectively. The amounts of these three complexes and, accordingly, their activities were all markedly reduced in the CG1603 mutant compared to wild type. The result is included as Figure 4—figure supplement A.

      (4) CG1603 has already been named as cliff. Why do the authors not use this name, or alternatively propose one?

      We thank the reviewer for the note. CG1603 had not yet been named cliff when we were preparing this manuscript.

      (5) In lines 230-231, based on the TFAM-GFP and SDHA-mNG levels, the authors claim that "these results demonstrate that CG1603 promotes the expression of both nuclear and mtDNA-encoded ETC genes..." The authors may tone down this statement since it sounds overstated. It would be prudent to claim that a subset of genes is regulated by CG1603.

      We appreciate the reviewer’s suggestion. We revised the text to tone down this statement (page 8, line 201; page 9, line 229-230).

    2. eLife Assessment

      This study's findings substantially advance our understanding of an important aspect of mitochondrial metabolism. The data are compelling and the study is well executed. The work is relevant to all who are interested in the biogenesis of mitochondria.

    3. Reviewer #1 (Public review):

      In this manuscript, Zhang et al. report a genetic screen to identify novel transcriptional regulators that coordinate mitochondrial biogenesis. They performed an RNAi-based modifier screen wherein they systematically knocked down all known transcription factors in the developing Drosophila eye, which was sensitized and had decreased mitochondrial DNA content. Through this screen, they identify CG1603 as a potential regulator of mitochondrial volume. They show that levels of mitochondrial proteins such as TFAM and SDHA, as well as mtDNA content, are downregulated in CG1603 mutants. RNA-Seq and ChIP-Seq further show that CG1603 binds to the promoter regions of several known nuclear-encoded mitochondrial genes and regulates their expression. Finally, they also identified YL-1 as an upstream regulator of CG1603. Most studies have focused on PGC-1α as a master regulator of mitochondrial biogenesis, which seems to be a context-dependent regulator. Also, PGC-1α mediated regulation does not explain the regulation of 1100 genes that are required for mitochondrial biogenesis. Therefore, identifying new regulators in this work is crucial for the advancement of our understanding of mitochondrial biogenesis.

    4. Reviewer #2 (Public review):

      Summary:

      In this study, the authors identified nuclear genome-encoded transcription factors that regulate mtDNA maintenance and mitochondrial biogenesis. They started with an RNAi screening in developing Drosophila eyes with reduced mtDNA content and identified several putative candidate genes. Subsequently, using ChIP-seq data, they built a potential regulatory network that seems to govern mitochondrial biogenesis. Next, they focused on a candidate gene, CG1603 /clifford, for further characterization. Based on the expression of different markers, such as TFAM and SDHA, in RNAi and overexpression clones in the midgut, they argued that CG1603 promotes mitochondrial biogenesis and the expression of ETC complex genes. They used a CG1603 mutant to show reduced mtDNA and mitochondrial protein levels. Clonal analyses showed a reduction in mitochondrial biogenesis and membrane potential upon loss of CG1603. They further showed that the protein is localized to the mitochondria, and binds to polytene chromosomes in the salivary gland. Based on the RNA-seq results from the mutants and the ChIP data, the authors argued that the nucleus-encoded mitochondrial genes are downregulated >2 folds in the CG1603 mutants and that the regulatory elements bound by CG1603 are related to ETC biogenesis. Finally, they showed that YL-1, another candidate in the network, is an upstream regulator of CG1603. The screening strategy was well-designed, and the follow-up experiments were nicely executed.

      Comments on revisions:

      The authors have addressed my previous comments satisfactorily.

    1. Author response:

      Reviewer #1:

      Weaknesses:

      However, given that S1P is upstream of NF-κB signaling, it is unclear if it offers conceptual innovations as compared to previous studies from the same team (Palazzo et al. 2020; 2022, 2023).

      We find distinct differences between the impacts of S1P- and NFκB-signaling on glial activation, neuronal differentiation of the progeny of MGPCs and neuronal survival in damaged retinas. In the current study we demonstrate that 2 consecutive daily intravitreal injections of S1P selectively activated mTor (pS6) and Jak/Stat3 (pStat3), but not MAPK (pERK1/2) signaling in Müller glia. Further, inhibition of S1P synthesis (SPHK1 inhibitor) decreased ATF3, mTor (pS6) and pSmad1/5/9 levels in activated Müller glia in damaged retinas. Inhibition of NFκB-signaling in damaged chick retinas did not impact the above-mentioned cell signaling pathways (Palazzo et al., 2020). Thus, S1P-signaling impacts cell signaling pathways in MG that are distinct from NFκB, but we cannot exclude the possibility of cross-talk between NFκB and these pathways. Further, inhibition of NFκB-signaling potently decreases numbers of dying cells and increases numbers of surviving ganglion cells (Palazzo et al., 2020). Consistent with these findings, a TNF orthologue, which presumably activates NFκB-signaling, exacerbates cell death in damaged retinas (Palazzo et al., 2020). By contrast, 5 different drugs targeting S1P-signaling had no effect on numbers of dying cells and only one S1PR1 inhibitor modestly decreased numbers of dying cells (current study). In addition, inhibition of NFκB does not influence the neurogenic potential of MGPCs in damaged chick retinas (Palazzo et al., 2020), whereas inhibition of S1P receptors (S1PR1 and S1PR3) and inhibition of S1P synthesis (SPHK1) significantly increased the differentiation of amacrine-like neurons in damaged retinas (current study). Collectively, in comparison to the effects of pro-inflammatory cytokines and NFκB-signaling, our current findings indicate that S1P-signaling through S1PR1 and S1PR3 in Müller glia has distinct effects upon cell signaling pathways, neuronal regeneration and cell survival in damaged retinas.
      We will revise text in the Discussion to better highlight these important distinctions between NFκB- and S1P-signaling.

      Reviewer #2:

      Weaknesses:

      The methodology is not very clean. A number of drugs (inhibitors, antagonists, agonists, signal modulators) are used to modulate S1P expression or signaling in the retina without evidence that these drugs are reaching the target cells. There is no alternative evaluation of whether the drugs are, in fact, effective. The drug solubility in the vehicle and in the vitreous is not provided, and how did the authors decide on using a single dose of each drug to have the optimal expected effect on the S1P pathway?

      Müller glia are the predominant retinal cell type that expresses S1P receptors. Consistent with these patterns of expression, we report Müller glia-specific effects of different agonists and antagonists that increase or decrease S1P-signaling. Since we compare cell-level changes within contralateral eyes wherein one retina is exposed to vehicle and the other is exposed to vehicle plus drug, it seems highly probable that the drugs are eliciting effects upon the Müller glia. It is possible, but very unlikely, that the responses we observed could have resulted from drugs acting on extra-retinal tissues, which might secondarily release factors that elicit cellular responses in Müller glia. However, this seems unlikely given the distinct patterns of expression for different S1P receptors in Müller glia, and the outcomes of inhibiting Sphk1 or S1P lyase on retinal levels of S1P.

      For example, we provide evidence that S1PR1 and S1PR3 expression is predominant in Müller glia in the chick retina using single cell-RNA sequencing and fluorescence in situ hybridization (FISH). Thus, we expect S1PR1/3-targeting small-molecule inhibitors to act directly on Müller glia, which is consistent with our read-outs of cell signaling with injections of S1P in undamaged retinas. We show that SPHK1 and SGPL1, which encode the enzymes that synthesize or degrade S1P, are expressed by different retinal cell types, including the Müller glia. The efficacy of the drugs that target SPHK1 and SGPL1 was assessed by measuring levels of S1P in the retina. By using liquid chromatography and tandem mass spectrometry (LC-MS/MS), we provide data that inhibition of S1P synthesis (inhibition of SPHK1) significantly decreased levels of S1P in normal retinas, whereas inhibition of S1P degradation (inhibition of SGPL1) increased levels of S1P in damaged retinas (Fig. 5). These data suggest that the SPHK1 inhibitor and the SGPL1 inhibitor specifically act at the intended target to influence retinal levels of S1P. Further, inhibition of SPHK1 (to decrease levels of S1P) results in decreased levels of ATF3, pS6 (mTor) and pSMAD1/5/9 in Müller glia, consistent with the notion that reduced levels of S1P in the retina impact signaling at Müller glia. Finally, we find similar cellular responses to chemically different agonists or antagonists, and we find opposite cellular responses to agonists and antagonists, which are expected to be complementary if the drugs are specifically acting at the intended targets in the retina. We will revise the Discussion to better address caveats and concerns regarding the actions and specificity of different drugs within the retina following intravitreal delivery.

      We will provide the drug solubility specifications and estimates of the initial maximum dose per eye for each drug. For chick eyes between P7 and P14, these estimates will assume a volume of about 100 µl of liquid vitreous, 800 µl gel vitreous and an average eye weight of 0.9 grams. We will revise Table 1 (pharmacological compounds) with ranges of reported in vivo ED50’s (mg/kg) for drugs and we will list the calculated initial maximum dose (mg/kg equivalent per eye). Doses were chosen based on estimates of the initial maximum ocular dose that were within the range of reported ED50’s. However, as is the case for any in vivo model system, it is difficult to predict rates of drug diffusion out of the vitreous, how quickly the drugs are cleared from the entire eye, how much of the compound enters the retina, and how quickly the drug is cleared from the retina. Accordingly, we assessed drug specificity and sites of activation by relying upon readouts of cell signaling pathways, parsed with S1P receptor expression patterns, together with measurements of retinal levels of S1P following exposure to drugs targeting enzymes that catalyze synthesis or degradation of S1P, as described above.
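
      The dose estimate described above can be illustrated with a short calculation. This is a sketch under explicit assumptions rather than the authors' actual procedure: the "mg/kg equivalent per eye" is taken to be the injected mass divided by the eye weight, the initial concentration assumes instantaneous, uniform mixing across the liquid and gel vitreous, and the 1 µg dose is a purely hypothetical example. Only the 100 µl, 800 µl, and 0.9 g figures come from the response.

```python
def dose_mg_per_kg(dose_ug: float, eye_weight_g: float = 0.9) -> float:
    """Initial maximum dose expressed as a mg/kg equivalent per eye.

    Assumes the equivalent is simply injected mass / eye mass.
    """
    dose_mg = dose_ug / 1000.0        # micrograms -> milligrams
    eye_kg = eye_weight_g / 1000.0    # grams -> kilograms
    return dose_mg / eye_kg


def initial_vitreous_conc_ug_per_ml(dose_ug: float,
                                    liquid_ul: float = 100.0,
                                    gel_ul: float = 800.0) -> float:
    """Initial concentration if the dose mixed evenly through all vitreous."""
    total_ml = (liquid_ul + gel_ul) / 1000.0  # microliters -> milliliters
    return dose_ug / total_ml


example_dose_ug = 1.0  # hypothetical dose, not a value from the study
print(round(dose_mg_per_kg(example_dose_ug), 3))
print(round(initial_vitreous_conc_ug_per_ml(example_dose_ug), 3))
```

      Both figures are upper bounds at the moment of injection; as noted above, diffusion and clearance rates in vivo are difficult to predict, so the concentration actually reaching the retina is expected to be lower.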

    2. eLife Assessment

      This valuable study investigates the signaling pathways regulating retina regeneration. Solid evidence shows that the sphingosine-1-phosphate (S1P) signaling pathway is inhibited following retinal injury. Small-molecule activators and inhibitors support a model in which S1P signaling must be inhibited to generate Müller glia progenitor cells-a key step in retinal regeneration. The presented results support the major conclusions. However, the methodology concerning drug treatments is unclear, and the conceptual innovation is, to some extent, incremental.

    3. Reviewer #1 (Public review):

      Summary:

      This study shows that the pro-inflammatory S1P signaling regulates the responses of Müller glial cells to damage. The authors describe the expression of S1P signaling components. Using agonists and antagonists of the pathway, they also investigate their effect on the de-differentiation and proliferation of Müller glial cells in damaged retina of postnatal chicks. They show that S1PR1 is highly expressed in resting MG and non-neurogenic MGPCs. This receptor suppresses the proliferation and neuronal activity promotes MGPC cell cycle re-entry and enhanced the number of regenerated amacrine-like cells after retinal damage. The formation of MGPCs in damaged retinas is impaired in the absence of microglial cells. This study further shows that ablation of microglial cells from the retina increases the expression of S1P-related genes in MG, whereas inhibition of S1PR1 and SPHK1 partially rescues the formation of MGPCs in damaged retinas depleted of microglia. The studies also show that expression of S1P-related genes is conserved in fish and human retinas.

      Strengths:

      This is a well-conducted study, with convincing images and statistically relevant data.

      Weaknesses:

      However, given that S1P is upstream of NF-κB signaling, it is unclear if it offers conceptual innovations as compared to previous studies from the same team (Palazzo et al. 2020; 2022, 2023).

    4. Reviewer #2 (Public review):

      Summary:

      Sphingosine-1-phosphate (S1P) metabolic and signaling genes are expressed highly in retinal Müller glia (MG) cells. This study tested how S1P signaling regulates glial phenotype, dedifferentiation of, reprogramming into proliferating MG-derived progenitor cells (MGPCs), and neuronal differentiation of the progeny of MGPCs using in vivo chick retina. Major techniques used are Sc-RNASeq and immunohistochemistry to determine the gene expression and proliferation of MG cells that co-label with signaling antibodies or mRNA FISH following treating the in vivo eyes with various S1P signaling antagonists, agonists, and signal modulators. The major conclusions drawn are supported by the results presented. However, the methodology they have used to modulate the S1P pathway using various chemical drugs raises questions about the outcomes and whether those are the real effects of S1P receptor modulation or S1P synthesis inhibition.

      Strengths:

      - Use of elaborated single-cell RNAseq expression data.
      - Use of FISH for S1P receptors and kinase as a good quality antibody is not available.
      - Use of EdU assay in combination with IHC.
      - Comparison with human and Zebrafish Sc-RNA data.

      Weaknesses:

      The methodology is not very clean. A number of drugs (inhibitors, antagonists, agonists, signal modulators) are used to modulate S1P expression or signaling in the retina without evidence that these drugs are reaching the target cells. There is no alternative evaluation of whether the drugs are, in fact, effective. The drug solubility in the vehicle and in the vitreous is not provided, and how did the authors decide on using a single dose of each drug to have the optimal expected effect on the S1P pathway?

    1. eLife Assessment

      This is a useful contribution to our understanding of taste perception. The idea that specific receptors function in the pharynx to mediate responses to carboxylic acids is interesting, although the expression analysis is incomplete. Reviewers also have a number of other suggestions for improvement, including the request that authors provide more details about the methodology used. In general, the claims are supported by solid evidence and add to a growing body of literature on this topic.

    2. Reviewer #1 (Public review):

      Summary:

      Shrestha et al report an investigation of mechanisms underlying gustatory preference for carboxylic acids in Drosophila. They begin with a screen of selected IR mutants, identifying 5 candidates - 2 IR co-receptors and 3 other IRs - whose loss of function causes defects in feeding preference for one or more of the three tested carboxylic acids. The requirement for IR51b, IR94a, and IR94h in carboxylic acid responses is evaluated in more detail using behavior, electrophysiology (labellar sensilla), and calcium imaging (pharyngeal neurons). The behavioral valence of IR94a and IR94h neurons is assessed using optogenetics. Overall the study uses a variety of approaches to test and validate the requirement of IRs in pharyngeal carboxylic acid taste.

      Strengths:

      The involvement of the identified IRs in gustatory responses to carboxylic acids is very clear from this study. The authors use mutants and transgenic rescue experiments and evaluate outcomes using electrophysiology, behavior, and imaging. Complementary approaches of loss-of-function and artificial activation support the main conclusion that the identified pharyngeal neurons sense carboxylic acids and convey a positive behavioral valence.

      Weaknesses:

      Some aspects of expression analysis and calcium imaging need to be clarified to better support the conclusions.

      (1) The conclusion of two parallel IR-mediated pathways rests on expression analysis of Ir94a-GAL4 and Ir94h-GAL4 lines and the observation that Ir51b expression driven by either can rescue the Ir51b mutant phenotype. However, the expression analysis is not as rigorous as it needs to be for such a conclusion. Prior work found co-expression of Ir94a and Ir94h in the LSO. Here, the co-expression of the two drivers has not been examined, and Ir94a-GAL4 does not appear to be expressed in the LSO. Given the challenges in validating expression patterns in pharyngeal organs, the possibility that the drivers do not entirely capture endogenous expression cannot be ruled out. Rescue experiments using feeding preference or single-cell imaging don't suffice as validation. Plus, the expression of Ir51b could not be defined.

      (2) The description of methods and results for the ex vivo calcium imaging is not satisfactory. Details about which cells are being analyzed, and in which organs, are not included. No solvent stimulus is tested. The temporal dynamics of the responses are not presented. Movies of the imaging are not included as supplementary information - it would be important to visualize those with what was considered modest movement.

      (3) The observed differences in phenotypes of Ir25a and Ir76b mutants are intriguing, as are those between the co-receptor mutants and Ir51b, Ir94a, and Ir94h, but have not been sufficiently considered. Prior studies have also found roles for other response modes (OFF response), other IRs and GRs, and other organs (labellum, tarsi) in behavioral responses to carboxylic acids. Overall, the authors' model may be overly simplistic, and the discussion does not do justice to how their model reconciles with the body of work that already exists.

    3. Reviewer #2 (Public review):

      Shrestha et al investigated the role of IR receptors in the detection of 3 carboxylic acids in adult Drosophila. A low concentration of any of these carboxylic acids added to 2 mM sucrose (1% lactic acid (LA), citric acid (CA), or glycolic acid (GA)) stimulates consumption by adult flies in choice conditions. The authors use this behavioral test to screen the impact of mutations within 33 receptors belonging to the IR family, a large family of receptors derived from glutamate receptors and expressed both in the olfactory and gustatory sensilla of insects. Within the panel of mutants tested, they observed that mutations in 3 receptors (IR25a, IR51b, and IR76b) impaired the detection of LA, CA, and GA, and that mutations in 2 others impacted the detection of CA and GA (IR94a and IR94h). Interestingly, impairing IR51b, IR94a, and IR94h did not affect the electrophysiological responses of external gustatory sensilla to LA, CA, and GA. Thanks to the use of GAL4 strains associated with these receptors and thanks to the use of poxn mutants (which do not develop external gustatory sensilla but still have functional internal receptors), they show evidence that IR94a and IR94h are only expressed in two clusters of gustatory neurons of the pharynx, respectively in the VCSO (ventral cibarial sense organ) and in the VCSO + LSO (labral sense organ). As for IR51b, the GAL4 approach was not successful but RT-PCR performed on different parts of the insect showed expression both in the pharyngeal organs and in peripheral receptors. These main findings are then complemented by a host of additional experiments meant to better understand the respective roles of IR94a and IR94h, by using optogenetics and brain calcium imaging using GCaMP6. They also report a failed attempt to co-express IR51b, IR94a, and IR94h in external receptors, a co-expression which did not confer on bitter-sensitive cells (expressing GR33a-GAL4) the capability to detect any of the carboxylic acids.
      These data complete and expand previous observations made by this group and others, and point to 2 new IR receptors which show an unsuspected specific expression in organs that still remain difficult to study.

      The conclusions of this paper are supported by the data presented, but it remains difficult to make general conclusions as concerns the mechanisms by which carboxylic acids are detected.

      (1) All experiments were done with 1% of carboxylic acids. What is the dose dependency of the behavioral responses to these acids, and is it conceivable that other receptors are involved at other concentrations?

      (2) One result needs to be better discussed and hypotheses proposed - which is why the mutations of most receptors lead to a loss of detection (mutant flies become incapable of detecting the acid) while mutations in IR94a and IR94h make CA and GA potent deterrents. Does it mean that CA and GA are detected by another set of receptors that, when activated, make flies actively avoid CA and GA? In that case, do the authors think that testing receptors one by one is enough to uncover all the receptors participating in the detection of these substances?

      (3) The paper needs to be updated with a recent paper published by Guillemin et al (2024), indicating that LA is detected externally by a combination of IR94e, IR76b and IR25a. IR25a might help to form a fully functional receptor in GR33a neurons (an earlier study by Chen et al (2017) indicates that IR25a is expressed in all gustatory neurons of the pharynx).

      (4) Although it was not the main focus of the paper, it would have been most interesting if the cells expressing IR94a and IR94h were identified, and placed on the functional map proposed by the group of Dahanukar (Chen et al 2017 Cell Reports, Chen et al 2019 Cell Reports).

    4. Reviewer #3 (Public review):

      Summary:

      In this work, the authors investigated the molecular and cellular basis of sour taste perception in Drosophila melanogaster, focusing on identifying receptors that mediate attractive responses to certain carboxylic acids. It builds on previous work from the same group that had identified the IR co-receptors IR25a and IR76b for this sensory process, screening a set of mutants in IRs to identify three, IR51b, IR94a, and IR94h, required for feeding preference responses to some or all of the tested acids.

      Strengths:

      The work is of interest because it assigns sensory roles to IRs of previously unknown function, in particular IR94a and IR94h, and points to pharyngeal neurons in which these receptors are expressed as the relevant sensory neurons (potentially with different roles for IR94a- and IR94h-expressing neurons). The work combines elegant genetics, simple but effective feeding and taste assays, chemo-/opto-genetic activation, and some calcium imaging. Overall the presented data look solid and well-controlled.

      Weaknesses:

      The in situ expression analysis relies entirely on transgenic driver lines for IR94a and IR94h (which had been previously described, though not fully cited in this work). Importantly, given that many of the behavioral experiments (genetic rescue, physiology, artificial activation) use the IR94a and IR94h GAL4 driver lines, it would be helpful to validate that these faithfully reflect IR94a and IR94h expression (as far as I can tell, such validation wasn't done in the original papers describing these lines as part of a large collection of IR drivers). For IR51b, pharyngeal expression is concluded indirectly from non-quantitative RT-PCR analysis (genetic reporters did not work). Direct detection of gene/protein expression (for example, through RNA FISH, immunofluorescence, or protein tagging) would have made for a more complete characterization of these receptors (for example, there is no direct evidence that they also express IR25a and IR76b, as one might expect). Finally, the relationship of IR94a and IR94h neurons to other types of pharyngeal neurons remains unclear, as do their projection patterns in the SEZ.

      Conceptually, the work is of interest mostly to those in the immediate field; there have been a very large number of studies in the past decade (several from this lab) characterizing the contributions of different IRs to various chemosensory processes. The current work doesn't lend much insight into the nature of the minimal functional unit of gustatory IRs (reconstitution of a functional IR in a heterologous neuron/cell has not been achieved here, but this is a limitation of many other previous studies), nor to how different pharyngeal sensory pathways might collaborate to control behavior. Nevertheless, the findings provide a useful contribution to the literature.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      Shrestha et al report an investigation of mechanisms underlying gustatory preference for carboxylic acids in Drosophila. They begin with a screen of selected IR mutants, identifying 5 candidates - 2 IR co-receptors and 3 other IRs - whose loss of function causes defects in feeding preference for one or more of the three tested carboxylic acids. The requirement for IR51b, IR94a, and IR94h in carboxylic acid responses is evaluated in more detail using behavior, electrophysiology (labellar sensilla), and calcium imaging (pharyngeal neurons). The behavioral valence of IR94a and IR94h neurons is assessed using optogenetics. Overall the study uses a variety of approaches to test and validate the requirement of IRs in pharyngeal carboxylic acid taste.

      Strengths:

      The involvement of the identified IRs in gustatory responses to carboxylic acids is very clear from this study. The authors use mutants and transgenic rescue experiments and evaluate outcomes using electrophysiology, behavior, and imaging. Complementary approaches of loss-of-function and artificial activation support the main conclusion that the identified pharyngeal neurons sense carboxylic acids and convey a positive behavioral valence.

      Weaknesses:

      Some aspects of expression analysis and calcium imaging need to be clarified to better support the conclusions.

      (1) The conclusion of two parallel IR-mediated pathways rests on expression analysis of Ir94a-GAL4 and Ir94h-GAL4 lines and the observation that Ir51b expression driven by either can rescue the Ir51b mutant phenotype. However, the expression analysis is not as rigorous as it needs to be for such a conclusion. Prior work found co-expression of Ir94a and Ir94h in the LSO. Here, the co-expression of the two drivers has not been examined, and Ir94a-GAL4 does not appear to be expressed in the LSO. Given the challenges in validating expression patterns in pharyngeal organs, the possibility that the drivers do not entirely capture endogenous expression cannot be ruled out. Rescue experiments using feeding preference or single-cell imaging don't suffice as validation. Plus, the expression of Ir51b could not be defined.

      Based on current literature, Ir94a and Ir94h exhibit distinct expression patterns localized to different sensory regions. Specifically, Ir94a is primarily expressed in the V5 region of the VCSO, where it co-localizes with Ir94c-GAL4 (Chen et al., 2017). Conversely, Ir94h is found in the L7-7 sensilla of the LSO, where it co-expresses with Ir94f, and also within the V2 cells of the VCSO. Notably, the projections of Ir94a and Ir94h into the dorso-anterior subesophageal ganglion suggest divergent expression patterns rather than co-expression in the pharyngeal regions (Koh et al., 2014). Regarding co-expression of Ir94a and Ir94h in the LSO, we did not find any evidence to support this claim. Our data reinforce this view, showing that Ir94a-GAL4 expression is limited to the VCSO, while Ir94h-GAL4 is present in both the LSO and VCSO. Thus, the notion of co-expression of Ir94a and Ir94h in the LSO is not substantiated by current evidence.

      As the reviewer suggested, it is possible that the GAL4 drivers utilized may not fully reflect the endogenous expression of these receptors. Despite this limitation, our behavioral, expression, and physiological analyses strongly suggest that Ir94a and Ir94h are located in distinct regions, supporting a model of two parallel IR-mediated pathways operating within the sensory system.

      In addition, RT-PCR analysis confirmed the presence of Ir51b. However, due to methodological constraints, we were unable to conduct cell-type-specific expression studies using Ir51b-GAL4. This limitation, which we have acknowledged in the manuscript, does not detract from our core findings but highlights an area for future research. Further studies utilizing cell-specific expression analysis and co-expression studies with additional drivers could offer more definitive insights into IR51b’s functional role and its interactions within broader IR-mediated pathways.

      (2) The description of methods and results for the ex vivo calcium imaging is not satisfactory. Details about which cells are being analyzed, and in which organs, are not included. No solvent stimulus is tested. The temporal dynamics of the responses are not presented. Movies of the imaging are not included as supplementary information - it would be important to visualize those with what was considered modest movement.

      We appreciate this valuable feedback. As discussed above, Ir94h is specifically expressed in the L7-7 sensilla of the LSO, while Ir94a is expressed in the V2 cells of the VCSO. This evidence led us to focus specifically on these cells in our calcium imaging study to ensure accuracy and relevance. In our experiments, adult hemolymph solution (AHL) (108 mM NaCl, 5 mM KCl, 8.2 mM MgCl2, 2 mM CaCl2, 4 mM NaHCO3, 1 mM NaH2PO4, 5 mM HEPES, pH 7.5) was used as the solvent and applied as the pre-stimulus (as mentioned in the Methods section). During this phase, we observed no changes in fluorescence, indicating that AHL itself did not influence the responses. Fluorescence changes occurred only when the test chemical, dissolved in AHL, was introduced. To further confirm that AHL had no impact on the results, we conducted continuous recordings with AHL alone before beginning our main experiments, and these trials confirmed the absence of fluorescence alterations. We have included the temporal dynamics and supplementary video recordings to provide a more comprehensive understanding of our findings.
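      For readers less familiar with this readout, the baseline logic described in the response above can be sketched as follows. This is a minimal illustration and not the authors' analysis code. The ΔF/F0 normalization, the function name `delta_f_over_f`, the number of baseline frames, and the fluorescence values are all assumptions introduced for the example. The response itself states only that AHL was applied as a pre-stimulus and produced no change in fluorescence.

```python
from statistics import mean


def delta_f_over_f(trace, n_baseline):
    """Normalize a fluorescence trace to its pre-stimulus baseline.

    trace      -- fluorescence values, one per frame (arbitrary units)
    n_baseline -- number of leading frames recorded in solvent only
                  (here: the hypothetical AHL pre-stimulus window)

    Returns (F - F0) / F0 for every frame, where F0 is the mean of the
    baseline frames.
    """
    if not 0 < n_baseline <= len(trace):
        raise ValueError("n_baseline must be between 1 and len(trace)")
    f0 = mean(trace[:n_baseline])
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero; cannot normalize")
    return [(f - f0) / f0 for f in trace]


# Hypothetical trace: four solvent-only frames, then the tastant arrives.
trace = [100.0, 100.0, 100.0, 100.0, 150.0, 180.0, 120.0]
dff = delta_f_over_f(trace, n_baseline=4)

baseline_drift = max(abs(x) for x in dff[:4])  # solvent-only frames
peak_response = max(dff[4:])                   # frames after stimulus onset
print(baseline_drift, peak_response)
```

      A flat solvent-only window (baseline drift near zero) followed by a clear peak is the pattern the response describes. Reporting the full normalized trace rather than the peak alone would also address the request for temporal dynamics.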

      (3) The observed differences in phenotypes of Ir25a and Ir76b mutants are intriguing, as are those between the co-receptor mutants and Ir51b, Ir94a, and Ir94h, but have not been sufficiently considered. Prior studies have also found roles for other response modes (OFF response), other IRs and GRs, and other organs (labellum, tarsi) in behavioral responses to carboxylic acids. Overall, the authors' model may be overly simplistic, and the discussion does not do justice to how their model reconciles with the body of work that already exists.

      Stanley et al. (2021) reported that the gustatory detection of lactic acid requires both IRs and GRs functioning together. Specifically, they found that IR25a mediates the onset peak response (ON response) to lactic acid, while GRs dampen this response and contribute to a removal peak (OFF response). Interestingly, in Ir25a mutants, a small onset peak still occurred, while Gr64a-f mutants showed an enhanced onset, suggesting that IRs and GRs interact dynamically to modulate taste responses.

      In our previous work, we also observed the role of sweet GRs, in addition to Ir25a and Ir76b, in detecting carboxylic acids in the labellum (Shrestha et al., 2021). This raises the possibility of a similar interplay with carboxylic acids in our current study, where different IRs may contribute to distinct aspects of sensory responses in the pharynx, leading to the phenotypic differences we observed. Moreover, Chen et al. (2017) demonstrated that sour-sensing neurons in the tarsi express both IR76b and IR25a and specifically respond to carboxylic and inorganic acids without reacting to sweet or bitter compounds. This finding points to a specialized role for these receptors in sour detection and suggests a coordinated response involving multiple sensory organs—such as the labellum, tarsi, and pharynx.

      The phenotypic differences observed in our mutants align with a more integrated model of carboxylic acid detection, in which multiple receptors and sensory organs contribute to the overall behavioral response. This supports the idea that our current model offers a more detailed understanding of how different carboxylic acids are detected and processed by the gustatory system.

      Reviewer #2 (Public review):

      Shrestha et al investigated the role of IR receptors in the detection of 3 carboxylic acids in adult Drosophila. A low concentration of any of these carboxylic acids added to 2 mM sucrose (1% lactic acid (LA), citric acid (CA), or glycolic acid (GA)) stimulates consumption by adult flies in choice conditions. The authors use this behavioral test to screen the impact of mutations within 33 receptors belonging to the IR family, a large family of receptors derived from glutamate receptors and expressed both in the olfactory and gustatory sensilla of insects. Within the panel of mutants tested, they observed that mutations in 3 receptors (IR25a, IR51b, and IR76b) impaired the detection of LA, CA, and GA, and that mutations in 2 others impacted the detection of CA and GA (IR94a and IR94h). Interestingly, impairing IR51b, IR94a, and IR94h did not affect the electrophysiological responses of external gustatory sensilla to LA, CA, and GA. Thanks to the use of GAL4 strains associated with these receptors and thanks to the use of poxn mutants (which do not develop external gustatory sensilla but still have functional internal receptors), they show evidence that IR94a and IR94h are only expressed in two clusters of gustatory neurons of the pharynx, respectively in the VCSO (ventral cibarial sense organ) and in the VCSO + LSO (labral sense organ). As for IR51b, the GAL4 approach was not successful but RT-PCR performed on different parts of the insect showed expression both in the pharyngeal organs and in peripheral receptors. These main findings are then complemented by a host of additional experiments meant to better understand the respective roles of IR94a and IR94h, by using optogenetics and brain calcium imaging using GCaMP6. They also report a failed attempt to co-express IR51b, IR94a, and IR94h in external receptors, a co-expression which did not confer on bitter-sensitive cells (expressing GR33a-GAL4) the capability to detect any of the carboxylic acids.
      These data complete and expand previous observations made by this group and others, and point to 2 new IR receptors which show an unsuspected specific expression in organs that still remain difficult to study.

      The conclusions of this paper are supported by the data presented, but it remains difficult to make general conclusions as concerns the mechanisms by which carboxylic acids are detected.

      (1) All experiments were done with 1% of carboxylic acids. What is the dose dependency of the behavioral responses to these acids, and is it conceivable that other receptors are involved at other concentrations?

      In our study, we conducted experiments to examine the dose dependency of behavioral responses to carboxylic acids, with results presented in Supplementary Figure 1. We found that lower concentrations of carboxylic acids are perceived as attractive, while higher concentrations are aversive. This differential response suggests that the receptors identified in our study are primarily tuned to detect low concentrations of these acids. Since higher concentrations elicited aversive responses, it is plausible that additional receptors, beyond the scope of our study, may be involved in sensing these higher concentrations. These receptors could be part of other gustatory receptor neurons that respond specifically to increased acid levels, as fruit flies tend to avoid higher concentrations. We propose that future research could investigate these alternative pathways to gain a complete understanding of the behavioral responses to carboxylic acids. In summary, our findings suggest that specific receptors are involved in detecting low concentrations, while distinct receptor pathways—possibly mediated by other GRNs—may regulate responses to higher concentrations.

      (2) One result needs to be better discussed and hypotheses proposed - which is why the mutations of most receptors lead to a loss of detection (mutant flies become incapable of detecting the acid) while mutations in IR94a and IR94h make CA and GA potent deterrents. Does it mean that CA and GA are detected by another set of receptors that, when activated, make flies actively avoid CA and GA? In that case, do the authors think that testing receptors one by one is enough to uncover all the receptors participating in the detection of these substances?

      As we mentioned above, it is possible that distinct receptor pathways mediate avoidance of GA and CA. This suggests that CA and GA might activate different sets of receptors that trigger avoidance behavior, pointing to a more complex interplay of receptor activity than we initially considered. Certain acids may indeed be detected by multiple receptors, with each receptor contributing uniquely to the behavioral response. Regarding the sufficiency of testing receptors individually, we recognize the limitations of this approach. Examining receptors one by one may not reveal the full spectrum of receptors involved, especially due to potential interactions or compensatory mechanisms that only emerge when certain receptors are inactive. Therefore, a more holistic approach—such as genetic screens for behavioral responses or using complex genetic models to disrupt multiple receptors simultaneously—could provide deeper insights. Moving forward, incorporating receptor interactions that modulate each other, along with more comprehensive assays, could help explain these discrepancies by uncovering previously overlooked receptor functions.

      (3) The paper needs to be updated with a recent paper published by Guillemin et al (2024), indicating that LA is detected externally by a combination of IR94e, IR76b and IR25a. IR25a might help to form a fully functional receptor in GR33a neurons (an earlier study by Chen et al (2017) indicates that IR25a is expressed in all gustatory neurons of the pharynx).

      According to Guillemin et al. (2024), the combination of IR94e, IR76b, and IR25a is required for amino acid detection but not for detecting lactic acid (LA). In their calcium imaging experiments, 100 mM LA elicited a response similar to the vehicle control, suggesting that these receptors do not play a role in LA detection.

      (4) Although it was not the main focus of the paper, it would have been most interesting if the cells expressing IR94a and IR94h were identified, and placed on the functional map proposed by the group of Dahanukar (Chen et al 2017 Cell Reports, Chen et al 2019 Cell Reports).

      The expression patterns of IR94a and IR94h were previously detailed by Chen et al. (2017), showing that IR94h is expressed in the labral sense organ (LSO, specifically in L7-7) and the ventral cibarial sense organ (VCSO, V2), while IR94a is expressed in the VCSO (V5). Given this established information, we referenced these known expression patterns without replicating the mapping in our study. Our primary focus was to investigate the functional role of these neurons within the pharynx, and we believe we have successfully highlighted their specific contributions. However, we recognize that integrating the functional mapping of these neurons in alignment with the work of Dahanukar’s group would have strengthened our findings and provided a more comprehensive understanding. We acknowledge this as a limitation of our study and appreciate your suggestion, as it points to a valuable direction for future research.

      Reviewer #3 (Public review):

      Summary:

      In this work, the authors investigated the molecular and cellular basis of sour taste perception in Drosophila melanogaster, focusing on identifying receptors that mediate attractive responses to certain carboxylic acids. It builds on previous work from the same group that had identified the IR co-receptors IR25a and IR76b for this sensory process, screening a set of mutants in IRs to identify three, IR51b, IR94a, and IR94h, required for feeding preference responses to some or all of the tested acids.

      Strengths:

      The work is of interest because it assigns sensory roles to IRs of previously unknown function, in particular IR94a and IR94h, and points to pharyngeal neurons in which these receptors are expressed as the relevant sensory neurons (potentially with different roles for IR94a- and IR94h-expressing neurons). The work combines elegant genetics, simple but effective feeding and taste assays, chemo-/opto-genetic activation, and some calcium imaging. Overall the presented data look solid and well-controlled.

      Weaknesses:

      The in situ expression analysis relies entirely on transgenic driver lines for IR94a and IR94h (which had been previously described, though not fully cited in this work). Importantly, given that many of the behavioral experiments (genetic rescue, physiology, artificial activation) use the IR94a and IR94h GAL4 driver lines, it would be helpful to validate that these faithfully reflect IR94a and IR94h expression (as far as I can tell, such validation wasn't done in the original papers describing these lines as part of a large collection of IR drivers). For IR51b, pharyngeal expression is concluded indirectly from non-quantitative RT-PCR analysis (genetic reporters did not work). Direct detection of gene/protein expression (for example, through RNA FISH, immunofluorescence, or protein tagging) would have made for a more complete characterization of these receptors (for example, there is no direct evidence that they also express IR25a and IR76b, as one might expect). Finally, the relationship of IR94a and IR94h neurons to other types of pharyngeal neurons remains unclear, as do their projection patterns in the SEZ.

      Conceptually, the work is of interest mostly to those in the immediate field; there have been a very large number of studies in the past decade (several from this lab) characterizing the contributions of different IRs to various chemosensory processes. The current work doesn't lend much insight into the nature of the minimal functional unit of gustatory IRs (reconstitution of a functional IR in a heterologous neuron/cell has not been achieved here, but this is a limitation of many other previous studies), nor to how different pharyngeal sensory pathways might collaborate to control behavior. Nevertheless, the findings provide a useful contribution to the literature.

      We appreciate your thoughtful feedback. As noted in our response, our primary objective was to investigate the sensory functions of IR94a and IR94h. To this end, we conducted behavioral assays, which we validated with additional approaches including genetic rescue, physiological tests, and artificial activation. Throughout these experiments, we extensively utilized Ir94a- and Ir94h-GAL4 driver lines. To ensure these lines accurately reflect the expression of IR94a and IR94h, we verified their expression patterns using immunohistochemistry across various body parts. Our results align with previous findings that show both receptors are exclusively expressed in the pharynx. Regarding IR51b, we employed RT-PCR due to its high sensitivity and specificity, which supported our hypothesis. Nonetheless, we agree that more direct detection methods would have provided a stronger validation of IR51b expression. Our previous study (Sang et al., 2024) also demonstrated the pharyngeal expression of co-expressed receptors, specifically IR25a and IR76b. However, we recognize that the lack of direct evidence for their co-expression with IR51b remains a significant gap. This limitation primarily stems from the unavailability of specific reagents needed for direct assays targeting IR51b, which restricted our experimental approach.

      You also raised the potential relationship between IR94a and IR94h neurons and other pharyngeal neuron types, including their projection patterns in the subesophageal zone. This is indeed an important area for future research that could clarify neural connectivity and further our understanding of sensory mechanisms. However, our study was focused on exploring sensory mechanisms in peripheral regions rather than detailed neural mapping in the SEZ. Investigating these connections would undoubtedly provide valuable insights into the neural circuitry involved and represents an intriguing direction for future research.

    1. Reviewer #2 (Public review):

      Summary:

      Salt stress is a significant and growing concern for agriculture in some parts of the world. While the effects of sodium excess have been studied in Arabidopsis and (many) crop species, most studies have focused on Na uptake, toxicity and overall effects on yield, rather than on developmental responses to excess Na, per se. The work by Ishka and colleagues aims to fill this gap.

      Working from an existing dataset that exposed a diverse panel of A. thaliana accessions to control, moderate, and severe salt stress, the authors identify candidate loci associated with altering the root:shoot ratio under salt stress. Following a series of molecular assays, they characterize a DUF247 protein which they dub SR3G, which appears to be a negative regulator of root growth under salt stress.

      Overall, this is a well-executed study which demonstrates the functional role played by a single gene in plant response to salt stress in Arabidopsis.

      Review of revised manuscript:

      The authors have addressed my point-by-point comments to my satisfaction. In the cases where they have changed their manuscript language, clarified figures, or added analyses I have no further comment. In some cases, there is a fruitful back-and-forth discussion of methodology which I think will be of interest to readers.

      I have nothing to add during this round of review. I think that the paper and associated discussion will make a nice contribution to the field.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to assess the effect of salt stress on root:shoot ratio, identify the underlying genetic mechanisms, and evaluate their contribution to salt tolerance. To this end, the authors systematically quantified natural variations in salt-induced changes in root:shoot ratio. This innovative approach considers the coordination of root and shoot growth rather than exploring biomass and the development of each organ separately. Using this approach, the authors identified a gene cluster encoding eight paralog genes with a domain-of-unknown-function 247 (DUF247), with the majority of SNPs clustering into SR3G (At3g50160). In the manuscript, the authors utilized an integrative approach that includes genomic, genetic, evolutionary, histological, and physiological assays to functionally assess the contribution of their genes of interest to salt tolerance and root development.

      Strengths:

      The holistic approach and integrative methodologies presented in the manuscript are essential for gaining a mechanistic understanding of a complex trait such as salt tolerance. The authors focused on At3g50160 but included in their analyses additional DUF247 paralogs, which further contributes to the strength of their approach. In addition, the authors considered the developmental stage (young seedlings, early or late vegetative stages) and growth conditions of the plants (agar plates or soil) when investigating the role of SR3G in salt tolerance and root or shoot development.

      Weaknesses:

      The authors' claims and interpretation of the results are not fully supported by the data and analyses. In several cases, the authors report differences that are not statistically significant (e.g., Figures 4A, 7C, 8B, S14, S16B, S17C), use inappropriate statistical tests (e.g., t-test instead of Dunnett Test/ANOVA as in Figures 10B-C, S19-23), present standard errors that do not seem to be consistent with the post-hoc Tukey HSD Test (e.g., Figures 4, 9B-C, S16B), or lack controls (e.g., Figure 5C-E, staining of the truncated versions with FM4-64 is missing).

      We thank the reviewer for their critical thoughts on the presented data. We have revised our data interpretation in the main text to more accurately reflect the results. Given the nature of our experimental setup, where we trace the roots of individual Arabidopsis seedlings grown on plates, there is considerable biological variation, which makes achieving strong statistical significance between samples or genotypes challenging. However, we think that representing the data as transparently as possible is necessary to give readers and reviewers a true picture of the variability that we are observing. Consequently, we have centered our data interpretation around observable trends that facilitate drawing conclusions.

      The choice of statistical test is closely tied to the specific biological question being addressed. In Figures 10A-C, as in Figures 6A-B, we compared all genotypes to the wild-type Col-0 within each condition, and thus ANOVA analysis, testing the general effect of the genotype across both mutants and Col-0 wild-type is not appropriate. Similarly, in Figures S19-S23, we compared each mutant line to the wild-type Col-0 under each condition.

      We repeated the post-hoc Tukey HSD Test for Figures 4, 9B-C, and S16B and made adjustments where necessary (see tracked changes manuscript).

      The truncated versions do not localize to the plasma membrane; instead, they are targeted to the nucleus and cytosol, mimicking the localization pattern of free GFP, which was used as a control in Panel F. Therefore, we believe that FM4-64 staining would not be an informative control for these specific images, and that free GFP serves as a better control for these particular constructs.

      In other cases, traits of root system architecture and expression patterns are inconsistent between different assays despite similar growth conditions (e.g., Figures S17A-B vs. 10A-C vs. 6A, and Figures S16B vs. 4A/9B), or T-DNA insertion alleles of WRKY75 that are claimed to be loss-of-function show comparable expression of WRKY75 as WT plants. Additionally, several supplemental figures are mislabeled (Figures S6-9), and some figure panels are missing (e.g., Figures S16C and S17E).

      We thank the reviewer for raising these points and noticing the inconsistency between different assays (e.g., Figures S17A-B vs. 10A-C vs. 6A, and Figures S16B vs. 4A/9B). As mentioned above, considerable biological variation makes achieving strong statistical significance between samples, genotypes, or experiments challenging. Thus, we have centered our data interpretation around observable “trends” between experiments to facilitate drawing conclusions. Considering Figures S17A-B, 10A-C, and 6A, we acknowledge the reviewer's concern about inconsistencies in root system architecture across experiments. Initially, we observed that the sr3g mutant had reduced lateral root length compared to Col-0 under salt stress. This led us to focus on this specific phenotypic trait rather than the overall root system architecture. Despite some variation, the sr3g mutant consistently showed a similar trend/phenotype when compared to Col-0 under salt stress. We believe the variation in main root length and lateral root number between experiments is due to inherent differences between biological replicates.

      Regarding gene expression patterns between Figures S16B and 4A/9B, we included part of Figure 9B (SR3G gene expression in Col-0) in Figure 4A. Figure S16B represents a completely different assay. Despite variations between assays, the overall message remains consistent: SR3G gene expression is induced under salt stress in the root but not in the shoot.

      Both SR3G and WRKY75 are expressed at very low levels, even under the 75 mM salt stress condition we tested. When gene expression is so low, detecting changes is challenging due to inherent variations. Nonetheless, we observed a reduction in WRKY75 expression in the mutant lines compared to wild-type Col-0, though this reduction was not statistically significant. More importantly, we observed a similar phenotype in the wrky75 mutant, specifically reduced main root length under salt stress, consistent with the findings of the published paper in The Plant Cell by Lu et al. (2023) “Lu, K.K., Song, R.F., Guo, J.X., Zhang, Y., Zuo, J.X., Chen, H.H., Liao, C.Y., Hu, X.Y., Ren, F., Lu, Y.T. and Liu, W.C., 2023. CycC1; 1–WRKY75 complex-mediated transcriptional regulation of SOS1 controls salt stress tolerance in Arabidopsis. The Plant Cell, 35(7), pp.2570-2591”.

      We thank the reviewer for spotting the missing labels for Figures S6-9. We corrected them in the main text, figures, and legends. We added panel C to Figure S16 and removed panel E from the Figure S17 legend, so the figures and legends now match.

      Consequently, the authors' decisions regarding subsequent functional assays, as well as major conclusions about gene function, including SR3G function in root system architecture, involvement in root suberization, and regulation of cellular damage are incomplete.

      We greatly appreciate the reviewer's thorough review of our manuscript and their critical comments. We have carefully addressed all comments and concerns.

      Reviewer #2 (Public Review):

      Salt stress is a significant and growing concern for agriculture in some parts of the world. While the effects of sodium excess have been studied in Arabidopsis and (many) crop species, most studies have focused on Na uptake, toxicity, and overall effects on yield, rather than on developmental responses to excess Na, per se. The work by Ishka and colleagues aims to fill this gap.

      Working from an existing dataset that exposed a diverse panel of A. thaliana accessions to control, moderate, and severe salt stress, the authors identify candidate loci associated with altering the root:shoot ratio under salt stress. Following a series of molecular assays, they characterize a DUF247 protein which they dub SR3G, which appears to be a negative regulator of root growth under salt stress.

      Overall, this is a well-executed study that demonstrates the functional role played by a single gene in plant response to salt stress in Arabidopsis.

      The abstract and beginning of the Discussion section highlight the "new tool" developed here for measuring biomass accumulation. I feel that this distracts from the central aims of the study, which is really about the role of a specific gene in root development under salt stress. I would suggest moving the tool description to less prominent parts of the manuscript.

      We appreciate the reviewer's suggestion. We believe that the innovative tool used to extract shoot-to-root ratio data from previous experiments underscores the value of reutilizing previously acquired data for new discoveries and demonstrates how reanalyzing the same data can provide fresh insights, such as identification of new allelic variation. Therefore, we decided to retain this section, as our discovery of the SR3G gene originated from this innovative tool.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      Line 58 (opening sentence) - salt accumulation in the soil is not caused by evaporation exceeding input; that scenario results in soil water deficit. The issue is when the input water has dissolved ions.

      We thank the reviewer for raising this important point. While this point is theoretically true, all of the water that is found in natural environments contains some dissolved ions. Therefore, drought conditions will lead, over time, to increased soil salinization. We have amended this sentence to represent our point better.

      “Salt stress is predominant in the dryland areas where evaporation rate exceeds water input. As all water contains dissolved ions, the prolonged exposure to drought stress results in increased accumulation of salts in the upper soil layers 1–3.”

      I feel that it would be helpful, for replication and for interpretation, if the authors could provide water potentials for the growing media used throughout. What water potentials are the plants experiencing when grown in 1/2 MS + agar at 0, 75, and 150mM NaCl? Juenger and Verslues present a great recent discussion of the importance of reporting these values (Juenger, T. E. and P. E. Verslues (2023). "Time for a drought experiment: Do you know your plants' water status?" Plant Cell 35(1): 10-23.)

      Critically, how do the water potentials experienced by agar-grown plants compare to those experienced in soil-grown plants? As a stated aim of this study is to allow translation to crops these data are very important to convince physiologists of the relevance of the results.

      We thank the reviewer for raising this important point. We completely agree that growing plants on agar plates is an artificial setup and knowing the water potential of the plants within this setup would be highly informative. However, as indicated in the review by Juenger and Verslues (2023), the agar plate setup is much more reproducible compared to various soil conditions, and we report the media composition in sufficient detail for it to be reproduced in other laboratory conditions.

      Furthermore, while investigating the water status of plants and soil is indeed intriguing, it is beyond the scope of this study and would require us to redo the experiments with specific tools listed within the Juenger and Verslues review, which are currently not within our laboratory equipment list.

      Importantly, any changes reported in this manuscript apply equally to both wild-type and mutant lines under all conditions. We provide an extensive report on the soil type used, as well as the soil quantity. We use the gravimetric method to determine the water content and to apply salt stress, as described in previous works from our lab (Yu and Sussman et al., 2024 Plant Physiology and Awlia et al., 2016 Frontiers in Plant Science).

      Nonetheless, we have now included water content measurements for soil-grown plants under different conditions, calculated by subtracting dry weight from fresh weight (new Fig. S24). Although plant water content may not fully capture the water status of the media or soil, our measurements did not reveal any significant differences in water content between genotypes across the various conditions tested.

      Line 69- missing an "and" after "(ABA)."

      Thanks. We added the missing “and”.

      Line 79 - I think the association being made is between natural variation in root and shoot growth and genetic variants, not "underlying genes."

      We thank the reviewer for this suggestion. The cause for the identified association indeed relies on allelic variation within the genetic region. We have re-phrased this sentence within the manuscript.

      “Many forward genetic studies were highly successful in associating natural variation in root and shoot growth with allelic variation in gene coding and promoter regions, thereby identifying potential new target traits for improved stress resilience 18,20,21.”

      Figure 1 - what do "seGF" and "reGF" stand for? Shoot and root growth rate, respectively, but there are extra letters in there…

      The abbreviations stand for shoot exponential Growth Factor and root exponential Growth factor. An explanation of the acronym has been added to the text.

      “The increase in the projected area of shoot and root (Fig. S2) was used to estimate (A) shoot and (B) root exponential growth rate (seGR and reGR respectively).”
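
      To make the quantity in that sentence concrete, here is a minimal sketch of one way an exponential growth rate could be estimated from a time series of projected areas. It assumes the area follows A(t) = A0 * exp(r * t) and fits r as the least-squares slope of ln(area) against time. The function name and the example values are hypothetical, and the response does not state that this is the exact fitting procedure used in the manuscript.

```python
import math


def exponential_growth_rate(days, areas):
    """Return r from A(t) = A0 * exp(r * t), fitted as the least-squares
    slope of ln(area) against time. Hypothetical helper, for illustration."""
    if len(days) != len(areas) or len(days) < 2:
        raise ValueError("need at least two paired observations")
    log_areas = [math.log(a) for a in areas]  # areas must be positive
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(log_areas) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, log_areas))
    return sxy / sxx


# Synthetic projected areas that grow at exactly 0.3 per day.
example_days = [0, 1, 2, 3]
example_areas = [2.0 * math.exp(0.3 * t) for t in example_days]
example_rate = exponential_growth_rate(example_days, example_areas)
```

      Applied separately to the shoot and root areas of one seedling, the same function would give the two rates that the legend abbreviates as seGR and reGR.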

      Figure 1 legend - there's an "s" missing in "across." And two "additionally" in the penultimate sentence.

      Thanks for spotting the errors. We fixed these errors.

      Line 109 - how was the white balance estimated for the images on the flatbed scanner?

      Within the developed tool, we have not adjusted or controlled for white balance in any way, as the white balance from the flatbed scanner is kept at one value. The tool sorts the imaged pixels into bins of white (root), green (shoot), and blue (plate) pixels based on the closest distance in the RGB scale to the particular color, which makes correcting for white balance unnecessary. We have provided an additional explanation for this within the M&M section.

      “A Matlab-based tool was developed to simplify and speed up the segmentation and analysis pipeline. For automatic segmentation, the tool uses a combination of image operations (histogram equalization), thresholding on different color spaces (e.g., RGB, YCbCr, Lab, HSV), and binary image processing (boundary and islands removal). As the tool is digitalizing various color scales and classifies pixels into either white (root), green (shoot) or blue (background) categories, the adjustment for white balance is obsolete. ”
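
      The nearest-color binning described above can be illustrated with a short sketch. This is not the authors' Matlab tool: the reference colors, labels, and function names below are assumptions chosen for illustration, and the sketch covers only the final step of assigning each pixel to the closest reference color in RGB space.

```python
from collections import Counter

# Assumed reference colors (R, G, B); the values used by the real tool are not given.
REFERENCE_COLORS = {
    "root": (255, 255, 255),      # white
    "shoot": (0, 128, 0),         # green
    "background": (0, 0, 255),    # blue
}


def classify_pixel(rgb, references=REFERENCE_COLORS):
    """Return the label whose reference color is closest to rgb
    by squared Euclidean distance in RGB space."""
    def squared_distance(reference):
        return sum((c - r) ** 2 for c, r in zip(rgb, reference))

    return min(references, key=lambda label: squared_distance(references[label]))


def count_classes(pixels, references=REFERENCE_COLORS):
    """Count how many pixels fall into each class."""
    return Counter(classify_pixel(p, references) for p in pixels)


example_pixels = [(250, 250, 245), (30, 140, 40), (20, 30, 200), (240, 235, 250)]
example_counts = count_classes(example_pixels)
```

      Because each pixel is assigned by relative distance to fixed reference colors, a uniform shift in scanner white balance changes the class of a pixel only if it moves that pixel closer to a different reference, which is consistent with the argument made in the response.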

      GWAS was performed separately on traits measured at control, 75mM, and 150mM NaCl treatments. Would it also be informative to map the STI measurement (i.e. plasticity) introduced here?

      We thank the reviewer for this important point. We have performed GWAS on both “raw” and STI traits. However, we found that the associations identified for STI were not as abundant as the ones identified with “raw” traits. This makes sense, as we are compounding the root or shoot growth under both conditions, and plastic responses to the environment are expected to be genetically more complex, as they involve more genetic regulators compared to phenotypes that have low plasticity. We have added this as a part of the result description, as we acknowledge that this might be an interesting observation for the field to build upon, and might provide fodder for new methods to deconvolute the complexity in mapping the plastic traits.

      “To identify genetic components underlying salt-induced changes in root:shoot ratio, we used the collected data as an input for GWAS. The associations were evaluated based on the p-value, the number of SNPs within the locus, and the number of traits associated with individual loci. As Bonferroni threshold differs depending on the minor allele count (MAC) considered, we identified significant associations based on a Bonferroni threshold for each subpopulation of SNPs based on MAC (Table S3). While we conducted a GWAS on directly measured traits, as well as their Salt Tolerance Index (STI) values, however the amount of associations with STI was much lower compared to directly measured traits (Table S3). This observation aligns with the understanding that plastic responses to environmental conditions tend to be genetically more complex. This complexity likely stems from the involvement of more genetic regulators compared to low-plasticity phenotypes.”

      Line 167 - how was LD incorporated into this analysis? Did you use a genome average? Or was LD allowed to vary (as it does) across the genome?

      Initially, we used the genome-average LD for this purpose (10 kbp for Arabidopsis), and extended the region of interest based on the number of coding genes within the window. We have added this to the description in our manuscript.

      “For the most promising candidate loci (Table S4), we have identified the gene open reading frames that were located within the genome-wide linkage-disequilibrium (LD) of the associated SNPs. The LD was expanded if multiple SNPs were identified within the region, and the region of interest was expanded based on the number of coding genes within the LD window. ”
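
      As a rough illustration of the window logic in this passage, the sketch below builds a region spanning all associated SNPs at a locus, padded by the genome-average LD (10 kbp, as stated in the response), and lists the genes that overlap it. The function names, gene labels, and coordinates are hypothetical, and the further manual expansion based on the number of coding genes is not modeled.

```python
def candidate_window(snp_positions, ld_bp=10_000):
    """Region covering all associated SNPs, padded by the LD distance on both sides."""
    if not snp_positions:
        raise ValueError("at least one SNP position is required")
    start = max(0, min(snp_positions) - ld_bp)
    end = max(snp_positions) + ld_bp
    return start, end


def genes_in_window(genes, window):
    """Return names of genes whose (start, end) interval overlaps the window."""
    win_start, win_end = window
    return [
        name
        for name, gene_start, gene_end in genes
        if gene_start <= win_end and gene_end >= win_start
    ]


# Toy coordinates, not real annotations.
example_genes = [
    ("geneA", 10_000, 20_000),
    ("geneB", 39_000, 41_000),
    ("geneC", 62_000, 70_000),
    ("geneD", 63_001, 64_000),
]
example_window = candidate_window([50_000, 53_000])
example_hits = genes_in_window(example_genes, example_window)
```

      With several SNPs at one locus the window grows automatically, which mirrors the statement that the LD region was expanded when multiple SNPs were identified.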

      Line 291 - I think the water potentials are essential, here. What does 50% of soil water holding capacity equal in these soils? In the substrate that we use in our lab, that would represent a considerable soil water deficit even without any salts in the soil.

      We thank the reviewer for this comment. As Arabidopsis occurs naturally in soils with low water holding capacity (i.e. sandy soils), it typically grows better in soils that are not heavily saturated with water. Throughout many experiments performed within this study and in other studies from our lab (results reported in Awlia et al., 2016 Frontiers in Plant Science and Yu & Sussman et al., 2024 Plant Physiology), we have not observed any drought-like symptoms at 50% soil water holding capacity. The fact that this is reproducible across similar soil types across two laboratories (one in Saudi Arabia and one in the USA) is not to be dismissed. Again, we are currently not equipped to measure water potentials for these plants, as this is not a standard practice (yet) for stress experiments, but we are taking these comments on board for all of our future experiments.

      Moreover, our control plants are also “dried down” to 50% of SWHC, and soaked in non-saline water during the “salt stress treatment” to make sure that the soil water saturation is accounted for within the experimental setup. This “dry down” of soil is necessary to ensure equal and effective salt penetration into the soil particles. More details on this method can be found in Awlia et al., 2016.

      Again - We have added a new dataset measuring water content in individually soil-grown plants under different conditions as a proxy for soil water status (see new Fig. S24). While we did not observe any significant differences in water content between genotypes under the various conditions, the sr3g mutant showed a slightly higher, though non-significant, water content compared to wild-type Col-0 under control conditions.

      We have provided additional information and comments to warn the readers about this method:

      “The seeds were germinated in ½ MS media for one week, as described for the agar-based plate experiments. One week after germination, the seedlings were transplanted to the pot (12 x 4 cm insert) containing the Cornell Mix soil (per batch combine: 0.16 m3 of peat moss, 20.84 kg of vermiculite, 0.59 kg of Uni-Mix fertilizer, and 2.27 kg of lime) watered to 100% water holding capacity and placed in the walk-in growth chamber with the 16 h light / 8 h dark period, 22°C and 60% relative humidity throughout the growth period. When all of the pots dried down to the weight corresponding to 50% of their water holding capacity, they were soaked for 1 h in tap water or a 200 mM NaCl solution, resulting in an effective concentration of 100 mM NaCl based on the 50% soil water holding capacity, which corresponded to a moderate level of salt stress (Awlia et al., 2016). The control pots were soaked for the same length of time in 0 mM NaCl solution, to account for the soil saturation effect. We then allowed the pots to be drained for 2-3 h to eliminate excess moisture. The pots were placed under phenotyping rigs equipped with an automated imaging system (Yu et al., 2023) and the pot weight was measured daily to maintain the reference weight corresponding to 50% of the soil water holding capacity throughout the experiment. We would like to note that this gravimetric based method for application of salt stress has been developed for soils typically used for pot-grown plants, with relatively high water holding capacity (Awlia et al. 2016). Within these specific conditions, no drought stress symptoms were observed.”
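
      The arithmetic behind the "200 mM soak gives 100 mM effective" statement can be sketched as a simple dilution. The sketch assumes that the soak refills the pot from its current fraction of water holding capacity to full capacity and that the added solution mixes completely with the water already in the soil. The function names and the example pot weights are hypothetical.

```python
def effective_concentration(soak_mM, fraction_filled):
    """Concentration after a pot at `fraction_filled` of its water holding
    capacity is topped up to full capacity with a `soak_mM` solution.
    Assumes complete mixing (an illustrative simplification)."""
    if not 0 <= fraction_filled <= 1:
        raise ValueError("fraction_filled must be between 0 and 1")
    return soak_mM * (1 - fraction_filled)


def reference_pot_weight(dry_weight_g, saturated_weight_g, fraction=0.5):
    """Pot weight corresponding to a given fraction of water holding capacity."""
    water_at_capacity = saturated_weight_g - dry_weight_g
    return dry_weight_g + fraction * water_at_capacity


example_effective = effective_concentration(200, 0.5)
example_reference = reference_pot_weight(100.0, 300.0)  # made-up pot weights in grams
```

      Under these assumptions a pot dried to 50% of capacity and soaked in 200 mM NaCl ends up at 100 mM, matching the value quoted above, and the daily weighing step amounts to watering each pot back to its reference weight.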

      Lines 415-416 - are these contrasts significant? Figure S3 likewise does not have any notation for significant differences in the means.

      We have previously not tested the stronger effect of 125 mM vs 75 mM on relative root and shoot growth, and thus these test results were initially not included in Fig. S3. We have now added the tests and included them within Fig. S3, and added description of their significance into the main body of the manuscript:

      “In comparison, the growth rates of the shoot were significantly reduced to 0.71 and 0.43 of the control in 75 and 125 mM NaCl treatments, respectively (Fig. S3). While the mean value of root:shoot growth rate did not change upon salt stress treatment, the variance in the root:shoot ratio significantly expanded with the increasing concentrations of salt (Fig. 1C). These results suggest that while root and shoot growth are well coordinated under non-stress conditions, salt stress exposure results in loss of coordination of organ growth across Arabidopsis accessions.”

      Line 418 - same comment as preceding. Is this change in variance significant?

      We have previously not tested this. We have now added the ANOVA tests and included them within each figure, and added description of their significance into the main body of the manuscript. (see text above)

      Line 421 - why would we expect there to be a correlation between root:shoot growth ratio and seedling size?

      We were trying to use the seedling size as a proxy for “fitness” - or how well the plants can survive under these specific conditions. We were testing here whether any simple and directional strategy - such as increase or decrease in root:shoot ratio under salt stress - is resulting in better salt tolerance - which would translate into larger overall seedlings. We have rephrased this within the manuscript, to better explain the hypothesis being tested within this specific figure:

      “To test whether there is a clear directional correlation between the change in root:shoot ratio and overall salt stress tolerance, we have used the overall seedling size as a proxy for plant salt tolerance (Fig. S4, S5). No significant correlation was found between the root:shoot growth ratio and total seedling size (Fig. S4, S5), indicating that the relationship between coordination of root and shoot growth and salt tolerance during the early seedling establishment is complex.”

      Line 438 - I think a stable web link would be more appropriate than listing Dr. Nordborg's email address.

      Sorry about this. There is a glitch with our reference citing software. We agree, and thank the reviewer for noticing this! We assigned reference number 43 to it.

      Line 439 - I expect that many of your readers may not be experienced with GWAS. Can you provide an explanation as to why only one locus was detected with both the 250K SNP panel and the 4M SNP panel?

      We thank the reviewer for raising this point. We have added additional explanation to this observation:

      “Increased SNP density can provide more potential associations, highlighting the associated loci with more confidence, due to more SNPs being detected within specific region. The different panels could capture different LD blocks across the genome. If the locus detected by both panels is in a region of strong LD or under selection, it could be detected consistently. In contrast, other loci may not be captured well by the lower-density 250K SNP panel. The new GWAS revealed 32 additional loci, with only one significantly associated locus being picked up by both 250k and 4M SNPs GWAS (locus 30, Table S3). The detection of only one common locus between the two SNP panels is likely due to differences in resolution, statistical power, and how well each panel captures the genomic regions associated with the trait. ”

      Figure 2A and B - I suggest adding the p-value cutoff to the y-axis of the Manhattan Plots

      We thank the reviewer for this suggestion, however this is not appropriate. The genome wide p-value cutoffs for GWAS studies are arbitrary, and we have not used a genome-wide cutoff for our SNPs, but rather used cutoffs depending on the minor allele frequency. Therefore, we think adding a straight line to the graphs in Fig. 2A-B representing the overall cutoff, would be misleading. Please see below the text where we explain how the threshold was calculated for individual groups of SNPs with varying MAF:

      “The GWAS associations were evaluated for minor allele count (MAC) and association strength above the Bonferroni threshold with -log10(p-value/#SNPs), calculated for each sub-population of SNPs above threshold MAC (Table S3, Bonf.threshold.MAC.specific)”
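
      The MAC-specific threshold described in the quoted text can be sketched as follows. The sketch assumes a significance level of 0.05 and treats each sub-population as the SNPs whose minor allele count is at or above a given cutoff. Both choices, along with the function names and the toy MAC values, are assumptions for illustration rather than details confirmed by the response.

```python
import math


def bonferroni_threshold(n_snps, alpha=0.05):
    """Bonferroni significance threshold on the -log10(p) scale."""
    if n_snps <= 0:
        raise ValueError("n_snps must be positive")
    return -math.log10(alpha / n_snps)


def mac_specific_thresholds(snp_macs, mac_cutoffs, alpha=0.05):
    """Map each MAC cutoff to the threshold computed from the number of
    SNPs with minor allele count at or above that cutoff."""
    thresholds = {}
    for cutoff in mac_cutoffs:
        n_snps = sum(1 for mac in snp_macs if mac >= cutoff)
        if n_snps:
            thresholds[cutoff] = bonferroni_threshold(n_snps, alpha)
    return thresholds


example_thresholds = mac_specific_thresholds([1, 5, 10, 20], [5, 10])
```

      Because fewer SNPs pass a stricter MAC cutoff, the threshold differs between sub-populations, which is why a single horizontal line on the Manhattan plots would not represent the criterion that was applied.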

      Line 490-492 - Presents the results of the gene tree to support a model in which SR3G diverged from AT3G50150 prior to the speciation events leading to Capsella and Arabidopsis. But this topology requires at least two independent losses of SR3G - can you rule out the hypothesis that the position of SR3G on the gene tree is a result of long branch attraction? Given the syntenic orientation of AT3G50150 and SR3G, and apparent directional selection experienced by the latter lineage, it seems more parsimonious that AT3G50150 and SR3G arose from a very recent duplication event.

      We agree with the reviewer that it seemed most parsimonious for AT3G50160 (SR3G) to be a recent tandem duplication of AT3G50150 – and this was certainly our expectation given the other tandem duplications that have occurred in this genomic region. However, irrespective of the type of alignment from which we built the phylogeny (nucleotide vs AA; sometimes nucleotide is noisier but provides more information) we were never able to recapitulate a tree where AT3G50160 was immediately sister to AT3G50150 – even with a long branch for AT3G50160 indicating a rapid pace of nucleotide/AA change relative to AT3G50150. In regards to long branch attraction, it is our interpretation that long branch attraction typically requires multiple long branches that get placed together at a poorly supported node where sampling is sparse (https://www.nature.com/articles/s41576-020-0233-0), whereas we have the single long branch for AT3G50160, and all other A/C clade (Arabidopsis/Camelina/Capsella) members forming a lineage with a much shorter branch. To test the possibility of long branch attraction we subtracted out individual members of the AT3G50150/160 clade to see if there was algorithmic uncertainty in the placement of AT3G50160. We did not observe this in any of the branch subtractions that we performed (see below). Thus, it appears that we must stick with our original interpretation. If the reviewer would like us to soften this interpretation, we would be more than happy to do so, as it does not impact the overall conclusions for AT3G50160 being a rapidly evolving member of this clade.

      Author response image 1.

      Line 494 (and throughout) - I expect that all of the genes being studied herein are "experiencing selection," even if it's boring-old purifying selection on functionally conserved proteins. I think you mean to say "directional selection."

      We thank the reviewer for this comment and completely agree that we lacked precision on our statement. We have corrected this throughout the manuscript.

      Line 497 - state the background and foreground values of omega, here.

      We apologize for not including these values and have added them at this point in the manuscript (new Table S6).

      Line 511 and Line 673 - Inspection of Figure S13B suggests that SR3G is not "predominantly" expressed nor does it have the "highest enrichment" in the root stele. Certainly, among root cell types, this is predominant. But it appears to be quite highly expressed in late-stage seeds and some floral organs, as well.

      We appreciate the reviewer for recognizing that SR3G is not a highly expressed gene. In root cell types, its expression is enriched in the root stele. Overall, SR3G is expressed at both early and later developmental stages. Our investigation of later developmental stages related to seed production did not reveal any significant phenotypic differences in fertility.

      Line 514 - "54-folds" should be "54-fold."

      Thanks. We made corrections.

      Figure 7 - For symmetry, I suggest adding the "Beginning of salt stress" arrow to the "Early Stress" panel as well (even if it's right at day 0).

      Thanks. We added the arrow to Early Stress in both Panels A and B.

      Figure S2 - both graphs should have the same scale on the y-axis

      Thanks - we have now re-plotted the graph with the matching y-axis scales.

      Line 531 - I feel that this is a significant overstatement. The strongest statement supported by the results presented here is that SR3G is the most prominent DUF247 studied herein in root development under salt stress.

      Thanks for the comments. We have rephrased the statement.

      “These results suggest that SR3G is the most prominent DUF247 studied within our study to affect root development under salt stress.”

      Lines 583-605 - These data seem to me to be tangential to the central aims of the study. I suggest removing them for clarity/brevity.

      We greatly appreciate the reviewer's suggestion. Our study primarily focused on characterizing the main GWAS candidate, SR3G. Since SR3G is located within a cluster of other DUF247 genes on chromosome 3, we believe that screening the neighboring DUF247 genes could provide further insights into SR3G’s role in root development. Additionally, we believe that the generated data and lines will serve as a valuable resource for other researchers interested in studying these genes. For these reasons, we have decided to retain these datasets in the manuscript.

      Lines 650-652 - these sections 1-3 differences in suberization between SR3G and Col-0 under control conditions are not significant. At best, this may be described as a "trend" and not "higher levels." In section 4, it is VERY marginally significant (and probably not at all after the large number of tests performed, here.)

      We appreciate the reviewer's feedback and have revised the wording accordingly.

      Line 660 - this statement is only true for Section 1. I suggest adding this caveat.

      We appreciate the reviewer's comments on this matter. We quantified four suberin monomers in whole root seedlings rather than in individual root sections due to the technical challenges of separating the sections without microscopy and the limited availability of samples for GC-MS analysis.

    3. eLife Assessment

      Through cellular, developmental, and physiological analysis, this valuable study identifies a gene that regulates the relative growth of roots and shoots under salt stress. The holistic approach taken provides solid evidence that this member of a larger tandemly duplicated gene family together with an upstream regulator contributes to salt tolerance, although the statistical or biological support for some conclusions could be more robust. The manuscript will be of interest to plant biologists studying mechanisms of abiotic stress tolerance and gene family evolution.

    4. Reviewer #1 (Public review):

      Summary:

      The authors aim to assess the effect of salt stress on root:shoot ratio, identify the underlying genetic mechanisms, and evaluate their contribution to salt tolerance. To this end, the authors systematically quantified natural variations in salt-induced changes in root:shoot ratio. This innovative approach considers the coordination of root and shoot growth rather than exploring biomass and the development of each organ separately. Using this approach, the authors identified a gene cluster encoding eight paralog genes with a domain-of-unknown-function 247 (DUF247), with the majority of SNPs clustering into SR3G (At3g50160). In the manuscript, the authors utilized an integrative approach that includes genomic, genetic, evolutionary, histological, and physiological assays to functionally assess the contribution of their genes of interest to salt tolerance and root development.

      Comments on revisions:

      As the authors correctly noted, variations across samples, genotypes, or experiments make achieving statistical significance challenging. Should the authors choose to emphasize trends across experiments to draw biological conclusions, careful revisions of the text, including titles and figure legends, will be necessary to address some of the inconsistencies between figures (see examples below). However, I would caution that this approach may dilute the overall impact of the work on SR3G function and regulation. Therefore, I strongly recommend pursuing additional experimental evidence wherever possible to strengthen the conclusions.

      (1) Given the phenotypic differences shown in Figures S17A-B, 10A-C, and 6A, the statement that "SR3G does not play a role in plant development under non-stress conditions" (lines 680-681) requires revision to better reflect the observed data.

      (2) I agree with the authors that detecting expression differences in lowly expressed genes can be challenging. However, as demonstrated in the reference provided (Lu et al., 2023), a significant reduction in WRKY75 expression is observed in T-DNA insertion mutant alleles of WRKY75. In contrast, Fig. 9B in the current manuscript shows no reduction in WRKY75 expression in the two mutant alleles selected by the authors, which suggests that these alleles cannot be classified as loss-of-function mutants (line 745). Additionally, the authors note that the wrky75 mutant exhibits reduced main root length under salt stress, consistent with the phenotype reported by Lu et al. (2023). However, other phenotypic discrepancies exist between the two studies. For example, 1) Lu et al. (2023) report that wrky75 root length is comparable to WT under control conditions, whereas the current manuscript shows that wrky75 root growth is significantly lower than WT; 2) under salt stress, Lu et al. (2023) show that wrky75 accumulates higher levels of Na+, whereas the current study finds Na+ levels in wrky75 indistinguishable from WT. To confirm the loss of WRKY75 function in these T-DNA insertion alleles, the authors should provide additional evidence (e.g., Western blot analysis).

    1. eLife Assessment

      This important work advances our understanding of the impact of malnutrition on hematopoiesis and subsequently infection susceptibility. Support for the overall claims is convincing in some respects and incomplete in others as highlighted by reviewers. This work will be of general interest to those in the fields of hematopoiesis, malnutrition, and dietary influence on immunity.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on controls of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments.

      Strengths:

      The manuscript is well-written and conceived around a valid scientific question. The data supports the idea that malnutrition contributes to infection susceptibility and causes some immunological changes. The malnourished mouse model also displayed growth and development delays. The work's significance is well justified. Immunological studies in the malnourished cohort (human and mice) are scarce, so this could add valuable information.

      Weaknesses:

      The assays on myeloid cells are limited, and the study is descriptive and overstated. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I found no cellular mechanism defining the link between nutritional state and immunocompetency.

    3. Reviewer #2 (Public review):

      Summary:

      Sukhina et al. use a chronic murine dietary restriction model to investigate the cellular mechanisms underlying nutritionally acquired immunodeficiency as well as the consequences of a refeeding intervention. The authors report a substantial impact of undernutrition on the myeloid compartment, which is not rescued by refeeding despite rescue of other phenotypes including lymphocyte levels, and which is associated with maintained partial susceptibility to bacterial infection.

      Strengths:

      Overall, this is a nicely executed study with appropriate numbers of mice, robust phenotypes, and interesting conclusions, and the text is very well-written. The authors' conclusions are generally well-supported by their data.

      Weaknesses:

      There is little evaluation of known critical drivers of myelopoiesis (e.g. PMID 20535209, 26072330, 29218601) over the course of the 40% diet, which would be of interest with regard to comparing this chronic model to other more short-term models of undernutrition.

      Further, the microbiota, which is well-established to be regulated by undernutrition (e.g. PMID 22674549, 27339978, etc.), and also well-established to be a critical regulator of hematopoiesis/myelopoiesis (e.g. PMID 27879260, 27799160, etc.), is completely ignored here.

    4. Reviewer #3 (Public review):

      Summary:

      Sukhina et al are trying to understand the impacts of malnutrition on immunity. They model malnutrition with a diet switch from ad libitum to 40% caloric restriction (CR) in post-weaned mice. They test impacts on immune function with listeriosis. They then test whether re-feeding corrects these defects and find aspects of emergency myelopoiesis that remain defective after a precedent period of 40% CR. Overall, this is a very interesting observational study on the impacts of sudden prolonged exposure to less caloric intake.

      Strengths:

      The study is rigorously done. The observation of lasting defects after a bout of 40% CR is quite interesting. Overall, I think the topic and findings are of interest.

      Weaknesses:

      While the observations are interesting, in this reviewer's opinion, there is both a lack of mechanistic understanding of the phenomena and also some lack of resolution/detail about the phenomena themselves. Addressing the following major issues would be helpful towards aspects of both:

      (1) Is it calories, per se, or macro/micronutrients that drive these phenotypes observed with 40% CR? At the least, I would want to see isocaloric diets (primarily protein, fat, or carbs) and then some of the same readouts after 40% CR. I.e., does low energy with relatively more, e.g., protein prevent immunosuppression (as is commonly suggested)? Micronutrients would be harder to test experimentally and may be out of the scope of this study. However, it is worth noting that many of the malnutrition-associated diseases are micronutrient deficiencies.

      (2) Is immunosuppression a function of a certain weight loss threshold? Or something else? Some idea of either the tempo of immunosuppression (happens at 1, in which weight loss is detected; vs 2-3, when body length and condition appear to diverge; or 5 weeks), or grade of CR (40% vs 60% vs 80%) would be helpful since the mechanism of immunosuppression overall is unclear (but nailing it may be beyond the scope of this communication).

      (3) Does an obese mouse that gets 40% CR also become immunodeficient? As it stands, this ad libitum --> 40% CR model perhaps best models problems in the industrial world (as opposed to always being 40% CR from weaning, as might be more common in the developing world), and so modeling an obese person losing a lot of weight from CR (like would be achieved with GLP-1 drugs now) would be valuable to understanding generalizability.

      (4) Generalizing this phenomenon as "bacterial" with listeriosis, which is more like a virus in many ways (intracellular phase, requires type I IFN, etc.) and cannot be given by the natural route of infection in mice, may not be most accurate. I would want to see an experiment with E. coli, or some other bacteria, to test the statement of generalizability (i.e., is it bacteria, or type I IFN-pathway dominant infections, like viruses). If this is unique to listeriosis, it doesn't undermine the story as it is at all, but it would just require some word-smithing.

      (5) Previous reports (which the authors cite) implicate Leptin, the levels of which scale with fat mass, as "permissive" of a larger immune compartment (immune compartment as "luxury function" idea). Is their phenotype also leptin-mediated (ie leptin AAV)?

      (6) The inability of re-feeding to "rescue" the myeloid compartment is really interesting. Can the authors do a bone marrow transplantation (CR-->ad libitum) to test if this effect is intrinsic to the CR-experienced bone marrow?

      (7) Is the defect in emergency myelopoiesis a defect in G-CSF? Ie if the authors injected G-CSF in CR animals, do they equivalently mobilize neutrophils? Does G-CSF supplementation (as one does in humans) rescue host defense against Listeria in the CR or re-feeding paradigms?

    1. eLife Assessment

      This study provides a valuable new resource to investigate the molecular basis of the particular features characterizing the pipefish embryo. The authors found both unique and shared gene expression patterns in pipefish organs compared with other teleost fishes. The solid data collected in this unconventional model organism will give new insights into understanding the extraordinary adaptations of the Syngnathidae family and will be of interest in the domain of evolution of fish development.

    2. Reviewer #1 (Public review):

      Syngnathid fishes (seahorses, pipefishes, and seadragons) present very particular and elaborated features among teleosts and a major challenge is to understand the cellular and molecular mechanisms that permitted such innovations and adaptations. The study provides a valuable new resource to investigate the morphogenetic basis of four main traits characterizing syngnathids, including the elongated snout, toothlessness, dermal armor and male pregnancy. More particularly, the authors have focused on a late stage of pipefish organogenesis to perform single-cell RNA-sequencing (scRNA-seq) completed by in situ hybridization analyses to identify molecular pathways implicated in the formation of the different specific traits.

      The first set of data explores the scRNA-seq atlas composed of 35,785 cells from two samples of gulf pipefish embryos that authors have been able to classify into major cell types characterizing vertebrate organogenesis, including epithelial, connective, neural and muscle progenitors. To affirm identities and discover potential properties of clusters, authors primarily use KEGG analysis that reveals enriched genetic pathways in each cell type. After revisions, the authors have provided extended supplementary files that help to interpret the dataset, and some statements have been clarified. I thank the authors for the revisions/completions of ISH results compared to initial submission.

      To conclude, the scRNA-seq dataset in this unconventional model organism will be useful for the community and will provide clues for future research to understand the extraordinary evolution of the Syngnathidae family.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present the first single-cell atlas for syngnathid fishes, providing a resource for future evolution & development studies in this group.

      Strengths:

      The concept here is simple and I find the manuscript to be well written. I like the in situ hybridization of marker genes - this is really nice. I also appreciate the gene co-expression analysis to identify modules of expression. There are no explicit hypotheses tested in the manuscript, but the discovery of these cell types should have value in this organism and in the determination of morphological novelties in seahorses and their relatives.

      Weaknesses:

      I think there are a few computational analyses that might improve the generality of the results.

      (1) The cell types: The authors use marker gene analysis and KEGG pathways to identify cell types. I'd suggest a tool like SAMap (https://elifesciences.org/articles/66747) which compares single cell data sets from distinct organisms to identify 'homologous' cell types -- I imagine the zebrafish developmental atlases could serve as a reasonable comparative reference.

      (2) Trajectory analyses: Authors suggest that their analyses might identify progenitor cell states and perhaps related differentiated states. They might explore cytoTRACE and/or pseudotime-based trajectory analyses to more fully delineate these ideas.

      (3) Cell-cell communication: I think it's very difficult to identify 'tooth primordium' cell types, because cell types won't be defined by an organ in this way. For instance, dental glia will cluster with other glia, and dental mesenchyme will likely cluster with other mesenchymal cell types. So the histology and ISH is most convincing in this regard. Having said this, given the known signaling interactions in the developing tooth (and in development generally) the authors might explore cell-cell communication analysis (e.g., CellChat) to identify cell types that may be interacting.

      Comments on revisions:

      I feel essentially the same about this manuscript. It's a useful resource for future experimental forays into this unique system. The team made improvements to deal with comments from other reviewers related to quality of confirmatory in situ hybridization. This is good.

      Regarding their response that one can't use CellChat if you're not working in mice or human, this is inaccurate. The assumption one makes is that ligand-receptor pairs and signaling pathways have conserved functions across animals (vertebrates). It's the same assumption the authors make when using the KEGG pathway to score enrichment of pathways in clusters. CellChat was used in fishes in Johnson et al. (2023), Nature Communications 14:4891.

    4. Reviewer #3 (Public review):

      Summary:

      This study established a single-cell RNA sequencing atlas of pipefish embryos. The results obtained identified unique gene expression patterns for pipefish-specific characteristics, such as fgf22 in the tip of the palatoquadrate and Meckel's cartilage, broadly informing the genetic mechanisms underlying morphological novelty in teleost fishes. The data obtained are unique and novel, potentially important in understanding fish diversity. Thus, I would enthusiastically support this manuscript if the authors improve it to generate stronger and more convincing conclusions than the current forms.

      Weakness:

      Regarding the expression of sfrp1a and bmp4 dorsal to the elongating ethmoid plate and surrounding the ceratohyal: Are their expression patterns spatially extended or broader compared to the pipefish ancestor? Is there a much closer species available to compare gene expression patterns with pipefish? Did the authors consider using other species closely related to pipefish for ISH? Sfrp1a and bmp4 may be expressed in the same regions of much more closely related species without face elongation. I understand that embryos of such species are not always accessible, but it is also hard to argue responsible genes for a specific phenotype by only comparing gene expression patterns between distantly related species (e.g., pipefish vs. zebrafish). Due to the same reason, I would not directly compare/argue gene expression patterns between pipefish and mice, although I should admit that mice gene expression patterns are sometimes helpful to make a hypothesis of fish evolution. Alternatively, can the authors conduct ISH in other species of pipefish? If the expression patterns of sfrp1a and bmp4 are common among fishes with face elongation, the conclusion would become more solid. If these embryos are not available, is it possible to reduce the amount of Wnt and BMP signal using Crispr/Cas, MO, or chemical inhibitor? I do think that there are several ways to test the Wnt and/or BMP hypothesis in face elongation.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Syngnathid fishes (seahorses, pipefishes, and seadragons) present very particular and elaborated features among teleosts and a major challenge is to understand the cellular and molecular mechanisms that permitted such innovations and adaptations. The study provides a valuable new resource to investigate the morphogenetic basis of four main traits characterizing syngnathids, including the elongated snout, toothlessness, dermal armor, and male pregnancy. More particularly, the authors have focused on a late stage of pipefish organogenesis to perform single-cell RNA-sequencing (scRNA-seq) completed by in situ hybridization analyses to identify molecular pathways implicated in the formation of the different specific traits. 

      The first set of data explores the scRNA-seq atlas composed of 35,785 cells from two samples of gulf pipefish embryos that authors have been able to classify into major cell types characterizing vertebrate organogenesis, including epithelial, connective, neural, and muscle progenitors. To affirm identities and discover potential properties of clusters, authors primarily use KEGG analysis that reveals enriched genetic pathways in each cell type. While the analysis is informative and could be useful for the community, some interpretations appear superficial and data must be completed to confirm identities and properties. Notably, supplementary information should be provided to show quality control data corresponding to the final cell atlas including the UMAP showing the sample source of the cells, violin plots of gene count, UMI count, and mitochondrial fraction for the overall dataset and by cluster, and expression profiles on UMAP of selected markers characterizing cluster identities.

      We thank the reviewer for these suggestions, and have added several figures and supplemental files in response. We added a supplemental UMAP showing the sample from which each cell originated (S1). We also added supplemental violin plots for each sample showing the gene count, unique molecular identifier (UMI) count, mitochondrial fraction, and the doublet scores (S2). We added feature plots of zebrafish marker genes for these major cell types and marker genes identified from our dataset to the supplement (S3:S57). We also provided two supplemental files with marker genes. These changes should clarify the work that went into labeling the clusters. Although some of the cluster labels are general, we decided it would be unwise to label clusters with speculative specific annotations. We only gave specific annotations to clusters with concrete markers and/or in situ hybridization (ISH) results that cemented an annotation. As shown in the new supplemental figures and files, certain clusters had clear, specific markers while others did not. Therefore, we used caution when we annotated clusters without distinct markers.

      The second set of data aims to correlate the scRNA-seq analysis with in situ hybridizations (ISH) in two different pipefish (gulf and bay) species to identify and characterize markers spatially, and validate cell types and signaling pathways active in them. While the approach is rational, the authors must complete the data and optimize labeling protocols to support their statements. One major concern is the quality of ISH stainings and images; embryos show a high degree of pigmentation that could hide part of the expression profile, and only subparts and hardly detectable tissues/stainings are presented. The authors should provide clear and good-quality images of ISH labeling on whole-mount specimens, highlighting the magnification regions and all other organs/structures (positive controls) expressing the marker of interest along the axis. Moreover, ISH probes have been designed and produced on gulf pipefish genome and cDNA respectively, while ISH labeling has been performed indifferently on bay or gulf pipefish embryos and larvae. The authors should specify stages and species on figure panels and should ensure sequence alignment of the probe-targeted sequences in the two species to validate ISH stainings in the bay pipefish. Moreover, spatiotemporal gene expression being a very dynamic process during embryogenesis, interpretations based on undefined embryonic and larval stages of pipefish development and compared to 3dpf zebrafish are insufficient to hypothesize on developmental specificities of pipefish features, such as on the absence of tooth primordia that could represent a very discrete and transient cell population. The ISH analyses would require a clean and precise spatiotemporal expression comparison of markers at the level of the entire pipefish and zebrafish specimens at well-defined stages, otherwise, the arguments proposed on teleost innovations and adaptations turn out to be very speculative. 

      We are appreciative of the reviewer’s feedback. We primarily used the in situ hybridization (ISH) data as supplementary to the scRNAseq library and we are aware that further evidence is necessary to identify the origins of syngnathids’ evolutionary novelties. Our goal was to provide clues for the developmental genetic basis of syngnathid derived features. We hope that our study will inspire future investigations and are excited for the prospect that future research could include this reviewer’s ideas.

      All of the developmental stages and species information for the embryos used were in the figure captions as well as in supplemental file 6. Because we primarily used wild caught embryos, we did not have specific ages of most embryos. Syngnathid species are challenging to culture in the laboratory, and extracting embryos requires euthanizing the father which makes it difficult to obtain enough embryos for ISH. In addition, embryos do not survive long when removed from the brood pouch prematurely. We supplemented our ISH with bay pipefish caught off the Oregon coast because these fish have large broods. Wild caught pregnant male bay pipefish were immediately euthanized, and their broods were fixed. Because we did not have their age, we classified them based on developmental markers such as presence of somites and the extent of craniofacial elongation. Although these classification methods are not ideal, they are consistent with the syngnathid literature (Sommer et al. 2012). Since the embryos used for the ISH were primarily wild caught, we had a few different developmental stages represented in our ISH data. For our tooth primordia search, we used embryos from the same brood (therefore, same stage) for these experiments.

      We understand the concern for the degree of pigmentation in the samples. We completed numerous bleach trials before embarking on the in situ hybridization experiments. After completing a bleach trial with a probe created from the gene tnmd for ISH, we noticed that the bleached embryos were missing expression domains found in the unbleached embryos. We were, therefore, concerned that using bleached embryos for our experiments would result in incorrect conclusions about the expression domains of these genes. We sparingly used bleaching at older stages, hatched larvae, where it was fundamentally necessary to see staining. As stated above, the primary goal of this manuscript was to generate and annotate the first scRNA-seq atlas in a syngnathid, and the ISHs were utilized to support inferred cluster annotations only through a positive identification of marker gene expression in expected tissues/cells. Therefore, the obscuring of gene expression by pigmentation would have resulted in the absence of evidence for a possible cluster annotation, not an incorrect annotation.

      For ease of viewing the ISHs, we improved annotations and clarity. We increased the brightness and contrast of images. In the original submission, we had to lower the image resolution to make the submission file smaller. We hope that these improvements plus the true image quality improve the clarity of the ISH results. We also included alignments in our supplementary files of bay pipefish sequences to the Gulf pipefish probes to showcase the high degree of sequence similarity.

      Sommer, S., Whittington, C. M., & Wilson, A. B. (2012). Standardised classification of pre-release development in male-brooding pipefish, seahorses, and seadragons (Family Syngnathidae). BMC Developmental Biology, 12, 12–15. 

      To conclude, whereas the scRNA-seq dataset in this unconventional model organism will be useful for the community, the spatiotemporal and comparative expression analyses have to be thoroughly pushed forward to support the claims. Addressing these points is absolutely necessary to validate the data and to give new insights to understand the extraordinary evolution of the Syngnathidae family. 

      We really appreciate the reviewer’s enthusiasm for syngnathid research, and hope that the additional files and explanation of the supporting role of the ISHs have adequately addressed their concerns. We share the reviewer’s enthusiasm and are excited for future work that can extend this study. 

      Reviewer #2 (Public Review):

      Summary: 

      The authors present the first single-cell atlas for syngnathid fishes, providing a resource for future evolution & development studies in this group. 

      Strengths: 

      The concept here is simple and I find the manuscript to be well written. I like the in situ hybridization of marker genes - this is really nice. I also appreciate the gene co-expression analysis to identify modules of expression. There are no explicit hypotheses tested in the manuscript, but the discovery of these cell types should have value in this organism and in the determination of morphological novelties in seahorses and their relatives.  

      We are grateful for this reviewer’s appreciation of the huge amount of work that went into this study, and we agree that the in situ hybridizations (ISHs) support the scRNAseq study as we intended. We appreciate that the reviewer thinks that this work will add value to the syngnathid field.

      Weaknesses: 

      I think there are a few computational analyses that might improve the generality of the results. 

      (1) The cell types: The authors use marker gene analysis and KEGG pathways to identify cell types. I'd suggest a tool like SAMap (https://elifesciences.org/articles/66747) which compares single-cell data sets from distinct organisms to identify 'homologous' cell types - I imagine the zebrafish developmental atlases could serve as a reasonable comparative reference. 

      We appreciate the reviewer’s request, and in fact we would have loved to integrate our dataset with zebrafish. However, syngnathids’ unique craniofacial development makes it challenging to determine the appropriate stage for comparison. While 3 days post fertilization (dpf) zebrafish data were appropriate for comparisons of certain cell types (e.g. epidermal cells), they would have been problematic for other cell types (e.g. osteoblasts) that are not easily detectable until older zebrafish stages. Therefore, determining equivalent stages between these species is difficult and contains potential for error. Future research should focus on trying to better match stages across syngnathids and zebrafish (and other fish species such as stickleback). Studies of this nature promise to uncover the role of heterochrony in the evo-devo of syngnathids’ unique snouts.

      (2) Trajectory analyses: The authors suggest that their analyses might identify progenitor cell states and perhaps related differentiated states. They might explore cytoTRACE and/or pseudotime-based trajectory analyses to more fully delineate these ideas.

      We thank the reviewer for this suggestion! We added a trajectory analysis using cytoTRACE to the manuscript. It complemented our KEGG analysis well (L172-175; S73) and has improved the manuscript.

      (3) Cell-cell communication: I think it's very difficult to identify 'tooth primordium' cell types, because cell types won't be defined by an organ in this way. For instance, dental glia will cluster with other glia, and dental mesenchyme will likely cluster with other mesenchymal cell types. So the histology and ISH is most convincing in this regard. Having said this, given the known signaling interactions in the developing tooth (and in development generally) the authors might explore cell-cell communication analysis (e.g., CellChat) to identify cell types that may be interacting. 

      We agree! It would have been a wonderful addition to the paper to include a cell-cell communication analysis. One limitation of CellChat is that it only includes mouse and human orthologs. Given concerns of reviewer #3 for mouse-syngnathid comparisons, we decided to not pursue CellChat for this study. We are looking forward to future cell communication resources that include teleost fishes.

      Reviewer #3 (Public Review): 

      Summary: 

      This study established a single-cell RNA sequencing atlas of pipefish embryos. The results obtained identified unique gene expression patterns for pipefish-specific characteristics, such as fgf22 in the tip of the palatoquadrate and Meckel's cartilage, broadly informing the genetic mechanisms underlying morphological novelty in teleost fishes. The data obtained are unique and novel, potentially important in understanding fish diversity. Thus, I would enthusiastically support this manuscript if the authors improve it to generate stronger and more convincing conclusions than the current forms. 

      Thank you, we appreciate the reviewer’s enthusiasm!

      Weaknesses: 

      Regarding the expression of sfrp1a and bmp4 dorsal to the elongating ethmoid plate and surrounding the ceratohyal: are their expression patterns spatially extended or broader compared to the pipefish ancestor? Is there a much closer species available to compare gene expression patterns with pipefish? Did the authors consider using other species closely related to pipefish for ISH? Sfrp1a and bmp4 may be expressed in the same regions of much more closely related species without face elongation. I understand that embryos of such species are not always accessible, but it is also hard to argue responsible genes for a specific phenotype by only comparing gene expression patterns between distantly related species (e.g., pipefish vs. zebrafish). Due to the same reason, I would not directly compare/argue gene expression patterns between pipefish and mice, although I should admit that mice gene expression patterns are sometimes helpful to make a hypothesis of fish evolution. Alternatively, can the authors conduct ISH in other species of pipefish? If the expression patterns of sfrp1a and bmp4 are common among fishes with face elongation, the conclusion would become more solid. If these embryos are not available, is it possible to reduce the amount of Wnt and BMP signal using Crispr/Cas, MO, or chemical inhibitor? I do think that there are several ways to test the Wnt and/or BMP hypothesis in face elongation. 

      We appreciate the reviewer’s suggestion, and their recognition of the challenges within this system. In response to this comment, we completed further in situ hybridization experiments in threespine stickleback, a short snouted fish that is much more closely related to syngnathids than is zebrafish, to make comparisons with pipefish craniofacial expression patterns (S76-S79). We added ISH data for the signaling genes (fgf22, bmp4, and sfrp1a) as well as prdm16. Based on these additional ISH results, we speculated that craniofacial expression of bmp4, sfrp1a, and prdm16 is conserved across species. However, compared to the specific ceratohyal/ethmoid staining seen in pipefish, stickleback had broad staining throughout the jaws and gills. These data suggest that pipefish have co-opted existing developmental gene networks in the development of their derived snouts. We added this interpretation to the results and discussion of the manuscript (L244-L248; L262-277; L444-470).

      Recommendations for the authors:  

      Reviewing Editor (Recommendations for the Authors)

      We hope that the eLife assessment, as well as the revisions specified here, prove helpful to you for further revisions of your manuscript. 

      Revisions considered essential: 

      (1) Marker genes and single-cell dataset analyses. While these analyses have been performed to a good standard in broad terms, there is a majority view here that cell type annotations and trajectory analyses can be improved. In particular, there is question about the choice of marker genes for the current annotation. For one it can depend on the use of single marker genes (see tnnti1 example for clusters 17 and 31). Here, we recommend incorporating results from SAMap and trajectory analysis (e.g., cytoTRACE or standard pseudotime).

      Because of the reviewer comments, we became aware that we insufficiently communicated how cell clusters were annotated. We did mention in the manuscript that we did not use single marker genes to annotate clusters, but instead we used multiple marker genes for each cluster for the annotation process. We used both marker genes derived from our dataset and marker genes identified from zebrafish resources for cluster annotation. We chose single marker genes for each cluster for visualization purposes and for in situ hybridizations. However, it is clear from the reviewers’ comments that we needed to make more clear how the annotations were performed. To make this effort more clear in our revision, we included two new supplementary files – one with Seurat derived marker genes and one with marker genes derived from our DotPlot method. We also included extensive supplementary figures highlighting different markers. Using Daniocell, we identified 6 zebrafish markers per major cell type and showed their expression patterns in our atlas with FeaturePlots. We also included feature plots of the top 6 marker genes for each cluster. We hope that the addition of these 40+ plots (S3:S57) to the supplement fully addresses these concerns. 

      We appreciated the suggestion of cytotrace from reviewer #2! We ran cytotrace on three major cell lineages (neural, muscle, and connective; S73) which complemented our KEGG analysis in suggesting an undifferentiated fate for clusters 8, 10, and 16. We chose to not run SAMap because it is a scRNA-seq library integration tool. Although we compared our lectin epidermal findings to 3 dpf zebrafish scRNA-seq data, we did not integrate the datasets out of concern that we could draw erroneous conclusions for other cell types.  Future work that explores this technical challenge may uncover the role of heterochrony in syngnathid craniofacial development. We detail these changes more fully in our responses to reviewers.

      (2) The claims regarding evolutionary novelty and/or the genes involved are considered speculative. In part, this comes from relying too heavily on comparisons against zebrafish, as opposed to more closely related species. For example, the discussion regarding C-type lectin expression in the epidermis and KEGG enrichment (lines 358 - 364) seems confusing. Another good example here is the discussion on sfrp1a (lines 258 - 261). Here, the text seems to suggest craniofacial sfrp1a expression (or specifically ethmoid expression?) is connected to the development of the elongated snout in pipefish. However, craniofacial expression of sfrp1a is also reported in the arctic charr, which the authors grouped into fishes with derived craniofacial structures. Separately, sfrp2 expression was also reported in stickleback fish, for example. Do these different discussions truly support the notion that sfrp1a expression is all that unique in pipefish, rather than that pipefish and zebrafish are only distantly related and that sfrp1a was a marker gene first, and co-opted gene second? The authors should respond to the comments in the public review related to this aspect, and include more informative comparison and discussion. 

      A much more nuanced discussion with appropriate comparisons and caveats would be strongly recommended here.  

      We appreciate this insight and used it as a motivator to complete and add select comparative ISH data to this manuscript. We added in situ hybridization experiments from stickleback fish for craniofacial development genes (sfrp1a, prdm16, bmp4, and fgf22; S76-S79).  After adding stickleback ISH to the manuscript, we were able to make comparisons between pipefish and stickleback patterns and draw more informed conclusions (L244-L248; L262-277; L444-470). We added additional nuance to the discussion of the head, tooth (L485-489), and male pregnancy (L358-L391) sections to address concerns of study limitations. We describe in more detail these additional data in response to reviewers.

      (3) In situ hybridization results: as already included above, there is generally weak labeling of species, developmental stages, and other markings that can provide context. The collective feeling here is that as it is currently presented, the ISH results do not go too far beyond simply illustrative purposes. To take these results further, more detailed comparison may be needed. At a minimum, far better labeling can help avoid making the wrong impression. 

      Based on the reviewers’ comments, we made changes to improve ISH clarity and add select comparative ISH findings. ISH was used to further interpretation of the scRNAseq atlas. All the developmental stages and species information for the embryos used were in the figure captions as well as in supplemental file 4. Since we primarily used wild caught embryos, we did not have specific ages of most embryos. The technical challenges of acquiring and staging Syngnathus embryos are detailed above. Because we did not have their age, we classified them based on developmental markers (such as presence of somites and the extent of craniofacial elongation). Although these classification methods are not ideal, they are consistent with the syngnathid literature (Sommer et al. 2012).  

      We followed reviewer #1’s recommendations by adding an annotated graphic of a pipefish head, aligning bay and Gulf pipefish sequences for the probe regions, expanding out our supplemental figures for ISH into a figure for each probe, and improving labeling. These changes improved the description of the ISH experiments and have increased the quality of the manuscript.

      We would have loved to complete detailed comparative studies as suggested, but doing such a complete analysis was not feasible for this study. Therefore, we completed an additional focused analysis. We followed reviewer #3’s idea and added ISHs from threespine stickleback, a short snouted fish, for 4 genes (sfrp1a, prdm16, fgf22, and bmp4). While more extensive ISHs tracking all marker genes through a variety of developmental stages in pipefish and stickleback would have provided crucial insights, we feel that it is beyond the scope of this study and would require a significant amount of additional work. We, thus, primarily interpreted the ISH results as illustrative data points in our discussion. As we state in the response to reviewer 1, the generation and annotation of the first scRNA-seq atlas in a syngnathid is the primary goal of this manuscript.  The ISHs were utilized primarily to support inferred cluster annotations if a positive identification of marker gene expression in expected tissues/cells occurred. 

      Reviewer #1 (Recommendations For The Authors): 

      While the scRNA-seq dataset offers a valuable resource for evo-devo analyses in fish and the hypotheses are of interest, critical aspects should be strengthened to support the claims of the study. 

      Concerning the scRNA-seq dataset, the major points to be addressed are listed below: 

      - Supplementary file 3 reports the single markers used to validate cluster annotations. To confirm cluster identities, more markers specific to each cluster should be highlighted and presented on the UMAP. 

      We recognize the reviewer’s concern and had in reality used numerous markers to annotate the clusters. Based upon the reviewer’s comment we decided to make this clear by creating feature plots for every cluster with the top 6 marker genes. These plots showcase gene specificity in UMAP space. We also added feature plots for zebrafish marker genes for key cell types. Through these changes and the addition of 54 supplementary figures (S3:S57), we hope that it is clear that numerous markers validated cluster identity.

      For example, as clusters 17 and 37 share the same tnnti1 marker, which other markers permit to differentiate their respective identity. 

      This is a fair point. Clusters 17 and 37 are both marked by a tnni1 ortholog.

      Different paralogous co-orthologs mark each cluster (cluster 17: LOC125989146; cluster 37: LOC125970863). In our revision to the above comment, additional (6) markers per cluster were highlighted which should remedy this concern. 

      - L146: the low number of identified cartilaginous cells (only 2% of total connective tissue cells) appears aberrant compared to bone cell number, while Figure 1 presents a well-developed cartilaginous skeleton with poor or no signs of ossification. Please discuss this point. 

      We also found this to be interesting and added a brief discussion on this subject to the results section (L147-L149). Single cell dissociations can have variable success for certain cell types. It is possible that the cartilaginous cells were more difficult to dissociate than the osteoblast cells.

      - L162: pax3a/b are not specific to muscle progenitors as the genes are also expressed in the neural tube and neural crest derivatives during organogenesis. Please confirm cluster 10 identity.  

      Thank you for the reminder. We added numerous feature plots that explored zebrafish (from Daniocell) and pipefish markers (identified in our dataset). Examining zebrafish satellite muscle markers (myog, pabpc4, and jam2a) shows a strong correspondence with cluster #10.

      - L198: please specify in the text the pigment cell cluster number. 

      We completed this change.

      - L199: it is not clear why considering module 38 correlated to cluster 20 while modules 2/24 appear more correlated according to the p-value color code. 

      We thank the reviewer for pointing this confusing element out! Although the t-statistic value for module 38 (3.75) is lower than the t-statistics for modules 2 and 24 (5.6 and 5.2, respectively), we chose to highlight module 38 for its ‘connectivity dependence’ score. In our connectivity test, we examined whether removing cells from a specific cell cluster reduced the connectivity of a gene network. We found that removing cluster 20 led to a decrease in module 38’s connectivity (-.13, p=0) while it led to an increase in modules 2 and 24’s connectivity (.145, p=1; .145, p=9.14; our original supplemental files 9-10). Therefore, the connectivity analysis showed that module 38’s structure was more dependent on cluster 20 than in comparison with modules 2 and 24. Although you highlighted an interesting quandary, we decided that this is tangential to the paper and did not add this discussion to the manuscript. 

      - Please describe in the text Figure 4A. 

      Completed, we thank the reviewer for catching this! 

      Concerning embryo stainings, the major points to be addressed are listed below: 

      - Figure 1: please enhance the light/contrast of figures to highlight or show the absence of alcian/alizarin staining. Mineralized structures are hardly detectable in the head and slight differences can be seen between the two samples. The developmental stage should be added. Please homogenize the scale bar format (remove the unit on panels E and, G as the information is already in the text legend). It would be useful to illustrate the data with a schematic view of the structures presented in panels B, and E, and please annotate structures in the other panels.  

      We thank the reviewer for these suggestions to improve our figure. We increased the brightness and contrast for all our images. We also added an illustration of the head with labels of elements. As discussed, we used wild caught pregnant males and, therefore, do not know the exact age of the specimens. However, we described the developmental stage based on morphological observations. Slight differences in morphology between samples are expected. We and others have noticed that developmental rate varies, even within the same brood pouch, for syngnathid embryos. We observed several mineralization zones in the embryos, including the upper and lower jaws, the mes(ethmoid), and the pectoral fin. We recognize the cartilage staining is more apparent than the bone staining, though increasing image brightness and contrast did improve the visibility of the mineralization front.

      - All ISH stainings and images presented in Figures 4-6/ Figures S2-3 should be revised according to comments provided in the public review. 

      We thank the reviewer for providing thorough comments, we provided an in-depth response to the public review. We made several improvements to the manuscript to address their concerns. 

      - Figure 4: Figure 4B should be described before 4C in the text or inverse panels / L222 the Meckel's cartilage is not shown on Figure 4C. The schematic views in H should be annotated and the color code described / the ISH data must be completed to correlate spatially clusters to head structures. 

      We thank the reviewer for pointing this out, we fixed the issues with this figure and added annotations to the head schematics.

      - Figure 5: typo on panels 'alician' = alcian. 

      We completed this change. 

      - Figures S2-3: data must be better presented, polished / typo in captions 'relavant'= relevant. 

      Thank you for this critique, we created new supplementary figures to enhance interpretation of the data (S59-S71). In these new figures, we included a feature plot for each gene and respective ISHs.

      - Figure S3: soat2 = no evidence of muscle marker neither by ISH presented nor in the literature. 

      We realized this staining was not clear with the previous S2/S3 figures. Our new changes in these supplementary figures based on the reviewer’s ideas made these ISH results clearer. We observed soat2 staining in the sternohyoideus muscle (panel B in S71).

      Other points: 

      - The cartilage/bone developmental state (Alcian/alizarin staining) and/or ISH for classical markers of muscle development (such as pax3/myf5) could be used to clarify the This could permit the completion of a comparative analysis between the two species and the interpretation of novel and adaptative characters.  

      We appreciate this idea! We thought deeply about a well characterized comparative analysis between pipefish and zebrafish for this study. We discussed our concerns in our public response to reviewer 2. We found that it was challenging to stage match all cell types, and were concerned that we could make erroneous conclusions. For example, our pipefish samples were still inside the male brood pouch and possessed yolk sacs. However, we found osteoblast cells in our scRNAseq atlas, and in alizarin staining. Although zebrafish literature notes that the first zebrafish bone appears at 3 dpf (Kimmel et al. 1995), osteoblasts were not recognized until 5 dpf in two scRNAseq datasets (Fabian et al. 2022; Lange et al. 2023). A 5dpf zebrafish is considered larval and has begun hunting. Therefore, we chose to not integrate our data out of concern that osteoblast development may occur at different timelines between the fishes. 

      Fabian, P., Tseng, K.-C., Thiruppathy, M., Arata, C., Chen, H.-J., Smeeton, J., Nelson, N., & Crump, J. G. (2022). Lifelong single-cell profiling of cranial neural crest diversification in zebrafish. Nature Communications, 13(1), 1–13. 

      Lange, M., Granados, A., VijayKumar, S., Bragantini, J., Ancheta, S., Santhosh, S., Borja, M., Kobayashi, H., McGeever, E., Solak, A. C., Yang, B., Zhao, X., Liu, Y., Detweiler, A. M., Paul, S., Mekonen, H., Lao, T., Banks, R., Kim, Y.-J., … Royer, L. A. (2023). Zebrahub – Multimodal Zebrafish Developmental Atlas Reveals the State-Transition Dynamics of Late-Vertebrate Pluripotent Axial Progenitors. BioRxiv, 2023.03.06.531398. 

      Kimmel, C., Ballard, S., Kimmel, S., Ullmann, B., Schilling, T. (1995). Stages of Embryonic Development of the Zebrafish. Developmental Dynamics 203:253–310.

      'in situs' in the text should be replaced by 'in situ experiments'.  

      We made this change (L395, L663, L666, L762).

      - Lines 562-565: information on samples should be added at the start of the result section to better apprehend the following scRNA-seq data.

      We thank the reviewer for pointing out this issue. Although we had a few sentences on the samples in the first paragraph of the result section, we understand that it was missing some critical pieces of information. Therefore, we added these additional details to the beginning of the results section (L126-L132). 

      - Lines 629-665: PCR with primers designed on gulf pipefish genome could be performed in parallel on bay and gulf cDNA libraries, and amplification products could be sequenced to analyze alignment and validate the use of gulf pipefish ISH probes in bay pipefish embryos. Probe production could also be performed using gulf primers on bay pipefish cDNA pools. 

      After the submission of this manuscript, a bay pipefish genome was prepared by our laboratory. We used this genome to align our probes, these alignments demonstrate strong sequence conservation between the species. We included these alignments in our supplemental files.

      - L663: the bleaching step must be optimized on pipefish embryos. 

      We understand this concern and had completed several bleach optimization experiments prior to publication. Although we found that bleaching improved visibility of staining, we noticed with the probe tnmd that bleached embryos did not have complete staining of tendons and ligaments. The unbleached embryos had more extensive staining than the bleached embryos. We were concerned that bleaching would lead to failures to detect expression domains (false negatives) important for our analysis. Therefore, we did not use bleaching in our in situ experiments (except with hatched fish with a high degree of pigmentation). 

      - Indicate the number of specimens analyzed for each labeling condition.  

      We thank the reviewer for noticing this issue. We added this information to the methods (L766-767).

      - Describe the fixation and pre-treatment methods previous to ISH and skeleton stainings

      We thank the reviewer for pointing out this issue, we added these descriptions (L765-766; L772-774). 

      Reviewer #3 (Recommendations For The Authors): 

      (1) If sfrp1a expression is observed also in other fish species with derived craniofacial structures, it's important to discuss this more in the Discussion. This could be a common mechanism to modify craniofacial structures, although functional tests are ultimately required (but not in this paper, for sure). Can lines 421-428 involve the statement "a prolonged period of chondrocyte differentiation" underlies craniofacial diversity?

      This is a great idea, and we added a sentence that captures this ethos (L451-452).

      (2) Lines 334-346 need to be rephrased. It's hard to understand which genes are expressed or not in pipefish and zebrafish. Did "23 endocytosis genes" show significant enrichment in zebrafish epidermis, or are they expressed in zebrafish epidermis? 

      We thank the reviewer for this comment, we re-phrased this section for clarity (L365-368).

      (3) Figure 4 is missing the "D" panel and two "E" panels. 

      We thank the reviewer for noticing this, we fixed this figure.

      (4) Line 302: "whole-mount" or "whole mount"

      We thank the reviewer for the catch!

    1. eLife Assessment

      This important study investigates how working memory load influences the Stroop effect from a temporal dynamics perspective. Solid evidence is provided that the working memory load influences the Stroop effect in the late-stage stimulus-response mapping instead of the early sensory stage. This study will be of interest to both neuroscientists and psychologists who work on cognitive control.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates an intriguing question in cognitive control from a temporal dynamics perspective: why does concurrent verbal working memory load eliminate the color-word Stroop effect? Through a series of thorough data analyses, the authors propose that verbal working memory load occupies the stimulus-response mapping resources represented by theta-band activity, thereby disrupting the mapping process for task-irrelevant distractors. This reduces the response tendency to the distractors, ultimately leading to the elimination of the Stroop effect.

      Strengths:

      The behavioral and neural evidence presented in the manuscript is solid, and the findings have valuable theoretical implications for research on Stroop conflict processing.

      Weaknesses:

      There are several areas where the manuscript could be improved.

      Major Comments:

      (1) In the Results section, the rationale behind selecting the beta band for the central (C3, CP3, Cz, CP4, C4) regions and the theta band for the fronto-central (Fz, FCz, Cz) regions is not clearly explained in the main text. This information is only mentioned in the figure captions. Additionally, why was the beta band chosen for the S-ROI fronto-central region and the theta band for the S-ROI central region? Was this choice influenced by the MVPA results?

      (2) In the Data Analysis section, line 424 states: "Only trials that were correct in both the memory task and the Stroop task were included in all subsequent analyses. In addition, trials in which response times (RTs) deviated by more than three standard deviations from the condition mean were excluded from behavioral analyses." The percentage of excluded trials should be reported. Also, for the EEG-related analyses, were the same trials excluded, or were different criteria applied?

      (3) In the Methods section, line 493 mentions: "A 400-200 ms pre-stimulus time window was selected as the baseline time window." What is the justification in the literature for choosing the 400-200 ms pre-stimulus window as the baseline? Why was the 200-0 ms pre-stimulus period not considered?

      (4) Is the primary innovation of this study limited to the methodology, such as employing MVPA and RSA to establish the relationship between late theta activity and behavior?

      (5) On page 14, lines 280-287, the authors discuss a specific pattern observed in the alpha band. However, the manuscript does not provide the corresponding results to substantiate this discussion. It is recommended to include these results as supplementary material.

      (6) On page 16, lines 323-328, the authors provide a generalized explanation of the findings. According to load theory, stimuli compete for resources only when represented in the same form. Since the pre-memorized Chinese characters are represented semantically in working memory, this explanation lacks a critical premise: that semantic-response mapping is also represented semantically during processing.

      (7) The classic Stroop task includes both a manual and a vocal version. Since stimulus-response mapping in the vocal version is more automatic than in the manual version, it is unclear whether the findings of this study would generalize to the impact of working memory load on the Stroop effect in the vocal version.

      (8) While the discussion section provides a comprehensive analysis of the study's results, the authors could further elaborate on the theoretical and practical contributions of this work.

    3. Reviewer #2 (Public review):

      Summary:

      Li et al. explored which stage of Stroop conflict processing was influenced by working memory loads. Participants completed a single task (Stroop task) and a dual task (the Sternberg working memory task combined with the Stroop task) while their EEG data was recorded. They adopted the event-related potential (ERP), and multivariate pattern analyses (MVPA) to investigate the interaction effect of task (single/dual) and congruency (congruent/incongruent). The results showed that the interaction effect was significant on the sustained potential (SP; 650-950 ms), the late theta (740-820 ms), and beta (920-1040 ms) power but not significant on the early P1 potential (110-150 ms). They used the representational similarity analyses (RSA) method to explore the correlation between behavioral and neural data, and the results revealed a significant contribution of late theta activity.

      Strengths:

      (1) The experiment is well-designed.

      (2) The data were analyzed in depth from both time and frequency domain perspectives by combining several methods.

      Weaknesses:

      (1) As the researchers mentioned, a previous study reported a diminished Stroop effect with concurrent working memory tasks to memorize meaningless visual shapes rather than memorize Chinese characters as in the study. My main concern is that lower-level graphic processing when memorizing visual shapes also influences the Stroop effect. The stage of Stroop conflict processing affected by the working memory load may depend on the specific content of the concurrent working memory task. If that's the case, I sense that the generalization of this finding may be limited.

      (2) The P1 and N450 components are sensitive to congruency in previous studies as mentioned by the researchers, but the results in the present study did not replicate them. This raised concerns about data quality and needs to be explained.

    4. Author response:

      Reviewer #1 (Public review):

      Comment 1: In the Results section, the rationale behind selecting the beta band for the central (C3, CP3, Cz, CP4, C4) regions and the theta band for the fronto-central (Fz, FCz, Cz) regions is not clearly explained in the main text. This information is only mentioned in the figure captions. Additionally, why was the beta band chosen for the S-ROI central region and the theta band for the S-ROI fronto-central region? Was this choice influenced by the MVPA results?

      We thank the reviewer for the question regarding the rationale for the S-ROI selection in our study. The beta band was chosen for the central region due to its established relevance in motor control (Engel & Fries, 2010), movement planning (Little et al., 2019) and motor inhibition (Duque et al., 2017). The fronto-central theta band (or frontal midline theta) was a widely recognized indicator in cognitive control research (Cavanagh & Frank, 2014), associated with conflict detection and resolution processes. Moreover, recent empirical evidence suggested that the fronto-central theta reflected the coordination and integration between stimuli and responses (Senoussi et al., 2022). Although we have described the cognitive processes linked to these different frequencies in the introduction and discussion sections, along with the potential patterns of results observed in Stroop-related studies, we did not specify the involved cortical areas. Therefore, we have specified these areas in the introduction to enhance the clarity of the revised version (in the fourth paragraph of the Introduction section).

      Regarding whether the selection of S-ROIs was influenced by the MVPA results, we would like to clarify here that we selected the S-ROIs based on prior research and then conducted the decoding analysis. Specifically, we first extracted the data representing different frequency indicators (three F-ROIs and three S-ROIs) as features, followed by decoding to obtain the MVPA results. Subsequently, the time-frequency analysis, combined with the specific time windows during which each frequency was decoded, provided detailed interaction patterns among the variables for each indicator. The specifics of feature selection are described in the revised version (in the first paragraph of the Multivariate Pattern Analysis section).

      Comment 2: In the Data Analysis section, line 424 states: “Only trials that were correct in both the memory task and the Stroop task were included in all subsequent analyses. In addition, trials in which response times (RTs) deviated by more than three standard deviations from the condition mean were excluded from behavioral analyses.” The percentage of excluded trials should be reported. Also, for the EEG-related analyses, were the same trials excluded, or were different criteria applied?

      We thank the reviewer for this suggestion. Beyond the behavioral exclusion criteria, trials with EEG artifacts were also excluded from the data for the EEG-related analyses. We have now reported the percentage of excluded trials for both behavioral and EEG data analyses in the revised version (in the second paragraph of the EEG Recording and Preprocessing section and the first paragraph of the Behavioral Analysis section).
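
      The behavioral exclusion rule quoted in Comment 2 (drop trials whose RT deviates by more than three standard deviations from the condition mean) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the use of the sample standard deviation, and the assumption that incorrect trials have already been removed are all illustrative choices.

```python
import numpy as np

def exclude_rt_outliers(rts, conditions, n_sd=3.0):
    """Return a boolean mask that is True for trials that are kept.

    Hypothetical helper: assumes `rts` already contains only trials that
    were correct in both tasks, and that the sample SD (ddof=1) is used.
    """
    rts = np.asarray(rts, dtype=float)
    conditions = np.asarray(conditions)
    keep = np.ones(rts.shape, dtype=bool)
    for cond in np.unique(conditions):
        idx = conditions == cond
        if idx.sum() < 2:
            continue  # SD is undefined for a single trial; keep it
        mean = rts[idx].mean()
        sd = rts[idx].std(ddof=1)
        # Keep trials within n_sd standard deviations of the condition mean
        keep[idx] = np.abs(rts[idx] - mean) <= n_sd * sd
    return keep

def percent_excluded(keep):
    """Percentage of trials removed by the mask."""
    return 100.0 * (1.0 - np.mean(keep))
```

      Computing the mask per condition, rather than over all trials pooled, matches the "condition mean" wording of the rule; the percentage returned by `percent_excluded` is the kind of figure the reviewer asked to see reported.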

      Comment 3: In the Methods section, line 493 mentions: “A 400-200 ms pre-stimulus time window was selected as the baseline time window.” What is the justification in the literature for choosing the 400-200 ms pre-stimulus window as the baseline? Why was the 200-0 ms pre-stimulus period not considered?

      We thank the reviewer for this question and would like to provide the following justification. First, although a baseline ending at 0 ms is common in ERP analyses, it may not be suitable for time-frequency analysis. Due to the inherent temporal smoothing characteristic of wavelet convolution in time-frequency decomposition, task-related early activities can leak into the pre-stimulus period (before 0 ms) (Cohen, 2014). This means that extending the baseline to 0 ms will include some post-stimulus activity in the baseline window, thereby increasing baseline power and compromising the accuracy of the results. Second, an ideal baseline duration is recommended to be around 10-20% of the entire trial of interest (Morales & Bowers, 2022). In our study, the epoch duration was 2000 ms, making 200-400 ms an appropriate baseline length. Third, given that the minimum duration of the fixation point before the stimulus in our experiment was 400 ms, we started the baseline no earlier than 400 ms before the stimulus so that it fell entirely within the fixation period. In summary, considering edge effects, duration requirements, and the need to exclude other influences, we selected a baseline correction window of -400 to -200 ms. To enhance the clarity of the revised version, we have provided the rationale for the selected time windows along with relevant references (in the first paragraph of the Time-frequency analysis section).
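
      As a rough illustration of the baseline window described above, the sketch below normalizes time-frequency power to the mean power in the -400 to -200 ms window. It is an assumption-laden example rather than the authors' pipeline: the decibel conversion, the function name `baseline_db`, and the array layout (frequencies by time points) are not stated in the response.

```python
import numpy as np

def baseline_db(power, times, baseline=(-0.4, -0.2)):
    """Baseline-normalize time-frequency power and express it in dB.

    power    : array of shape (n_freqs, n_times), raw power (assumed layout)
    times    : array of shape (n_times,), seconds relative to stimulus onset
    baseline : (start, end) in seconds; -0.4 to -0.2 s mirrors the window
               in the response. The dB conversion is an assumption.
    """
    power = np.asarray(power, dtype=float)
    times = np.asarray(times, dtype=float)
    mask = (times >= baseline[0]) & (times <= baseline[1])
    if not mask.any():
        raise ValueError("baseline window contains no samples")
    # Mean baseline power per frequency, kept 2-D so it broadcasts over time
    base = power[:, mask].mean(axis=1, keepdims=True)
    return 10.0 * np.log10(power / base)
```

      Ending the window at -200 ms rather than 0 ms keeps it clear of post-stimulus activity that wavelet smoothing can smear backwards in time, which is the first reason given in the response.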

      Comment 4: Is the primary innovation of this study limited to the methodology, such as employing MVPA and RSA to establish the relationship between late theta activity and behavior?

      We thank the reviewer for this insightful question and would like to clarify that our research extends beyond mere methodological innovation; rather, it utilized new methods to explore novel theoretical perspectives. Specifically, our research presents three levels of innovation: methodological, empirical, and theoretical. First, methodologically, MVPA overcame the drawbacks of traditional EEG analyses based on specific averaged voltage intensities, providing new perspectives on how the brain dynamically encoded particular neural representations over time. Furthermore, RSA aimed to identify which indicators among the decoded were directly related to behavioral representation patterns. Second, in terms of empirical results, using these two methods, we have identified for the first time three EEG markers that modulate the Stroop effect under verbal working memory load: SP, late theta, and beta, with late theta being directly linked to the elimination of the behavioral Stroop effect. Lastly, from a theoretical perspective, we proposed the novel idea that working memory played a crucial role in the late stages of conflict processing, specifically in the stimulus-response mapping stage (the specific theoretical contributions are detailed in the second-to-last paragraph of the Discussion section).

      Comment 5: On page 14, lines 280-287, the authors discuss a specific pattern observed in the alpha band. However, the manuscript does not provide the corresponding results to substantiate this discussion. It is recommended to include these results as supplementary material.

      We thank the reviewer for this suggestion. We added a new figure along with the corresponding statistical results that displayed the specific result patterns for the alpha band (Supplementary Figure 1).

      Comment 6: On page 16, lines 323-328, the authors provide a generalized explanation of the findings. According to load theory, stimuli compete for resources only when represented in the same form. Since the pre-memorized Chinese characters are represented semantically in working memory, this explanation lacks a critical premise: that semantic-response mapping is also represented semantically during processing.

      We thank the reviewer for this insightful suggestion. We fully agree with the reviewer’s perspective. As stated in our revised version, load theory suggests that cognitive resources are limited and dependent on a specific type (in the second paragraph of the Discussion section). The previously memorized Chinese characters are stored in working memory in the form of semantic representations; meanwhile the stimulus-response mapping should also be represented semantically, leading to resource occupancy. We have included this logical premise in the revised version (in the third-to-last paragraph of the Discussion section).

      Comment 7: The classic Stroop task includes both a manual and a vocal version. Since stimulus-response mapping in the vocal version is more automatic than in the manual version, it is unclear whether the findings of this study would generalize to the impact of working memory load on the Stroop effect in the vocal version.

      We fully agree with the reviewer’s point that the verbal version of the Stroop task differs from the manual version in terms of the degree of automation in the stimulus-response mapping. Specifically, the verbal version relies on mappings that are established through daily language use, while the manual version involves arbitrary mappings created in the laboratory. Therefore, the stimulus-response mapping in the verbal response version is more automated and less likely to be suppressed. However, our previous research indicated that the degree of automation in the stimulus-response mapping was influenced by practice (Chen et al., 2013). After approximately 128 practice trials, semantic conflict almost disappears, suggesting that the level of automation in stimulus-response mapping for the verbal Stroop task is comparable to that of the manual version (Chen et al., 2010). Given that participants in our study completed 144 practice trials (in the Procedure section), we believe these findings can be generalized to the verbal version.

      Comment 8: While the discussion section provides a comprehensive analysis of the study’s results, the authors could further elaborate on the theoretical and practical contributions of this work.

      We thank the reviewer for the constructive suggestions. We recognize that the theoretical and practical contributions of the study were not thoroughly elaborated in the original manuscript. Therefore, we have now provided a more detailed discussion. Specifically, the theoretical contributions focus on advancing load theory and highlighting the critical role of working memory in conflict processing. The practical contributions emphasize the application of load theory and the development of intervention strategies for enhancing inhibitory control. A more detailed discussion can be found in the revised version (in the second-to-last paragraph of the Discussion section).

      Reviewer #2 (Public review):

      Comment 1: As the researchers mentioned, a previous study reported a diminished Stroop effect with concurrent working memory tasks to memorize meaningless visual shapes rather than memorize Chinese characters as in the study. My main concern is that lower-level graphic processing when memorizing visual shapes also influences the Stroop effect. The stage of Stroop conflict processing affected by the working memory load may depend on the specific content of the concurrent working memory task. If that’s the case, I sense that the generalization of this finding may be limited.

      We thank the reviewer for this insightful concern. As mentioned in the manuscript, this may be attributed to the inherent characteristics of Chinese characters. In contrast to English words, the processing of Chinese characters relies more on graphemic encoding and memory (Chen, 1993). Therefore, the processing of line patterns essentially occupies some of the resources needed for character processing, which aligns with our study’s hypothesis based on dimensional overlap. Additionally, regarding the results, even though the previous study presented lower-level line patterns, the results still showed that the working memory load modulated the late theta band. We hypothesize that, regardless of the specific content of the pre-presented working memory load, once the stimulus disappears from view, these loads are maintained as representations in the working memory platform. Therefore, they do not influence early perceptual processing, and resource competition only occurs once the distractors reach the working memory platform. Lastly, a previous study has shown that spatial loads, which do not overlap with either the target or distractor dimensions, do not influence the conflict effect (Zhao et al., 2010). Taken together, we believe that regardless of the specific content of the concurrent working memory tasks, as long as they occupy resources related to irrelevant stimulus dimensions, they can influence the late-stage processing of the conflict effect. Perhaps our original manuscript did not convey this clearly, so we have rephrased it in a more straightforward manner (in the second paragraph of the Discussion section).

      Comment 2: The P1 and N450 components are sensitive to congruency in previous studies as mentioned by the researchers, but the results in the present study did not replicate them. This raised concerns about data quality and needs to be explained.

      We thank the reviewer for this insightful concern. For P1, we aimed to convey that the early perceptual processing represented by P1 is part of the conflict processing process. Therefore, we included it in our analysis. Additionally, as mentioned in the discussion, most studies find P1 to be insensitive to congruency. However, we inappropriately cited a study in the introduction that suggested P1 shows differences in congruency, which is among the few studies that hold this perspective. To prevent confusion for readers, we have removed this citation from the introduction.

      As for N450, most studies have indeed found it to be influenced by congruency. In our manuscript, we did not observe a congruency effect at our chosen electrodes and time window. However, significant congruency effects were detected at other central-parietal electrodes (CP3, CP4, P5, P6) during the 350-500 ms interval. The interaction between task type and consistency remained non-significant, consistent with previous results. Furthermore, with respect to the location of the electrodes chosen, existing studies on N450 vary widely, including central-parietal electrodes and frontal-central electrodes (for a review, see Heidlmayr et al., 2020). We speculate that this phenomenon may be related to the extent of practice. With fewer total trials, the task may involve more stimulus conflicts, engaging more frontal brain areas. On the other hand, with more total trials, the task may involve more response conflicts, engaging more central-parietal brain areas (Chen et al., 2013; van Veen & Carter, 2005). Due to the extensive practice required in our study, we identified a congruency N450 effect in the central-parietal region. We apologize for not thoroughly exploring other potential electrodes in the previous manuscript, and we have revised the results and interpretations regarding N450 accordingly in the revised version (in the N450 section of the ERP results and the third paragraph of the Discussion section).

      References

      Cavanagh, J. F., & Frank, M. J. (2014). Frontal theta as a mechanism for cognitive control. Trends in Cognitive Sciences, 18(8), 414–421. https://doi.org/10.1016/j.tics.2014.04.012

      Chen, M. J. (1993). A Comparison of Chinese and English Language Processing. In Advances in Psychology (Vol. 103, pp. 97–117). North-Holland. https://doi.org/10.1016/S0166-4115(08)61659-3

      Chen, X. F., Jiang, J., Zhao, X., & Chen, A. (2010). Effects of practice on semantic conflict and response conflict in the Stroop task. Psychol. Sci., 33, 869–871.

      Chen, Z., Lei, X., Ding, C., Li, H., & Chen, A. (2013). The neural mechanisms of semantic and response conflicts: An fMRI study of practice-related effects in the Stroop task. NeuroImage, 66, 577–584. https://doi.org/10.1016/j.neuroimage.2012.10.028

      Cohen, M. X. (2014). Analyzing Neural Time Series Data: Theory and Practice. The MIT Press. https://doi.org/10.7551/mitpress/9609.001.0001

      Duprez, J., Gulbinaite, R., & Cohen, M. X. (2020). Midfrontal theta phase coordinates behaviorally relevant brain computations during cognitive control. NeuroImage, 207, 116340. https://doi.org/10.1016/j.neuroimage.2019.116340

      Duque, J., Greenhouse, I., Labruna, L., & Ivry, R. B. (2017). Physiological Markers of Motor Inhibition during Human Behavior. Trends in Neurosciences, 40(4), 219–236. https://doi.org/10.1016/j.tins.2017.02.006

      Engel, A. K., & Fries, P. (2010). Beta-band oscillations—Signalling the status quo? Current Opinion in Neurobiology, 20(2), 156–165. https://doi.org/10.1016/j.conb.2010.02.015

      Heidlmayr, K., Kihlstedt, M., & Isel, F. (2020). A review on the electroencephalography markers of Stroop executive control processes. Brain and Cognition, 146, 105637. https://doi.org/10.1016/j.bandc.2020.105637

      Little, S., Bonaiuto, J., Barnes, G., & Bestmann, S. (2019). Human motor cortical beta bursts relate to movement planning and response errors. PLOS Biology, 17(10), e3000479. https://doi.org/10.1371/journal.pbio.3000479

      Morales, S., & Bowers, M. E. (2022). Time-frequency analysis methods and their application in developmental EEG data. Developmental Cognitive Neuroscience, 54, 101067. https://doi.org/10.1016/j.dcn.2022.101067

      Senoussi, M., Verbeke, P., Desender, K., De Loof, E., Talsma, D., & Verguts, T. (2022). Theta oscillations shift towards optimal frequency for cognitive control. Nature Human Behaviour, 6(7), Article 7. https://doi.org/10.1038/s41562-022-01335-5

      van Veen, V., & Carter, C. S. (2005). Separating semantic conflict and response conflict in the Stroop task: A functional MRI study. NeuroImage, 27(3), 497–504. https://doi.org/10.1016/j.neuroimage.2005.04.042

      Zhao, X., Chen, A., & West, R. (2010). The influence of working memory load on the Simon effect. Psychonomic Bulletin & Review, 17(5), 687–692. https://doi.org/10.3758/PBR.17.5.687

    1. eLife Assessment

      This study presents useful albeit preliminary findings on transcriptome changes in cardiac lymphatic cells after myocardial infarction in mice. Despite revision, the conclusions of the authors remain uncertain as sample sizes in general are very low, and sometimes even too low to allow for valid statistical comparisons. Accordingly, there are concerns regarding statistical robustness, raised by both the editors and the reviewers. While the single-cell transcriptomic data were analyzed using solid advanced methodology, too few cells were included in the scRNA-seq data set and the spatial transcriptomics analyses. Thus, this study represents a collection of preliminary transcriptomic data rather than a full scientific report that would definitively advance the field.

    2. Reviewer #1 (Public review):

      Summary:

      Assessment of cardiac LEC transcriptomes post-MI may yield new targets to improve lymphatic function. scRNAseq is a valid approach as cardiac LECs are rare compared to blood vessel endothelial cells.

      Strengths:

      Extensive bioinformatics approaches employed by the group

      Weaknesses:

      Too few cells included in scRNAseq data set and the spatial transcriptomics data that was exploited has little relevance, or rather specificity, for cardiac lymphatics. This study seems more a collection of preliminary transcriptomic data than a true scientific report to help advance the field.

      Comments on revisions:

      Thank you for the revision that helps clarify some outstanding questions.

      (1) I still have questions relating to the relevance of the spatial maps generated and shown in Fig 3C. They are supposedly generated using a 'molecular finger print' specific to each sub-cluster of LECs. However, given that at early stages post-MI most populations are exceedingly rare in your analyses, could you please explain or comment on the relevance of the spatial maps?

      (2) Fig 3 S1 would indicate that the CaII population is the predominant one in healthy hearts, whereas the quantifications in Fig 3A show that the LEC Co subpopulation predominates. Further, in mouse hearts, histological analyses have demonstrated that cardiac lymphatics are restricted to the outer layers of the heart. This is not seen in your spatial maps. It seems to be the case only for the LEC Co population in healthy hearts, but not for the other subpopulation signatures. Please explain.

      (3) Further, the CaI population, with only 1 cell analysed at d3, nevertheless appears very prevalent in the spatial maps at d3. Please explain.

      (4) In your list of 12 genes used as matrix anchors to identify LEC subpopulations in your screens, it is not apparent how LEC CaI, II and III differ so much as to allow selective detection of subpopulations. This similitude of profiles is supported by Fig 2F, and further explanations are needed to explain how the spatial maps of LEC ca subpopulations appear as distinct as shown in fig 3 S1 and Fig 3C.

    3. Reviewer #2 (Public review):

      Summary:

      This study integrated single-cell sequencing and spatial transcriptome data from mouse heart tissue at different time points post-MI. They identified four transcriptionally distinct subtypes of lymphatic endothelial cells and localized them in space. They observed that LECs subgroups are localized in different zones of infarcted heart with functions. Specifically, they demonstrated that LEC ca III may be involved in directly regulating myocardial injuries in the infarcted zone concerning metabolic stress, while LEC ca II may be related to the rapid immune inflammatory responses of the border zone in the early stage of MI. LEC ca I and LEC collection mainly participate in regulating myocardial tissue edema resolution in the middle and late stages post-MI. Finally, cell trajectory and Cell-Chat analyses further identified that LECs may regulate myocardial edema through Aqp1, and likely affect macrophage infiltration through the galectin9-CD44 pathway. The authors concluded that their study revealed the dynamic transcriptional heterogeneity distribution of LECs in different regions of the infarcted heart and that LECs formed different functional subgroups that may exert different bioeffects in myocardial tissue post-MI.

      Strengths:

      The study addresses a significant clinical challenge, and the results are of great translational value. All experiments were carefully performed, and their data support the conclusion.

    4. Editors' comments (Public review):

      Weaknesses:

      (1) Figure 7C, 7E, 7I, and "Figure7-figure supplement 1 ": All data in these data panels are based on only n=3, which is insufficient. Sample sizes of n=3 are too low to correctly assess normality of distribution and, as a consequence, do not allow to select the appropriate parametric/non-parametric tests. Accordingly, no statistical comparison can be performed and all p values and symbols currently indicating statistically significant differences between groups must be removed.

      (2) Figure 3A, 3B, or 3C: No information about n numbers per group. Should n numbers per group be n=4 or less, no statistical comparison can be performed and all p values and symbols indicating statistically significant differences between groups must be removed.

      (3) Figure 4 E and 4F: No information about n numbers per group. Should n numbers per group be n=4 or less, no statistical comparison can be performed and all p values and symbols indicating statistically significant differences between groups must be removed.

      (4) Figure 5: No information about n numbers per group is provided. Should n numbers per group be n=4 or less, no statistical comparison can be performed and all p values and symbols indicating statistically significant differences between groups must be removed.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for The Authors):

      Q1: In response to reviewers you noted totally 292 sequenced LECs, however in reviewer figure 3 B the numbers seem to add up to 221. Please include mention of the total number of LEC sequences. Please mention line 119, page 4 the total number of explored LEC transcriptomes

      Thank you for your careful review. We have updated Fig 2A, 2C and 2E. Our initial analysis included 242 (not 292) LECs, which covered the d5 post-MI sample in the raw data (E-MTAB-7895). We dropped d5 from our subsequent analysis because the changes at d5 did not differ significantly from those at d3. Therefore, we included 221 LECs in our final analysis, as updated in Fig 2A, 2C and 2E.

      Q2-1: Figure 3A supposedly shows % of LEC subpopulations relative to their numbers found in day 0 samples. However, there seem to be some errors, because for example the subpop LEC Cap I include 13 cells day 1 and 6 cells day 1, which corresponds to 46% of initial numbers. However, from your graph 3B the blue population seems to occupy 10%. Please revise or explain how these relative % were calculated.

      Thank you for your question. In Figure 3A, each column was calculated as dn/d0 × 100%, that is, d0 = 57/57 × 100% = 100%, d1 = 21/57 × 100% = 36.84%, d3 = 9/57 × 100% = 15.79%, and likewise for d7, d14, d28... Therefore, Cap I at d0 (13 cells) is 13/57 × 100% = 22.81%, and Cap I at d1 (6 cells) is 6/57 × 100% = 10.53%.
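
      The two readings of Figure 3A can be compared with a short calculation. This is a minimal sketch using only the cell counts quoted in this exchange; the function name `percent_of` and the variable names are hypothetical. It contrasts the authors' denominator (all 57 LECs at d0) with the reviewer's reading (the subpopulation's own d0 count).

```python
def percent_of(count, denominator):
    """Express count as a percentage of denominator, rounded to 2 decimals."""
    return round(count / denominator * 100, 2)


D0_TOTAL_LECS = 57   # all LECs at day 0, as quoted in the response
CAP_I_D0 = 13        # Cap I cells at day 0
CAP_I_D1 = 6         # Cap I cells at day 1

# Authors' calculation: every value is relative to the 57 LECs at d0.
all_lecs_d1 = percent_of(21, D0_TOTAL_LECS)
all_lecs_d3 = percent_of(9, D0_TOTAL_LECS)
cap_i_d0 = percent_of(CAP_I_D0, D0_TOTAL_LECS)
cap_i_d1 = percent_of(CAP_I_D1, D0_TOTAL_LECS)

# Reviewer's reading: Cap I at d1 relative to Cap I at d0.
cap_i_d1_vs_own_d0 = percent_of(CAP_I_D1, CAP_I_D0)

print(all_lecs_d1, all_lecs_d3, cap_i_d0, cap_i_d1, cap_i_d1_vs_own_d0)
```

      The difference between roughly 10% and roughly 46% for Cap I at d1 comes entirely from the choice of denominator, not from different cell counts.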

      Q2-2: Further, based on the relative % of LEC subpopulations, using the numbers mentioned in Fig 3B, it would appear that the relative frequency LEC cap II population is actually stable at around 20-30% of all LECs per time point throughout the study (except day 1 drop). This contrasts with line 136 p. 4 statement. I would also urge caution for interpreting too much into the variation of relative levels of LEC co, as these represent exceeding rare cells in your samples, and could reflect technical issues rather than true biological variation (total LEC co numbers analyzed ranging from 1-24 cells/ time point). The same could be said of LEC cap II and cap III.

      We strongly agree with your comment on the proportion of LEC subtypes post-MI. Following your suggestion, we have revised the description of the results on page 4, lines 137-143, as follows.

      “In the early stages of myocardial infarction (D1 and D3), the quantity of LECs decreased sharply. The number of LECs gradually increased from day 7 and returned to normal levels by day 14 after MI. Moreover, from day 14 onwards, the number and proportion of Ca I type LECs significantly increased.”

      Q3: Please list in supplement the gene features used to identify in spatial transcriptomics the different LEC subpopulations, as their profiles (notably for capillary LECs) don't appear to be very different based on data in Fig 2F.

      We have supplied gene features in supplementary materials.

      Q4: In section 2.7 you refer to Gal9 secretion. Please replace with expression as no measure of protein levels from LECs has been described in your study.

      Thank you for your suggestion, we have replaced secretion with expression.

      Q5: The updated method to exclude non-lymphatic cells from lymphatic vessel analyses by incorporating pdpn as an additional marker ('present costained areas wherever possible' line 350 p 10)

      Thank you for your correction. We have updated the description as follows and highlighted the changes in the manuscript: rabbit anti-Lyve1 (1:300, ab14917, Abcam, UK), [Syrian hamster anti-Podoplanin (1:100, 53-5381-82, Thermo, USA), rabbit anti-Prox1 (1:300, ab199359, Abcam, UK); both anti-Podoplanin and anti-Prox1 are additional markers co-stained with Lyve1 to exclude non-lymphatic cells from lymphatic vessels].

      Q6: Fig 1B, it is highly surprising to see the lymphatic density in the BZ go from 25 um² at day 3 to more than 1000 um² only four days later (day 7). Is it possible that your day 3 measurements were in the infarct area, and not BZ area? The H&E image shown in Fig1a for d3 sample would seem to indicate the analysis was done in a dead area, rather than BZ. Please revise (perhaps select similar zone as shown for d1 in fig 6D, adjusted for subepicardial region and not mid-myocardial as seems to be the case currently), and also provide lymphatic area measures in healthy myocardium for day 0 samples. The unit used (um²) also would depend on the size of the area examined. Is this unit per image? If so please report total imaged area as a reference.

      A6: Thank you for your reminder and advice. We have labeled each zone on the H&E and IF images in Fig1-supplementary Fig2B, and provided a clearer histological image taken at 3 days post-MI in Fig1A. Furthermore, we recalculated the lymphatic vessel area ratio as you suggested, as the ratio of the LEC co-stained area to the total imaged area under 100-fold magnification.

      Q7: The mention that CD68 antibody isn't compatible with lyve1 antibody could easily have been bridged by using other macrophage markers, such as F4/80, which is readily available and often used marker for macs in mice and comes notably as a rat anti-mouse F4-80. It would have added much more relevant information to exclude Lyve1-/F4/80+ cells as compared to the current analysis, which may indeed include in area measures Lyve1+ /Pdpn- single cells erroneously spotted as 'lymphatic vessels'

      Thank you for your excellent suggestion. We co-stained the samples with F4/80 and LYVE1 and provide the results in Fig1-supplementary Figure 1E, as shown in Author response image 1.

      Author response image 1.

      Immunofluorescence (IF) co-staining of tissue sections with F4/80 and LYVE1 in sham and MI model mice at d3, d7, d14, and d28 post-MI. LYVE1: lymphatic vessel endothelial hyaluronan receptor 1; DAPI: 4’,6-diamidino-2-phenylindole; scale bars: 10×, 100 μm; 40×, 25 μm.

      Reviewer 2 (Recommendations for The Authors):

      Q1: Language expression must be improved. Many incomplete sentences exist throughout the manuscript. A few examples: Line 70-71: In order to further elucidate the effects and regulatory mechanisms of the lymphatic vessels in the repair process of myocardial injury following MI. Line 71-73. This study, integrated single-cell sequencing and spatial transcriptome data from mouse heart tissue at different timepoints after MI from publicly available data (E-MTAB-7895, GSE214611) in the ArrayExpress and gene expression omnibus (GEO) databases. Line 88-89: Since the membrane protein LYVE1 can present lymphatic vessel morphology more clearly than PROX1.

      Thank you for your correction. We have carefully inspected and corrected the whole manuscript.

      Q2: The type of animal models (i.e., permanent MI or MI plus reperfusion) included in Array Express and gene expression omnibus (GEO) databases must be clearly defined as these two models may have completely different effects on lymphatic vessel development during post-MI remodeling.

      Thank you for your excellent suggestion. The animal models used in both E-MTAB-7895 and GSE214611 are permanent MI. We have modified the model information in the methodology section (page 12, line 400-401).

      Q3: Line 119-120: Caution must be taken regarding Cav1 as a lymphocyte marker because Cav1 is expressed in all endothelial cells, not limited to LEC.

      Thank you for the reminder. Cav1 was used in our clustering as one of the marker genes because of its differential expression among LEC subtypes, as reported in the article with PMID: 31402260.

      Q4: Figure 1 legend needs to be improved. RZ, BZ, and IZ need to be labeled in all IF images. Day 0 images suggest that RZ is the tissue section from the right ventricle.

      Thank you for your suggestion. We have labeled and updated the regions of RZ, BZ, and IZ in H&E and IF image in Figure1-Figure supplement 2B.

      Q5: The discussion section needs to be improved and better focused on the findings from the current study.

      Thank you for your helpful comment. Based on your suggestion, we have revised the first paragraph of the discussion (lines 250-256, page 7) as follows:

      Cardiac lymphatics play an important role in myocardial edema and inflammation. This study, for the first time, integrated single-cell sequencing data and spatial transcriptome data from mouse heart tissue at different time points post-MI, and identified four transcriptionally distinct subtypes of LECs and their dynamic transcriptional heterogeneity distribution in different regions of myocardial tissue post-MI. These subgroups of LECs were shown to have different functions involved in inflammation, apoptosis, ferroptosis, and water absorption-related regulation of vasopressin during the process of myocardial repair after MI.

    1. eLife Assessment

      This important study presents a series of results aimed at uncovering the involvement of the endosomal sorting protein SNX4 in neurotransmitter release. While the evidence supporting the conclusions is solid, the molecular mechanisms remain unclear. This paper will be of interest to cell biologists and neurobiologists.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, Josse Poppinga and collaborators addressed the synaptic function of Sorting Nexin 4 (SNX4). Employing a newly developed in vitro KO model, with live imaging experiments, electrophysiological recordings and ultrastructural analysis, the authors evaluate modifications in synaptic morphology and function upon loss of SNX4. The data demonstrate increased neurotransmitter release and alteration in synapse ultrastructure with a higher number of docked vesicles and shorter AZ. The evaluation of the presynaptic function of SNX4 is of relevance and tackles an open and yet unresolved question in the field of presynaptic physiology.

      Strengths:

      The sequential characterization of the cellular model is nicely conducted, and the different techniques employed are appropriate for the morpho-functional analysis of the synaptic phenotype and the derived conclusions on SNX4 function at presynaptic site. The authors succeeded in presenting a novel in vitro model that results in chronic deletion of SNX4 in neurons. A convincing sequence of experimental techniques are applied to the model to unravel the role of SNX4, whose functions in neuronal cells and at synapses are largely unknown. The understanding of the role of endosomal sorting at presynaptic site is relevant and of high interest in the field of synaptic physiology and on the pathophysiology of the many described synaptopathies that broadly result in loss of synaptic fidelity and quality control at release sites.

      Weaknesses:

      The flow of the data presentation is mostly descriptive with several consistent morphological and functional modifications upon SNX loss. The paper would benefit from a wider characterization that would allow to address the physiological roles of SNX4 at synaptic site and speculate on the underlying molecular mechanisms. The novel experiments on autophagy progression as well as spontaneous neurotransmission are well conducted, although do not assist for the explanation of the molecular mechanism underneath.

      Comments on revisions:

      Other implementations in the revised version are quite limited and would benefit from a more detailed presentation and description. For example, the Sholl analysis in the new Figure 1h is presented without stating the number of cells used or the standard deviations across replicates. The "simil" Sholl analysis performed on VAMP2 is still puzzling, and an explanation should be added for why the VAMP2 fluorescent signal remains constant from less than 0 to 160 µm from the cell body. How is the increased number of active synapses explained? How is this related to the shorter AZ and the higher number of docked vesicles?

    3. Reviewer #2 (Public review):

      Summary:

      SNX4 is thought to mediate recycling from endosomes back to the plasma membrane in cells. In this study, the authors demonstrate the increases in the amounts of transmitter release and the number of docked vesicles by combining genetics, electrophysiology and EM. They failed to find evidence for its role in synaptic vesicle cycling and endocytosis, which may be intuitively closer to the endosome function.

      Strengths:

      The electrophysiological data and EM data are in principle, convincing, though there are several issues in the study.

      Weaknesses:

      It is unclear why the increase in the amounts of transmitter release and docked vesicles happened in the SNX4 KO mice. In other words, it is unclear how the endosomal sorting proteins in the end regulate or are connected to presynaptic, particularly the active zone function.

      Comments on revisions:

      I am fine with the revision in principle. The authors have addressed my concerns.

    4. Reviewer #3 (Public review):

      Summary:

      The study aims to determine whether the endosomal protein SNX4 performs a role in neurotransmitter release and synaptic vesicle recycling. The authors exploited a newly generated conditional knockout mouse to allow them to interrogate SNX4 function. A series of basic parameters were assessed, with an observed impact on neurotransmitter release and active zone morphology. The work is interesting, however as things currently stand, the work is descriptive with little mechanistic insight. There are a number of places where some of the conclusions require further validation.

      Strengths:

      The strengths of the work are the state-of-the-art methods to monitor presynaptic function.

      Weaknesses:

      The weaknesses are the fact that the work is largely descriptive, with no mechanistic insight into the role of SNX4.

      Comments on revisions:

      The authors have addressed a couple of the more major concerns with the manuscript; however, many of the original weaknesses remain, the primary one being the lack of mechanism. It is disappointing that real-time VAMP2 trafficking was not investigated, and the authors' justification for not performing the experiment was not convincing (especially since this is the approach that all other groups employ to examine SV cargo trafficking). In a number of instances "contractual constraints" are given as an explanation for not performing additional experiments. It was unclear whether this refers to licencing issues with the mouse line or to a lack of personnel to perform the work. Regardless, it still leaves this work somewhat incomplete.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the work "Endosomal sorting protein SNX4 limits synaptic vesicle docking and release", Josse Poppinga and collaborators addressed the synaptic function of Sorting Nexin 4 (SNX4). Employing a newly developed in vitro KO model, with live imaging experiments, electrophysiological recordings, and ultrastructural analysis, the authors evaluate modifications in synaptic morphology and function upon loss of SNX4. The data demonstrate increased neurotransmitter release and alteration in synapse ultrastructure with a higher number of docked vesicles and shorter AZ. The evaluation of the presynaptic function of SNX4 is of relevance and tackles an open and yet unresolved question in the field of presynaptic physiology.

      Strengths:

      The sequential characterization of the cellular model is nicely conducted and the different techniques employed are appropriate for the morpho-functional analysis of the synaptic phenotype and the derived conclusions on SNX4 function at the presynaptic site. The authors succeeded in presenting a novel in vitro model that resulted in chronic deletion of SNX4 in neurons. A convincing sequence of experimental techniques is applied to the model to unravel the role of SNX4, whose functions in neuronal cells and at synapses are largely unknown. The understanding of the role of endosomal sorting at the presynaptic site is relevant and of high interest in the field of synaptic physiology and in the pathophysiology of the many described synaptopathies that broadly result in loss of synaptic fidelity and quality control at release sites.

      We thank the reviewer for their positive evaluation of our manuscript.

      Weaknesses:

      The flow of the data presentation is mostly descriptive with several consistent morphological and functional modifications upon SNX4 loss. The paper would benefit from a wider characterization that would allow us to address the physiological roles of SNX4 at the synaptic site and speculate on the underlying molecular mechanisms. In addition, due to the described role of SNX4 in autophagy and the high interest in the regulation of synaptic autophagy in the field of synaptic physiology, an initial evaluation of the autophagy phenotype in the neuronal SNX4KO model is important and should not be restricted to the discussion section.

      We thank the reviewer for their suggestions and agree that broader characterization would help us speculate on the underlying mechanism. To address this, we have conducted additional independent experiments investigating the role of SNX4 in neuronal autophagy, as suggested by this reviewer. These experiments are now included in the main figures and are no longer limited to the discussion section. Please see the detailed responses to this reviewer's recommendations below.

      Reviewer #2 (Public Review):

      Summary:

      SNX4 is thought to mediate recycling from endosomes back to the plasma membrane in cells. In this study, the authors demonstrate the increases in the amounts of transmitter release and the number of docked vesicles by combining genetics, electrophysiology, and EM. They failed to find evidence for its role in synaptic vesicle cycling and endocytosis, which may be intuitively closer to the endosome function.

      Strengths:

      The electrophysiological data and EM data are, in principle, convincing, though there are several issues in the study.

      We thank the reviewer for their positive evaluation of our manuscript.

      Weaknesses:

      It is unclear why the increase in the amounts of transmitter release and docked vesicles happened in the SNX4 KO mice. In other words, it is unclear how the endosomal sorting proteins in the end regulate or are connected to presynaptic, particularly the active zone function.

      We thank the reviewer for their suggestions and agree that further characterization would help to understand how endosomal sorting proteins regulate presynaptic neurotransmission. We have now added extra data on electrophysiological recordings clarifying SNX4’s role in the synapse. Please see the detailed responses to this reviewer's recommendations below.

      Reviewer #3 (Public Review):

      Summary:

      The study aims to determine whether the endosomal protein SNX4 performs a role in neurotransmitter release and synaptic vesicle recycling. The authors exploited a newly generated conditional knockout mouse to allow them to interrogate SNX4 function. A series of basic parameters were assessed, with an observed impact on neurotransmitter release and active zone morphology. The work is interesting; however, as things currently stand, it is descriptive with little mechanistic insight. There are a number of places where the data appear to be a little preliminary, and some of the conclusions require further validation.

      Strengths:

      The strengths of the work are the state-of-the-art methods to monitor presynaptic function.

      We thank the reviewer for their positive evaluation of our manuscript.

      Weaknesses:

      The weaknesses are the fact that the work is largely descriptive, with no mechanistic insight into the role of SNX4. Further weaknesses are the absence of controls in some experiments and the design of specific experiments.

      We thank the reviewer for their suggestions and agree that the addition of extra control groups and experiments would strengthen interpretation of the observed phenotype. To address this, we have now performed experiments to investigate the miniature excitatory postsynaptic currents and added extra control groups such as overexpression of SNX4 on control background. In addition, we assessed SNX4-mediated neuronal autophagy as a potential molecular mechanism by which SNX4 affects synaptic output. Please see the detailed responses to this reviewer's recommendations below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The characterization of the neurite outgrowth presented in Figure 1 is a necessary starting point for the characterization of the model and the interpretation of the following data. As the analysis was conducted at 21 DIV, a significant portion of the neurite tree is out of the analyzed field. Adding a Sholl analysis will better indicate the complexity of the neurite tree, which appears to be influenced by SNX4 loss in the representative images shown in Figure 1f.

      We fully agree and have now performed a Sholl analysis of dendrite branches to investigate dendritic complexity (Figure 1(i), page 2-3, line 86-88). SNX4 depletion does not affect dendrite length or dendrite branching.

      (2) Analogously, the characterization of synapse number is of relevance for the interpretation of the data. For a better flow of the data, Figure 4 might be presented as Figure 2 (without the repetition of panel h in Figure 1). An explanation of how VAMP2 puncta are processed is necessary in the method section. A double labelling with a postsynaptic marker would allow trafficking organelles to be distinguished from mature synaptic contacts. Indeed, the analysis of VAMP2 intensity along neurite in mature 21DIV neurons should reveal peaks in the intensity profile that represent synaptic contacts. For unexplained reasons, the profile is rather flat in the two experimental groups. Focusing on axonal branches will surely result in a peaked profile for VAMP2 labelling.

      We fully agree that the characterization of synapses is relevant for the interpretation of the data. We have now added a section to our Material and Methods describing how the VAMP2 puncta are processed (p14 line 517-520). Instead of labeling mature synapses using double labeling of VAMP2 and PSD95, we analyzed the number of active synapses in live neurons using SypHy (Fig. 3g). The reviewer is correct that the VAMP2 data presented in Fig 1I and Fig 4 are part of the same dataset and we have clarified this in the figure legend. In Fig 1I only the total number of VAMP2 puncta is plotted as a marker for synapse number, while in Fig 4 we assess VAMP2 as potential SNX4 sorting cargo (Ma et al., 2017). Because of these different aims, we prefer to keep the figures separate. The analysis of VAMP2 intensity over distance from the soma is a Sholl analysis (Fig. 4d) and represents the average VAMP2 intensity of 35-41 neurons per group. In contrast to a line scan of a single neurite, this average profile lacks the peaks of individual synapses.

      (3) Miniature excitatory postsynaptic currents recordings would strengthen the synaptic characterization and complement the electrophysiological recordings shown in Figure 2. Analyzing frequency and amplitude parameters would complement the data on the number of synaptic connections defined by the pre and postsynaptic colocalization puncta as suggested above and may support the data shown in Figure 3 g that suggests a decreased number of active synapses in SNX4-KO cells.

      We fully agree that the characterization of miniature excitatory postsynaptic currents would strengthen the synaptic characterization and complement the other electrophysiological data. Therefore, we have now added additional experiments showing the mEPSCs (Fig. 2k-m, page 4) in SNX4 cKO neurons versus control. These data show that the amplitude and frequency of spontaneous miniature EPSCs (mEPSCs) were not affected upon SNX4 depletion, consistent with a normal first evoked EPSC and RRP estimate. Furthermore, these data suggest that it is unlikely that the observed increase in neurotransmission is due to post-synaptic effects.

      (4) Recordings of the first evoked response shown in Figure 2b and quantified in Figures 2c and 2d suggest that SNX4 overexpression per se exerts some effect on the amplitude and the charge of the first evoked response. This is also evident in Supplementary Figure 2 with lower frequency trains. An additional experimental group, namely control+SNX4, is needed for the correct interpretation of the observed phenotype. The possibility that SNX4 per se exerts an effect on evoked transmission could be discussed in terms of putative mechanisms and interactions.

      We thank the reviewer for their suggestion and agree that an additional experimental group (control + SNX4) would strengthen interpretation of the observed phenotype. We have now added a new experimental condition with overexpression of SNX4 on a control background (Supplementary Fig. 3, page 20). These data show that the amplitude and charge of the first evoked response were not affected in control + SNX4 neurons compared to control, and no differences were detected in the response to the 40 Hz stimulation train (Supplementary Fig. 3a-e). Together, these data suggest that SNX4 overexpression in itself does not affect the neurotransmission protocols studied in SNX4 cKO experiments.

      (5) To correctly interpret the SypHy experiments and exclude an effect of SNX silencing on SV recycling, it is suggested to repeat the experiments shown in Figure 3 in the absence and in the presence of bafilomycin. Indeed, the quantifications shown in Figure 3 d and f do not represent "release fraction" as stated (lines 139/140) but rather refer to an average difference between release fraction and recovered fraction. With the use of bafilomycin, the comparison of the deltaFmax/deltaFNH4Cl with and without bafilomycin would enable the release fraction to be correctly evaluated and compared.

      We appreciate the reviewer’s suggestion and agree on the importance of considering the impact of SV recycling when evaluating the released fraction. We agree that the presence of bafilomycin is critical to isolate the released component during stimulation. We have now rephrased this conclusion. To assess synaptic recycling in these assays, bafilomycin is not critically required, and we show by multiple independent experiments, including SypHy and FM64 dye assays, that SV recycling is either not affected or the effect is too small to be detected by these methods.

      (6) In the ultrastructural analysis, additional quantifications are needed to exclude the accumulation of endosome-like structures. It is not clear if, in the evaluation of total SV number (Figure 5e), the authors counted all vesicles or vesicles < 50nm. This has to be explained and additional quantification of # of SV < 50nm and # SV > 50nm is informative, taking into account the endosomal nature of SNX4. Indeed, although the average size of SV is not changed (fig. 5 d), the density of "bigger vesicle" may result from endosomal-like structure accumulation. An additional suggested quantification is on vesicle # SV > 80nm as previously reported in the cited references dealing with endosomal proteins and presynaptic morphology.

      We fully agree that the characterization of vesicle size is important and that it was not clearly stated which vesicles were included in the total number of SV (Fig. 5e). We have now added this to the figure description. We have also added a histogram that contains the vesicle numbers of different bin sizes for SNX4 cKO synapses and control synapses (Supplementary Fig. 4, page 21) including # SVs > 80nm. (Whilst it seems that there are more “bigger” vesicles in the KO, further analysis revealed that this is mostly driven by one experiment and this effect is not consistent.)

      (7) Due to the high scientific interest in presynaptic autophagy for SV recycling and degradation, and the paucity of experimental work assessing the proteins involved, an initial evaluation of the neuronal autophagy process (by western blot analysis and immunocytochemistry) for the characterization of the model will better support the paragraph in the discussion (lines 314-322) and contribute to future work in the field. Although autophagosomes are very rare, their quantification at presynaptic sites can also be performed from the already acquired images. A double membrane structure with material inside is evident in the representative control image presented!

      We appreciate the reviewer’s suggestion and agree that presynaptic autophagy is an interesting potential mechanism that would elaborate our current working model. To address the reviewer’s suggestion, we added multiple independent experiments to investigate basal autophagy markers such as ATG5 using western blot analysis, characterized p62 levels using immunohistochemistry, and performed additional morphometric analysis on the electron microscopy data (Supplementary Fig. 5). In SNX4 cKO neurons, there was no significant difference in p62 puncta numbers or p62 somatic intensity under basal conditions or after blocking autophagic p62 degradation by bafilomycin treatment, suggesting that autophagic flux remains normal. Also, no changes in total ATG5 protein levels were observed and ultrastructural analysis revealed no differences in the total number of autophagosomes. Collectively, these data indicate that SNX4 depletion does not impact the basal autophagic flux, ATG5 protein levels, or the number of autophagosomes.

      Minor points:

      (1) Dorrbaun et al. 2018 is missing from the reference list. In the legend to figure 1 there is an incorrect reference to Figure 6, rather than Figure 4.

      We have now adjusted the figure legend and added the reference (page 16, line 604).

      (2) Information on the construct employed for the rescue is missing. Is it a fluorescent tag construct? Representative images of the three autaptic neurons (control, KO, KO+SNX4) would nicely complement data presentation in Figure 2. 

      We have now elaborated on this in the material and methods section (p12, line 418-421). Unfortunately, we did not obtain pictures of autaptic neurons used for electrophysiology experiments.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 2d and f are somewhat inconsistent. Total charges for the 1st EPSCs differ almost 2-fold in the same condition.

      We appreciate the reviewer’s concern. The average EPSC charge of the first evoked response was 89, 122 and 57 pC for control, KO and rescued neurons, respectively. The average charge of the first pulse of the 40 Hz train was 41, 58 and 32 pC for control, KO and rescued neurons, respectively, which is roughly 50% of the naïve response of the same cells. These trains were recorded after 2 or 3 other stimulation paradigms, which may have affected the total charge released in the 40 Hz train. That said, the proportional difference between groups is highly comparable, with a 37% increased average charge released in SNX4 cKO compared to control in the naïve response and a 41% increased response in the first response of the 40 Hz train, and rescued cells show a 53% reduction in average released charge compared to control in the naïve response compared to a 44% reduction in the first response of the 40 Hz train. Although the absolute values differ between these readouts, we conclude that the biological comparison between groups is consistent.
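
      The percentages quoted in this response can be re-derived from the stated group means. The following is a minimal sketch of that arithmetic; the dictionary keys and function names are illustrative and not taken from the manuscript. Under this calculation the 37% and 41% increases are reproduced for KO relative to control, while the quoted 53% and 44% reductions for rescued cells are reproduced when KO, rather than control, is used as the reference. That reading is an inference from the numbers and is not stated in the response.

```python
# Mean first-response charges (pC) as quoted in the author response.
naive = {"control": 89.0, "ko": 122.0, "rescue": 57.0}   # naive single response
train = {"control": 41.0, "ko": 58.0, "rescue": 32.0}    # first pulse of 40 Hz train


def pct_change(value, reference):
    """Signed percentage change of `value` relative to `reference`."""
    return 100.0 * (value - reference) / reference


def summarize(charges):
    """Pairwise percentage changes between the three groups."""
    return {
        "ko_vs_control": pct_change(charges["ko"], charges["control"]),
        "rescue_vs_control": pct_change(charges["rescue"], charges["control"]),
        "rescue_vs_ko": pct_change(charges["rescue"], charges["ko"]),
    }


naive_summary = summarize(naive)
train_summary = summarize(train)

for label, summary in (("naive", naive_summary), ("40 Hz train", train_summary)):
    for comparison, change in summary.items():
        print(f"{label:12s} {comparison:18s} {change:+.1f}%")
```

      Printing all three pairwise comparisons side by side makes it explicit which reference group each quoted percentage corresponds to.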

      (2) Figure 2h. This type of analysis has a drawback. See Neher (2015) for the problems associated with this analysis.

      We fully agree with the reviewer’s comment. As noted in our discussion (page 9 line 285), while this analysis has its limitations, it can still provide an indication of the readily releasable pool.

      (3) The EPSC phenotype may be due to postsynaptic effects. This should be excluded by additional experiments (mEPSC analysis) or further clarification.

      We fully agree that the characterization of miniature excitatory postsynaptic currents would strengthen the synaptic characterization and complement the electrophysiological recordings. Therefore, we have now added additional experiments showing the mEPSCs (Fig. 2k-m) in SNX4 cKO neurons versus control. These data show that the amplitude and frequency of spontaneous miniature EPSCs (mEPSCs) were not affected upon SNX4 depletion, suggesting that it is unlikely that the observed increase in neurotransmission is due to post-synaptic effects.

      (4) The increased number of docked vesicles observed in EM and the increased slope (vesicle recruitment, Figure 2h) are not consistent with each other. Maybe the definition of docked vesicles is unclear in this version of the manuscript.

      As noted in our material & methods (page 15, line 547-548), SVs were defined as docked if there was no distance visible between the SV membrane and the active zone membrane. We have added the pixel size for clarification. Indeed, we do not observe an increase in release probability or first evoked response, which would correspond with an increased docked pool. However, we think that the increase in docked vesicles might contribute to an enhanced SV recruitment (see discussion).

      (5) Figure 3: Vesicle cycling was monitored in only a limited condition. It is known that there are multiple pathways of vesicle cycling. Ideally, these pathways should be dissected. At least, the authors mention the possibility that they have missed some "positive" conditions.

      We fully agree with the reviewer’s comment that vesicle recycling is complex with several parallel pathways involved. While we did not study individual endocytosis pathways, we used different assays covering various recycling pathways. The SypHy assay (Fig. 3c & f) combined with the 100 AP stimulation paradigm at room temperature predominantly addresses clathrin-mediated endocytosis. Additionally, the FM-64 dye assay at 37 degrees Celsius covers ultrafast endocytosis pathways as well as bulk endocytosis routes. Since neither assay showed major effects, we decided not to pursue further experiments focusing on different endocytosis pathways.

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Since all of the work here is culture-focussed, the in vivo phenotype is not as relevant, however the in vitro properties are. The incomplete Cre-dependent removal of SNX4 is concerning (especially axonal SNX4 levels identified via immunofluorescence), however, the main concern is that there was no profiling of the other molecular changes within these cultures. This is important, since there may be considerable alterations in the expression of a number of presynaptic proteins which may explain the observed phenotypes. Ideally, these cultures could have been profiled in an unbiased manner via mass spectrometry to identify potential changes in the presynaptic proteome, or at the very least the levels of key fusion molecules would have been assessed via Western blotting.

      We thank the reviewer for their suggestion and agree that mass spectrometry would strengthen the interpretation of the observed phenotype. However, due to contractual constraints, we are unable to pursue a mass spectrometry follow-up experiment. We agree that characterizing key fusion molecules is of potential interest. Therefore, based on literature, we selected a likely candidate, VAMP2, which did not show any alterations in expression levels when knocking out SNX4. Given the previously described role of SNX4 in the degradation pathway, one would expect increased degradation of key fusion molecules if they are recycled by SNX4. Other literature indicates that reduced levels of key fusion molecules, such as synaptotagmin or SNAP-25 (Broadie et al., 1994; Washbourne et al., 2001), do not mimic our phenotype.

      (2) The experiments reported in Figure 2, in particular those in 2c and 2d, suggest that overexpression of SNX4 has a dominant-negative effect on neurotransmitter release. This is strongly supported by the supplementary data during a stimulus train (particularly the start point of the 5 Hz train in Supplementary Figure 2). Therefore, the perceived rescue of EPSC charge in Figure 2f, 2g may be a result of SNX4 inhibiting neurotransmitter release. A determination of the impact of SNX4 overexpression (and level of overexpression) in WT neurons is essential to show that this is a bona fide rescue, rather than a direct inhibition by SNX4 overexpression.

      We thank the reviewer for their suggestion and agree that an additional experimental group (control + SNX4) would strengthen interpretation of the observed phenotype. We have now added a new experiment with an extra experimental condition with overexpression of SNX4 on a control background (Supplementary Fig. 3 page 21). These data show that the amplitude and charge of the first evoked response were not affected in control + SNX4 neurons compared to control, and no differences were detected in the response to the 40 Hz stimulation train (Supplementary Fig. 3a-e). Together, these data suggest that SNX4 overexpression in itself does not affect the neurotransmission protocols studied in SNX4 cKO experiments.

      (3) The experiments in Figure 3 clearly reveal a lack of effect of SNX4 depletion on synaptic vesicle endocytosis. However, the assumption that synaptic vesicle recycling is unaffected is a little premature. The fact that the second evoked SypHy peak is significantly larger than the first (Figures 3c-e) suggests that more vesicles may be recycling in KO neurons. Furthermore, the FM dye experiments do not aid interpretation, since there may be insufficient time (10 min) for new vesicles to be generated from endosomal intermediates. Therefore, to confirm an absence of effect on recycling, the authors could either 1) perform the same experiment as 3c, but with 4 stimulation trains (to drive the system harder to reveal any phenotype) or 2) repeat the FM dye experiment but increase the time between loading and unloading to 30 min.

      We fully agree with the reviewer's comment that vesicle recycling is an important component to consider and is complex with several parallel pathways involved. We conducted multiple independent experiments covering the most significant recycling pathways. The SypHy assay (Fig. 3c & f) combined with the 100 AP stimulation paradigm at room temperature predominantly addresses clathrin-mediated endocytosis. Additionally, the FM-64 dye assay at 37 degrees Celsius covers ultrafast endocytosis pathways as well as bulk endocytosis routes. To further challenge the system and reveal recycling phenotypes, we included a second 100 AP stimulation in our SypHy assay. While only the increase of the second SypHy peak is significant, the absolute numbers do not differ much from the first peak (0.17 for control and 0.21 for KO second peak and 0.19 for control and 0.22 for KO first peak, Supplementary Table 1). We nevertheless do not see any effects on recycling after the second peak (mean decay time is 27 for control and 26 for KO, Supplementary Table 1). A single 100 AP 40 Hz train depletes all the synchronous release (not shown) and most of the evoked charge (see Fig 2f), hence two of these trains with one minute recovery is already a very demanding protocol. Although increasing the time between loading and unloading to 30 minutes might uncover other recycling components, it has been shown that ultrafast endocytosis occurs within 30 seconds (Watanabe et al., 2013), suggesting that 10 minutes should provide enough time for synaptic vesicle recycling. This is also evident from the fact that we can significantly destain synapses loaded with FM dye by electrical stimulation (Fig 3j), indicating that synaptic vesicle recycling took place. Since neither assay showed major effects, we concluded that under these circumstances, synaptic recycling is not significantly affected. However, we cannot exclude the possibility that recycling deficits in SNX4 cKO neurons could be detected in other paradigms.

      (4) There is no obvious effect on VAMP2 levels or location in SNX4 KO neurons (Figure 4). However, when one considers that SNX4 is proposed to have a role in VAMP2 trafficking, it is surprising that an experiment examining the live trafficking of VAMP2-SypHy was not performed. This would have revealed activity-dependent alterations that would have been missed by simply measuring VAMP2 expression and localization, and potentially provided a molecular explanation for the enhanced neurotransmitter release during a stimulus train.

      We appreciate the reviewer’s suggestion and agree that it could be a valuable experiment. However, overexpressing a VAMP2-pHluorin construct might obscure potential phenotypes related to VAMP2 trafficking. SNX4 is expected to be involved in VAMP2 recycling, even with activity-dependent changes. Mis-sorted VAMP2 would accumulate in acidic vesicles, which could be masked by the VAMP2-pHluorin construct. Similarly, mis-sorting of other SNX4 cargo, such as the transferrin receptor, has been identified through lysosomal degradation, as shown by Western blot analysis of expression levels of the endogenous protein. We did not detect any differences in endogenous levels of VAMP2 within 21 days of SNX4 deletion (Fig 4), indicating that SNX4-dependent endosome sorting is not essential for VAMP2 recycling.

      (5) The morphological data in Figure 5 report a series of small changes in docked vesicles and active zone length. In many cases, significance is obtained due to synapses being used as the experimental n, thus inflating the statistical power. When one considers that no significant effect was observed on evoked release (apart from during a stimulus train), it suggests that the number of docked vesicles does not alter release probability in this system (which the authors point out). Instead, they suggest that an increased supply of vesicles is responsible, via increased recruitment to RRP/releasable pool (but not via increased recycling). If this is the case, it should have been reflected as an increase in the evoked SypHy response in Fig 2c,d (which is borderline significant). What may help is to determine the morphological landscape immediately after a stimulus train, since this is the only condition where enhanced release is observed, and thus provide a morphological correlate to the physiological data.

      We fully agree with the reviewer’s suggestion that an ultrastructural characterization immediately after a stimulus train would be informative. Unfortunately, contractual constraints prevent us from performing this experiment. For our ultrastructural morphological data, we treated synapses as individual experimental n since it is not possible to determine whether synapses in a micronetwork on one sapphire originate from the same neuron. We used 18 independent sapphires from 3 independent pups to ensure the technical and biological replication of our data and the measurement of independent neurons. We fully agree with the reviewer’s comment to be careful with ‘inflating the statistical power’ due to potential nesting effects when using synapses as experimental n. To mitigate the potential nesting effect of analyzing multiple synapses per neuron, the intracluster correlation (ICC) was calculated per variable and per nesting effect. If ICC was close to 0.1, indicating that a considerable portion of the total variance can be attributed to e.g. synapse or sapphire, multilevel analysis was performed to accommodate nested data (Aarts et al., 2014).
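
      The nesting check described in this response can be illustrated with a small sketch. It assumes a one-way random-effects ICC estimated from ANOVA mean squares; the response does not state which ICC estimator was used, so the estimator, the function name and the example numbers below are all hypothetical. Each inner list stands for the measurements from one cluster (for example, docked-vesicle counts for the synapses on one sapphire).

```python
from statistics import mean


def icc_oneway(clusters):
    """One-way random-effects ICC(1) for a list of clusters (lists of values).

    Assumed estimator: (MSB - MSW) / (MSB + (k0 - 1) * MSW), where k0 is the
    effective cluster size (equal to the common size for balanced data).
    """
    g = len(clusters)
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    grand = mean(v for c in clusters for v in c)

    # Between-cluster and within-cluster sums of squares.
    ss_between = sum(n * (mean(c) - grand) ** 2 for c, n in zip(clusters, sizes))
    ss_within = sum((v - mean(c)) ** 2 for c in clusters for v in c)

    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (n_total - g)

    k0 = (n_total - sum(n * n for n in sizes) / n_total) / (g - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# Made-up counts for three hypothetical sapphires, for illustration only.
example = [[4, 5, 6, 5], [7, 8, 6, 7], [5, 4, 5, 6]]
icc = icc_oneway(example)

# Threshold of 0.1 taken from the response; the decision rule is a simplification.
needs_multilevel = icc >= 0.1
print(f"ICC = {icc:.2f}, multilevel analysis indicated: {needs_multilevel}")
```

      An ICC near zero means cluster membership explains little of the variance and synapses behave like independent observations; a larger value indicates that a multilevel model is the safer choice.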

      Minor points

      (1) When a new mouse model is generated, it is usually accompanied by a thorough characterization of its properties. However, in this case, there was no information provided about the conditional SNX4 knockout mouse. This is surprising and, at a minimum, the following should be provided: a) the background strain, b) method of generation, c) the number of animals used to establish the colony, d) breeding strategy, e) backcrossing strategy, f) genotyping protocol.

      We apologize that a thorough characterization of our novel mouse model was lacking and have therefore added this to our material & methods section (page 11, line 377-391).

      (2) There is a noticeable difference between WT and KO neurons during train stimulation in Figure 2f, however, this appears to be due to the fact that there is a far higher EPSC charge to begin with in KO neurons. Why is there such a disparity when there is no difference in response to single pulses (Figures 2b-d) or presynaptic plasticity (Figure 2e)?

      We understand the reviewer’s concern. We excluded an outlier (3x SD) in the KO dataset that drove the initial far higher EPSC charge in the graph (it was already excluded for the statistics, Supplementary Table 1). The average charge of the first pulse of the 40 Hz train is 41 pC for control and 58 pC for KO neurons, which did not differ significantly. These trains of Fig. 2f were recorded after 2 or 3 other stimulation paradigms, which may have affected the total charge released in the 40 Hz train. That said, the proportional difference between groups is highly comparable between Fig 2b-d and 2f, with a 37% increased average charge released in SNX4 cKO compared to control in the naïve response (Fig. 2d) and a 41% increased response in the first response of the 40 Hz train (Fig. 2f), and rescued cells show a 53% reduction in average released charge compared to control in the naïve response compared to a 44% reduction in the first response of the 40 Hz train. Although the absolute values differ between these readouts, we conclude that the biological comparison between groups is consistent.

      (3) Line 343-344 - "(Supplementary Figure 1a)" should be "(Figure 1a)".

      We thank the reviewer for this comment and have adjusted this in the manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by McKim et al seeks to provide a comprehensive description of the connectivity of neurosecretory cells (NSCs) using a high-resolution electron microscopy dataset of the fly brain and several single-cell RNA seq transcriptomic datasets from the brain and peripheral tissues of the fly. They use connectomic analyses to identify discrete functional subgroups of NSCs and describe both the broad architecture of the synaptic inputs to these subgroups as well as some of the specific inputs including from chemosensory pathways. They then demonstrate that NSCs have very few traditional presynapses consistent with their known function as providing paracrine release of neuropeptides. Acknowledging that EM datasets can't account for paracrine release, the authors use several scRNAseq datasets to explore signaling between NSCs and characterize widespread patterns of neuropeptide receptor expression across the brain and several body tissues. The thoroughness of this study allows it to largely achieve its goal and provides a useful resource for anyone studying neurohormonal signaling.

      Strengths:

      The strengths of this study are the thorough nature of the approach and the integration of several large-scale datasets to address shortcomings of individual datasets. The study also acknowledges the limitations that are inherent to studying hormonal signaling and provides interpretations within the context of these limitations.

      Weaknesses:

      Overall, the framing of this paper needs to be shifted from statements of what was done to what was found. Each subsection, and the narrative within each, is framed on topics such as "synaptic output pathways from NSC" when there are clear and impactful findings such as "NSCs have sparse synaptic output". Framing the manuscript in this way allows the reader to identify broad takeaways that are applicable to other model systems. Otherwise, the manuscript risks being encyclopedic in nature. An overall synthesis of the results would help provide the larger context within which this study falls.

      We agree with the reviewer and will replace all the subsection titles as suggested.

      The cartoon schematic in Figure 5A (which is adapted from a 2020 review) has an error. This schematic depicts uniglomerular projection neurons of the antennal lobe projecting directly to the lateral horn (without synapsing in the mushroom bodies) and multiglomerular projection neurons projecting to the mushroom bodies and then lateral horn. This should be reversed (uniglomerular PNs synapse in the calyx and then further project to the LH and multiglomerular PNs project along the mlACT directly to the LH) and is nicely depicted in a Strutz et al 2014 publication in eLife.

      We thank the reviewer for spotting this error. We will modify the schematic as suggested.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to provide a comprehensive description of the neurosecretory network in the adult Drosophila brain. They sought to assign and verify the types of 80 neurosecretory cells (NSCs) found in the publicly available FlyWire female brain connectome. They then describe the organization of synaptic inputs and outputs across NSC types and outline circuits by which olfaction may regulate NSCs, and by which Corazonin-producing NSCs may regulate flight behavior. Leveraging existing transcriptomic data, they also describe the hormone and receptor expression in the NSCs and suggest putative paracrine signaling between NSCs. Taken together, these analyses provide a framework for future experiments, which may demonstrate whether and how NSCs, and the circuits to which they belong, may shape physiological function or animal behavior.

      Strengths:

      This study uses the FlyWire female brain connectome (Dorkenwald et al. 2023) to assign putative cell types to the 80 neurosecretory cells (NSCs) based on clustering of synaptic connectivity and morphological features. The authors then verify type assignments for selected populations by matching cluster sizes to anatomical localization and cell counts using immunohistochemistry of neuropeptide expression and markers with known co-expression.

      The authors compare their findings to previous work describing the synaptic connectivity of the neurosecretory network in larval Drosophila (Huckesfeld et al., 2021), finding that there are some differences between these developmental stages. Comparisons between adults and larvae are made possible by Table 1, as well as by the authors' choice to adopt similar (or equivalent) analyses and data visualizations in the present paper's figures.

      The authors extract core themes in NSC synaptic connectivity that speak to their function: different NSC types are downstream of shared presynaptic outputs, suggesting the possibility of joint or coordinated activation, depending on upstream activity. NSCs receive some but not all modalities of sensory input. NSCs have more synaptic inputs than outputs, suggesting they predominantly influence neuronal and whole-body physiology through paracrine and endocrine signaling.

      The authors outline synaptic pathways by which olfactory inputs may influence NSC activity and by which Corazonin-releasing NSCs may regulate flight. These analyses provide a basis for future experiments, which may demonstrate whether and how such circuits shape physiological function or animal behavior.

      The authors extract expression patterns of neuropeptides and receptors across NSC cell types from existing transcriptomic data (Davie et al., 2018) and present the hypothesis that NSCs could be interconnected via paracrine signaling. The authors also catalog hormone receptor expression across tissues, drawing from the Fly Cell Atlas (Li et al., 2022).

      Weaknesses:

      The clustering of NSCs by their presynaptic inputs and morphological features, along with corroboration with their anatomical locations, distinguished some, but not all cell types. The authors attempt to distinguish cell types using additional methodologies: immunohistochemistry (Figure 2), retrograde trans-synaptic labeling, and characterization of dense core vesicle characteristics in the FlyWire dataset (Figure 1, Supplement 1). However, these corroborating experiments often lacked experimental replicates, were not rigorously quantified, and/or were presented as singular images from individual animals or even individual cells of interest. The assignments of DH44 and DMS types remain particularly unconvincing.

      We thank the reviewer for this comment. We would like to clarify that the images presented in Figure 2 and Figure 1 Supplement 1 are representative images based on at least 5 independent samples. We will clarify this in the figure caption and methods. The electron micrographs showing dense core vesicle (DCV) characteristics (Figure 1 Supplement E-G) are also representative images based on examination of multiple neurons. However, we agree with the reviewer that a rigorous quantification would be useful to showcase the differences between DCVs from NSC subtypes. Therefore, we have now performed a quantitative analysis of the DCVs in putative m-NSC<sup>DH44</sup> (n=6), putative m-NSC<sup>DMS</sup> (n=6) and descending neurons (n=4) known to express DMS. For consistency, we examined the cross section of each cell where the diameter of the nucleus was the largest. We quantified the mean gray value of at least 50 DCVs per cell. Our analysis shows that the mean gray values of putative m-NSC<sup>DMS</sup> and DMS descending neurons are not significantly different, whereas the mean gray values of m-NSC<sup>DH44</sup> are significantly larger. This analysis is in agreement with our initial conclusion.

      Author response image 1.

      The authors present connectivity diagrams for visualization of putative paracrine signaling between NSCs based on their peptide and receptor expression patterns. These transcriptomic data alone are inadequate for drawing these conclusions, and these connectivity diagrams are untested hypotheses rather than results. The authors do discuss this in the Discussion section.

      We fully agree with the reviewer and will further elaborate on the limitations of our approach in the revised manuscript. However, there is a very high likelihood that a given NSC subtype can signal to another NSC subtype using a neuropeptide if its receptor is expressed in the target NSC. This is due to the fact that all NSC axons are part of the same nerve bundle (nervi corpora cardiaca) which exits the brain. The axons of different NSCs form release sites that are extremely close to each other. Neuropeptides from these release sites can easily diffuse via the hemolymph to peripheral tissues (e.g. fat body and ovaries) that are much further away from the release sites than neighboring NSCs are. We believe that neuropeptide receptors are expressed in NSCs near these release sites where they can receive inputs not just from the adjacent NSCs but also from other sources such as the gut enteroendocrine cells. Hence, neuropeptide diffusion is not a limiting factor preventing paracrine signaling between NSCs and receptor expression is a good indicator for putative paracrine signaling.

      Reviewer #3 (Public review):

      Summary:

      The manuscript presents an ambitious and comprehensive synaptic connectome of neurosecretory cells (NSC) in the Drosophila brain, which highlights the neural circuits underlying hormonal regulation of physiology and behaviour. The authors use EM-based connectomics, retrograde tracing, and previously characterised single-cell transcriptomic data. The goal was to map the inputs to and outputs from NSCs, revealing novel interactions between sensory, motor, and neurosecretory systems. The results are of great value for the field of neuroendocrinology, with implications for understanding how hormonal signals integrate with brain function to coordinate physiology.

      The manuscript is well-written and provides novel insights into the neurosecretory connectome in the adult Drosophila brain. Some additional behavioural experiments would significantly strengthen the conclusions.

      Strengths:

      (1) Rigorous anatomical analysis

      (2) Novel insights on the wiring logic of the neurosecretory cells.

      Weaknesses:

      (1) Functional validation of findings would greatly improve the manuscript.

      We agree with this reviewer that assessing the functional output from NSCs would improve the manuscript. Given that we currently lack genetic tools to measure hormone levels and that behaviors and physiology are modulated by NSCs on slow timescales, it is difficult to assess the immediate functional impact of the sensory inputs to NSC using approaches such as optogenetics. However, since l-NSC<sup>CRZ</sup> are the only known cell type that provide output to descending neurons, we will functionally test this output pathway using different behavioral assays recommended by this reviewer.

    1. eLife Assessment

      This manuscript provides fundamental studies to help us better understand the effects of mutations in the presenilin-1 (PSEN1) gene on proteolytic processing of the amyloid precursor protein (APP). The authors provide compelling evidence using mutations in PSEN to understand what drives alternative substrate turnover with conclusive data and rigorous analysis. This deep mechanistic study provides a framework towards the development of small molecule inhibitors to treat Alzheimer's disease.

    2. Reviewer #1 (Public review):

      Summary:

      Arafi et al. present results of studies designed to better understand the effects of mutations in the presenilin-1 (PSEN1) gene on proteolytic processing of the amyloid precursor protein (APP). This is important because APP processing can result in the production of the amyloid β-protein (Aβ), a key pathologic protein in Alzheimer's disease (AD). Aβ exists in various forms that differ in amino acid sequence and assembly state. The predominant forms of Aβ are Aβ40 and Aβ42, which are 40 and 42 amino acids in length, respectively. Shorter and longer forms derive from processive proteolysis of the Aβ region of APP by the heterotetramer γ-secretase, within which presenilin 1 possesses the active site of the enzyme. Each form may become toxic if it assembles into non-natively folded, oligomeric, or fibrillar structures. A deep mechanistic understanding of enzyme-substrate interactions is a first step toward the design and successful use of small-molecule therapeutics for AD.

      The key finding of Arafi et al. is that three PSEN mutations display unusual profiles of effects on Aβ production that have novel implications for the stalled E-S complex hypothesis. PSEN1 F386S is unique in that initial ε cleavage is not reduced compared with WT PSEN1; only certain trimming steps are deficient, results consistent with FLIM experiments that reveal stabilized E-S complexes only in Aβ-rich regions in the cell. In contrast, PSEN1 A431E and A434T display very little ε cleavage and therefore very little overall Aβ production, suggesting a limited role of Aβ in the pathogenesis of these two mutants and pointing to stalled E-S complexes as the common factor. For the biochemist, this may not be surprising, but in the context of understanding and treating AD, it is immense because it shifts the paradigm from targeting the results of γ-secretase action, viz., Aβ oligomers and fibrils, to targeting initial Aβ production at the molecular level. It is the equivalent of taking cancer treatment from simple removal of tumorous tissue to prevention of tumor formation and growth. Arafi et al. have provided us with a blueprint for the design of small-molecule inhibitors of γ-secretase. The significance of this achievement cannot be overstated.

      Strengths and weaknesses:

      The comprehensiveness and rigor of the study are notable. Rarely have I reviewed a manuscript reporting the results of so many orthogonal experiments, all of which support the authors' hypotheses, and of so many excellent controls. In addition, as found in clinical trial reports, the limitations of the study were discussed explicitly. None of these significantly affected the conclusions of the study.

    3. Reviewer #2 (Public review):

      Summary:

      The work by Arafi et al. shows the effect of Familial Alzheimer's Disease presenilin-1 mutants on endopeptidase and carboxypeptidase activity. They have elegantly demonstrated how some mutants alter each step of processing. Together with FLIM experiments, this study provides additional evidence to support their 'stalled complex hypothesis'.

      Strengths:

      This is a beautiful biochemical work. The approach is comprehensive.

      Weaknesses:

      (1) It appears that the purified γ-secretase complex generates the same amount of Aβ40 and Aβ42, which is quite different in cellular and biochemical studies. Is there any explanation for this?

      (2) It has been reported the Aβ production lines from Aβ49 and Aβ48 can be crossed with various combinations (PMID: 23291095 and PMID: 38843321). How does the production line crossing impact the interpretation of this work?

      (3) In Figure 5, did the authors look at the protein levels of PS1 mutations and C99-720, as well as secreted Aβ species? Do the different amounts of PS1 full-length and PS1-NTF/CTF influence FLIM results?

      (4) It is interesting that both Aβ40 and Aβ42 ELISA kits detect Aβ43. Have the authors tested other kits in the market? It might change the interpretation of some published work.

    4. Author response:

      Reviewer 2:

      (1) It appears that the purified γ-secretase complex generates the same amount of Aβ40 and Aβ42, which is quite different in cellular and biochemical studies. Is there any explanation for this?

      Roughly equal production of Aβ40 and Aβ42 is a phenomenon seen with purified enzyme assays, and the reason for this has not been identified. However, we suggest that what is meaningful in our studies is the relative difference between the effects of FAD-mutant vs. WT PSEN1 on each proteolytic processing step. All FAD mutations are deficient in multiple cleavage steps in γ-secretase processing of APP substrate, and these deficiencies correlate with stabilization of E-S complexes.

      (2) It has been reported the Aβ production lines from Aβ49 and Aβ48 can be crossed with various combinations (PMID: 23291095 and PMID: 38843321). How does the production line crossing impact the interpretation of this work?

      In the cited reports, such crossover was observed when using synthetic Aβ intermediates as substrate. In PMID: 23291095 (Okochi M et al, Cell Rep, 2013), Aβ43 is primarily converted to Aβ40, but also to some extent to Aβ38. In PMID: 38843321 (Guo X et al, Science, 2024), Aβ48 is ultimately converted to Aβ42, but also to a minor degree to Aβ40. We have likewise reported such product line “crossover” with synthetic Aβ intermediates (PMID: 25239621; Fernandez MA et al, JBC, 2014). However, when using APP C99-based substrate, we did not detect any noncanonical tri- and tetrapeptide co-products of Aβ trimming events in the LC-MS/MS analyses (PMID: 33450230; Devkota S et al, JBC, 2021). In the original report on identification of the small peptide coproducts for C99 processing by γ-secretase using LC-MS/MS (PMID: 19828817; Takami M et al, J Neurosci, 2009), only very low levels of noncanonical peptides were observed. In the present study, we did not search for such noncanonical trimming coproducts, so we cannot rule out some degree of product line crossover.

      (3) In Figure 5, did the authors look at the protein levels of PS1 mutations and C99-720, as well as secreted Aβ species? Do the different amounts of PS1 full-length and PS1-NTF/CTF influence FLIM results?

      This is a good question. Our preliminary investigation by western blot shows no correlation between C99 and PSEN1 expression levels and FLIM results, but we will fully address the concern in our point-by-point responses submitted with a revised manuscript.

      (4) It is interesting that both Aβ40 and Aβ42 Elisa kits detect Aβ43. Have the authors tested other kits in the market? It might change the interpretation of some published work.

      We have not tested other ELISA kits. In light of our findings, it would be a good idea for other investigators to test whatever ELISAs they use for specificity vis-à-vis Aβ43.

    1. eLife Assessment

      This valuable study provides a novel method to detect sleep cycles based on variations in the slope of the power spectrum from electroencephalography signals. The method, dispensing with time-consuming and potentially subjective manual identification of sleep cycles, is supported by solid evidence and analyses. This study will be of interest to researchers and clinicians working on sleep and brain dynamics.

    2. Reviewer #1 (Public review):

      In this study, Rosenblum et al introduce a novel and automatic way of calculating sleep cycles from human EEG. Previous results have shown that the slope of the non-oscillatory component of the power spectrum (called the aperiodic or fractal component) changes with sleep stage. Building on this, the authors present an algorithm that extracts the continuous-time fluctuations in the fractal slope and propose that peaks in this variable can be used to identify sleep cycle limits. Cycles defined in this way are termed "fractal cycles". The main focus of the article is a comparison of "fractal" and "classical" (ie defined manually based on the hypnogram) sleep cycles in numerous datasets.

      The manuscript amply illustrates through examples the strong overlap between fractal and classical cycle identification. Accordingly, a high percentage (81%) can be matched one-to-one between methods and sleep cycle duration is well correlated (around R = 0.5). Moreover, the methods track certain global changes in sleep structure in different populations: shorter cycles in children and longer cycles in patients medicated with REM-suppressing anti-depressants. Finally, a major strength of the results is that they show similar agreement between fractal and classical sleep cycle length in 5 different data sets, showing that it is robust to changes in recording settings and methods.

      The match between fractal and classical cycles is not one-to-one. For example, the fractal method identifies a correlation between age and cycle duration in adults that is not apparent with the classical method. The differences between the fractal and classical methods appear to be linked to the uncertain definition of sleep cycles since they are tied to when exactly the cycle begins/ends and whether or not to count cycles during fractured sleep architecture at sleep onset. Moreover, the discrepancies between the two are on the order of those found between classical cycles defined manually or via an automatic algorithm.

      Overall, the fractal cycle is an attractive method to study sleep architecture since it dispenses with time-consuming and potentially subjective manual identification of sleep cycles. However, given its differences from the classical method, it is unlikely that fractal scoring will be able to replace classical scoring directly. By providing a complementary quantification, it will likely contribute to refining the definition of sleep cycles, which is currently ambiguous in certain cases. Moreover, it has the potential to be applied to animal studies, which rarely deal with sleep cycle structure.

    3. Reviewer #2 (Public review):

      Summary:

      This study focused on using strictly the slope of the power spectral density (PSD) to perform automated sleep scoring and evaluation of the durations of sleep cycles. The method appears to work well because the slope of the PSD is highest during slow-wave sleep, and lowest during waking and REM sleep. Therefore, when smoothed and analyzed across time, there are cyclical variations in the slope of the PSD, fit using an IRASA (Irregularly resampled auto-spectral analysis) algorithm proposed by Wen & Liu (2016).
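
      To make the notion of a "PSD slope" concrete, here is a minimal sketch. It is not IRASA and not the authors' pipeline: it only fits a straight line to a power spectrum in log-log coordinates, which is the quantity a fractal (aperiodic) slope describes. The two synthetic spectra and their exponents are hypothetical and are not meant to represent any particular sleep stage.

```python
import math


def loglog_slope(freqs, power):
    """Least-squares slope of log10(power) against log10(frequency)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in power]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Frequencies from 1.0 to 30.0 Hz in 0.5 Hz steps (illustrative range).
freqs = [f / 2 for f in range(2, 61)]

# Hypothetical power-law spectra P(f) = f ** exponent.
steep_spectrum = [f ** -3.0 for f in freqs]
shallow_spectrum = [f ** -1.5 for f in freqs]

steep_slope = loglog_slope(freqs, steep_spectrum)
shallow_slope = loglog_slope(freqs, shallow_spectrum)
print(steep_slope, shallow_slope)
```

      On real EEG, oscillatory peaks would bias such a naive fit, which is why a method like IRASA separates the fractal component before the slope is estimated.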

      Strengths:

      The main novelty of the study is that the non-fractal (oscillatory) components of the PSD that are more typically used during sleep scoring can be essentially ignored because the key information is already contained within the fractal (slope) component. The authors show that for the most part, results are fairly consistent between this and conventional sleep scoring, but in some cases show disagreements that may be scientifically interesting.

      Weaknesses:

      The previous weaknesses were well-addressed by the authors in the revised manuscript. I will note that from the fractal cycle perspective, waking and REM sleep are not very dissimilar. Combining these states underlies some of the key results of this study.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      Weaknesses:

      The match between fractal and classical cycles is not one-to-one. For example, the fractal method identifies a correlation between age and cycle duration in adults that is not apparent with the classical method. This raises the question as to whether differences are due to one method being more reliable than another or whether they are also identifying different underlying biological differences. It is not clear for example whether the agreement between the two methods is better or worse than between two human scorers, which generally serve as a gold standard to validate novel methods. The authors provide some insight into differences between the methods that could account for differences in results. However, given that the fractal method is automatic it would be important to clearly identify criteria for recordings in which it will produce similar results to the classical method.

      We thank the reviewer for the insightful suggestions. In the revised Manuscript, we have added a number of additional analyses that provide a quantitative comparison between the classical and fractal cycle approaches aiming to identify the source of the discrepancies between classical and fractal cycle durations. Likewise, we assessed the intra-fractal and intra-classical method reliability.

      Reviewer 2:

      One weakness of the study, from my perspective, was that the IRASA fits to the data (e.g. the PSD, such as in Figure 1B), were not illustrated. One cannot get a sense of whether or not the algorithm is based entirely on the fractal component or whether the oscillatory component of the PSD also influences the slope calculations. This should be better illustrated, but I assume the fits are quite good.

      Thank you for this suggestion. In the revised Manuscript, we have added a new figure (Fig.S1 E, Supplementary Material 2), illustrating the goodness of fit of the data as assessed by the IRASA method.

      The cycles detected using IRASA are called fractal cycles. I appreciate the use of a simple term for this, but I am also concerned whether it could be potentially misleading? The term suggests there is something fractal about the cycle, whereas it's really just that the fractal component of the PSD is used to detect the cycle. A more appropriate term could be "fractal-detected cycles" or "fractal-based cycle" perhaps?

      We agree that these cycles are not fractal per se. In the Introduction, when we mention them for the first time, we name them “fractal activity-based cycles of sleep” and immediately after that add “or fractal cycles for short”. In the revised version, we renewed this abbreviation in each new major section and in the Abstract. Nevertheless, given that the term “fractal cycles” is used 88 times, after those “reminders” we used the short name again to facilitate readability. We hope that this will highlight that the cycles are not fractal per se and thus reduce the possible confusion while keeping the manuscript short.

      The study performs various comparisons of the durations of sleep cycles evaluated by the IRASA-based algorithm vs. conventional sleep scoring. One concern I had was that it appears cycles were simply identified by their order (first, second, etc.) but were not otherwise matched. This is problematic because, as evident from examples such as Figure 3B, sometimes one cycle conventionally scored is matched onto two fractal-based cycles. In the case of the Figure 3B example, it would be more appropriate to compare the duration of conventional cycle 5 vs. fractal cycle 7, rather than 5 vs. 5, as it appears is currently being performed.

      In cases where the number of fractal cycles differed from the number of classical cycles (from 34 to 55% in different datasets as in the case of Fig.3B), we did not perform one-to-one matching of cycles. Instead, we averaged the duration of the fractal and classical cycles over each participant and only then correlated between them (Fig.2C). For a subset of the participants (45 – 66% of the participants in different datasets) with a one-to-one match between the fractal and classical cycles, we performed an additional correlation without averaging, i.e., we correlated the durations of individual fractal and classical cycles (Fig.4S of Supplementary Material 2). This is stated in the Methods, section Statistical analysis, paragraph 2.
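
      The two-step procedure described above (average cycle durations within each participant, then correlate the averages across participants) can be sketched as follows. This is an illustration only: the participant IDs and durations are hypothetical, and a plain Pearson correlation is assumed here rather than taken from the Methods.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical cycle durations in minutes. The number of fractal and
# classical cycles may differ within a participant, so cycles are not
# matched one-to-one.
cycles = {
    "p01": {"classical": [85, 95, 100], "fractal": [80, 60, 50, 95]},
    "p02": {"classical": [110, 105], "fractal": [100, 115, 95]},
    "p03": {"classical": [90, 92, 88, 94], "fractal": [95, 85, 90]},
}

# Step 1: one mean duration per participant and method.
classical_means = [mean(p["classical"]) for p in cycles.values()]
fractal_means = [mean(p["fractal"]) for p in cycles.values()]

# Step 2: correlate the per-participant means across participants.
r = pearson_r(classical_means, fractal_means)
print(f"r = {r:.2f}")
```

      Averaging first sidesteps the matching problem raised by the reviewer, at the cost of discarding within-night information about individual cycles.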

      There are a few statements in the discussion that I felt were not well-supported. L629: about the "little biological foundation" of categorical definitions, e.g. for REM sleep or wake? I cannot agree with this statement as written. Also about "the gradual nature of typical biological processes". Surely the action potential is not gradual and there are many other examples of all-or-none biological events.

      In the revised Manuscript, we have removed these statements from both Introduction and Discussion.

      The authors appear to acknowledge a key point, which is that their methods do not discriminate between awake and REM periods. Thus their algorithm essentially detected cycles of slow-wave sleep alternating with wake/REM. Judging by the examples provided, this appears to account for both the correspondence between fractal-based and conventional cycles, as well as their disagreements during the early part of the sleep cycle. While this point is acknowledged in the discussion section around L686, I am surprised that the authors then argue against this correspondence on L695. I did not find the "not-a-number" controls to be convincing. No examples were provided of such cycles, and it's hard to understand how positive z-values of the slopes are possible without the presence of some wake unless N1 stages are sufficient to provide a detected cycle (in which case the argument still holds, except that it's alternations between slow-wave sleep and N1 that could be what drives the detection).

      In the revised Manuscript, we have removed the “NaN analysis” from both Results and Discussion. We have replaced it with the correlation between the difference between the durations of the classical and fractal cycles and proportion of wake after sleep onset. The finding is as follows:

      “A larger difference between the durations of the classical and fractal cycles was associated with a higher proportion of wake after sleep onset in 3/5 datasets as well as in the merged dataset (Supplementary Material 2, Table S10).” Results, section “Fractal cycles and wake after sleep onset”, last two sentences. This is also discussed in Discussion, section “Fractal cycles and age”, paragraph 1, last sentence. 

      To me, it seems important to make clear whether the paper is proposing a different definition of cycles that could be easily detected without considering fractals or spectral slopes, but simply adjusting what one calls the onset/offset of a cycle, or whether there is something fundamentally important about measuring the PSD slope. The paper seems to be suggesting the latter but my sense from the results is that it's rather the former.

      Thank you for this important comment. Overall, our paper suggests that the fractal approach might reflect the cycling nature of sleep in a more precise and sensitive way than classical hypnograms. Importantly, neither fractal nor classical methods can shed light on the mechanism underlying sleep cycle generation due to their correlational approach. Despite this, the advantages of fractal over classical methods mentioned in our Manuscript are as follows:

      (1) Fractal cycles are based on a real-valued metric with known neurophysiological functional significance, which introduces a biological foundation and a more gradual impression of nocturnal changes compared to the abrupt changes that are inherent to hypnograms that use a rather arbitrary assigned categorical value (e.g., wake=0, REM=-1, N1=-2, N2=-3 and SWS=-4, Fig.2 A).

      (2) Fractal cycle computation is automatic and thus objective, whereas classical sleep cycle detection is usually based on the visual inspection of hypnograms, which is time-consuming, subjective and error-prone. Few automatic algorithms are available for sleep cycle detection, and these correlate only moderately with classical cycles detected by human raters (r’s = 0.3 – 0.7 in different datasets here).

      (3) Defining the precise end of a classical sleep cycle with skipped REM sleep, which is common in children, adolescents and young adults, using a hypnogram is often difficult and arbitrary. The fractal cycle algorithm could detect such cycles in 93% of cases, while the hypnogram-based agreement on the presence/absence of skipped cycles between two independent human raters was only 61%, i.e., 32 percentage points lower.

      (4) The fractal analysis showed a stronger effect size, higher F-value and R-squared than the classical analysis for the cycle duration comparison in children and adolescents vs young adults. The first and second fractal cycles were significantly shorter in the pediatric compared to the adult group, whereas the classical approach could not detect this difference.

      (5) Fractal – but not classical – cycle durations correlated with the age of adult participants.

      These bullets are now summarized in Table 5 that has been added to the Discussion of the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      The authors have added a lot of quantifications to provide a more complete comparison of classical and fractal cycles that address the points I raised.

      Regarding the question of skipped REM cycles: I am not sure the comparison of skipped cycle accuracies between fractal and manual methods makes sense. To make a fair comparison, fractal and 2nd scorer classifications should be compared to the same baseline dataset, which doesn't seem to be the case since the number of skipped cycles is not the same. Moreover, it's not indicated whether the fractal method identifies any false positive skipped cycles.

      Thank you for this comment. In the revised Manuscript, we have reported the number of false positive skipped cycles identified by the fractal algorithm. Likewise, we have added the comparison between the fractal algorithm and the second scorer detection of cycles with skipped REM sleep (Results, the section “Skipped cycles”, last paragraph). The text has been revised as follows:

      “Visual inspection of the hypnograms from Datasets 1 – 6 was performed by two independent researchers. Scorer 1 and Scorer 2 detected that out of 226 first sleep cycles 58 (26%) and 64 (28%), respectively, lacked REM episodes. The agreement on the presence of skipped cycles between two human raters equaled 91% (58 cycles detected by both raters out of 64 cycles detected by either one or two scorers). The fractal cycle algorithm detected skipped cycles in 57 out of 58 (98%) cases detected by Scorer 1 with one false positive (which, however, was tagged as a skipped cycle by Scorer 2), and in 58 out of 64 (91%) cases detected by Scorer 2 with no false positives.”
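      As a quick arithmetic check, the percentages quoted in the revised text can be recomputed from the reported counts. The snippet below is only an illustrative sketch: the helper name `percent` and the variable names are assumptions introduced here, and rounding to the nearest integer is assumed because it reproduces the reported values.

```python
def percent(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


# Counts taken from the quoted passage (first sleep cycles, Datasets 1-6).
total_first_cycles = 226
scorer1_skipped = 58   # skipped cycles tagged by Scorer 1
scorer2_skipped = 64   # skipped cycles tagged by Scorer 2
both_scorers = 58      # cycles tagged by both raters
either_scorer = 64     # cycles tagged by at least one rater

share_scorer1 = percent(scorer1_skipped, total_first_cycles)
share_scorer2 = percent(scorer2_skipped, total_first_cycles)
inter_rater_agreement = percent(both_scorers, either_scorer)

# Fractal algorithm hits relative to each scorer's set of skipped cycles.
fractal_vs_scorer1 = percent(57, scorer1_skipped)
fractal_vs_scorer2 = percent(58, scorer2_skipped)

print(share_scorer1, share_scorer2, inter_rater_agreement,
      fractal_vs_scorer1, fractal_vs_scorer2)
```

      Under these assumptions the inter-rater agreement and the fractal-versus-Scorer-2 figure coincide, because both are 58 out of 64.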

      Minor points

      I suggest reporting the values of inter-method / inter-scorer correlations with the classical method in the main text since otherwise interpreting the value for fractal vs classical is impossible.

      Thank you for this comment. In the revised Manuscript, we have moved this section to the main text (Table 3).

      Table 5 + text of discussion: cycle identification based on hypnograms is claimed to be "based on arbitrary assigned categorical values". The categories are not arbitrary, since they correspond to well-validated sleep states; only the number associated with each category is, and this does not seem very important since it is only for visualization purposes.

      Thank you for this comment. In the revised Manuscript, we have removed the phrase “arbitrary assigned”.

    1. eLife Assessment

      This important study investigates how working memory load influences the Stroop effect from a temporal dynamics perspective. Convincing evidence is provided that the working memory load influences the Stroop effect in the late-stage stimulus-response mapping instead of the early sensory stage. This study will be of interest to both neuroscientists and psychologists who work on cognitive control.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates an intriguing question in cognitive control from a temporal dynamics perspective: why does concurrent verbal working memory load eliminate the color-word Stroop effect? Through a series of thorough data analyses, the authors propose that verbal working memory load occupies the stimulus-response mapping resources represented by theta-band activity, thereby disrupting the mapping process for task-irrelevant distractors. This reduces the response tendency to the distractors, ultimately leading to the elimination of the Stroop effect.

      Strengths:

      The behavioral and neural evidence presented in the manuscript is solid, and the findings have valuable theoretical implications for research on Stroop conflict processing.

      Comments on revisions:

      The authors have addressed all concerns.

    3. Reviewer #2 (Public review):

      Summary

      Li et al. explored which stage of Stroop conflict processing was influenced by working memory loads. Participants completed a single task (Stroop task) and a dual task (the Sternberg working memory task combined with the Stroop task) while their EEG data were recorded. They used event-related potential (ERP) and multivariate pattern analyses (MVPA) to investigate the interaction effect of task (single/dual) and congruency (congruent/incongruent). The results showed that the interaction effect was significant on the sustained potential (SP; 650-950 ms), the late theta (740-820 ms), and beta (920-1040 ms) power but not significant on the early P1 potential (110-150 ms). They used the representational similarity analyses (RSA) method to explore the correlation between behavioral and neural data, and the results revealed a significant contribution of late theta activity.

      Strength

      The experiment is well designed. The data were analyzed in depth from both time and frequency domain perspectives by combining several methods.

      Comments on revisions:

      All my concerns have been properly addressed, no further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Comment 1: In the Results section, the rationale behind selecting the beta band for the central (C3, CP3, Cz, CP4, C4) regions and the theta band for the fronto-central (Fz, FCz, Cz) regions is not clearly explained in the main text. This information is only mentioned in the figure captions. Additionally, why was the beta band chosen for the S-ROI central region and the theta band for the S-ROI fronto-central region? Was this choice influenced by the MVPA results?

      We thank the reviewer for the question regarding the rationale for the S-ROI selection in our study. The beta band was chosen for the central region due to its established relevance in motor control (Engel & Fries, 2010), movement planning (Little et al., 2019) and motor inhibition (Duque et al., 2017). The fronto-central theta band (or frontal midline theta) was a widely recognized indicator in cognitive control research (Cavanagh & Frank, 2014), associated with conflict detection and resolution processes. Moreover, recent empirical evidence suggested that the fronto-central theta reflected the coordination and integration between stimuli and responses (Senoussi et al., 2022). Although we have described the cognitive processes linked to these different frequencies in the introduction and discussion sections, along with the potential patterns of results observed in Stroop-related studies, we did not specify the involved cortical areas. Therefore, we have specified these areas in the introduction to enhance the clarity of the revised version (in the fourth paragraph of the Introduction section).

      Regarding whether the selection of S-ROIs was influenced by the MVPA results, we would like to clarify here that we selected the S-ROIs based on prior research and then conducted the decoding analysis. Specifically, we first extracted the data representing different frequency indicators (three F-ROIs and three S-ROIs) as features, followed by decoding to obtain the MVPA results. Subsequently, the time-frequency analysis, combined with the specific time windows during which each frequency was decoded, provided detailed interaction patterns among the variables for each indicator. The specifics of feature selection are described in the revised version (in the first paragraph of the Multivariate Pattern Analysis section).

      Comment 2: In the Data Analysis section, line 424 states: “Only trials that were correct in both the memory task and the Stroop task were included in all subsequent analyses. In addition, trials in which response times (RTs) deviated by more than three standard deviations from the condition mean were excluded from behavioral analyses.” The percentage of excluded trials should be reported. Also, for the EEG-related analyses, were the same trials excluded, or were different criteria applied?

      We thank the reviewer for this suggestion. Beyond the behavioral exclusion criteria, trials with EEG artifacts were also excluded from the data for the EEG-related analyses. We have now reported the percentage of excluded trials for both behavioral and EEG data analyses in the revised version (in the second paragraph of the EEG Recording and Preprocessing section and the first paragraph of the Behavioral Analysis section).
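      The behavioral exclusion rule quoted in Comment 2 (correct trials only, then response times more than three standard deviations from the condition mean removed) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `keep_mask`, the use of the sample standard deviation, and computing the mean from correct trials only are all assumptions made here.

```python
import numpy as np


def keep_mask(rt, condition, correct, n_sd=3.0):
    """Boolean mask of trials retained for behavioral analysis.

    Assumptions (not stated in the source): the condition mean and SD are
    computed over correct trials only, and the sample SD (ddof=1) is used.
    """
    rt = np.asarray(rt, dtype=float)
    condition = np.asarray(condition)
    keep = np.asarray(correct, dtype=bool).copy()
    for cond in np.unique(condition):
        idx = keep & (condition == cond)   # correct trials of this condition
        if idx.sum() < 2:                  # SD undefined for fewer than 2 trials
            continue
        mean = rt[idx].mean()
        sd = rt[idx].std(ddof=1)
        keep[idx] = np.abs(rt[idx] - mean) <= n_sd * sd
    return keep


def percent_excluded(mask):
    """Percentage of trials removed by the mask."""
    mask = np.asarray(mask, dtype=bool)
    return 100.0 * (1.0 - mask.mean())
```

      Reporting `percent_excluded(keep_mask(...))` separately for the behavioral and the EEG analyses would give the exclusion percentages the reviewer asked for.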

      Comment 3: In the Methods section, line 493 mentions: “A 400-200 ms pre-stimulus time window was selected as the baseline time window.” What is the justification in the literature for choosing the 400-200 ms pre-stimulus window as the baseline? Why was the 200-0 ms pre-stimulus period not considered?

      We thank the reviewer for this question and would like to provide the following justification. First, although a baseline ending at 0 ms is common in ERP analyses, it may not be suitable for time-frequency analysis. Due to the inherent temporal smoothing characteristic of wavelet convolution in time-frequency decomposition, task-related early activities can leak into the pre-stimulus period (before 0 ms) (Cohen, 2014). This means that extending the baseline to 0 ms will include some post-stimulus activity in the baseline window, thereby increasing baseline power and compromising the accuracy of the results. Second, an ideal baseline duration is recommended to be around 10-20% of the entire trial of interest (Morales & Bowers, 2022). In our study, the epoch duration was 2000 ms, making 200-400 ms an appropriate baseline length. Third, given that the minimum duration of the fixation point before the stimulus in our experiment was 400 ms, we chose the 400 ms before the stimulus as the baseline point to ensure its purity. In summary, considering edge effects, duration requirements, and the need to exclude other influences, we selected a baseline correction window of -400 to -200 ms. To enhance the clarity of the revised version, we have provided the rationale for the selected time windows along with relevant references (in the first paragraph of the Time-frequency analysis section).
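      The baseline choice described above can be illustrated with a short sketch. The source states only the window (-400 to -200 ms) and the 2000 ms epoch; the decibel normalization, the function name `baseline_db`, and the array layout (frequencies by time points) are assumptions made here for illustration.

```python
import numpy as np


def baseline_db(power, times_ms, window=(-400.0, -200.0)):
    """Express time-frequency power in dB relative to a pre-stimulus baseline.

    power    : array of shape (n_freqs, n_times), strictly positive
    times_ms : array of shape (n_times,), time relative to stimulus onset
    window   : baseline limits in ms; (-400, -200) follows the response above
    The dB conversion is an assumed normalization, not taken from the source.
    """
    power = np.asarray(power, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    in_window = (times_ms >= window[0]) & (times_ms <= window[1])
    baseline = power[:, in_window].mean(axis=1, keepdims=True)
    return 10.0 * np.log10(power / baseline)


# The 10-20% guideline applied to a 2000 ms epoch gives 200-400 ms.
epoch_ms = 2000
recommended_range_ms = (0.10 * epoch_ms, 0.20 * epoch_ms)
```

      Ending the window at -200 ms rather than 0 ms keeps post-stimulus activity that is smeared backwards by the wavelet convolution out of the baseline estimate.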

      Comment 4: Is the primary innovation of this study limited to the methodology, such as employing MVPA and RSA to establish the relationship between late theta activity and behavior?

      We thank the reviewer for this insightful question and would like to clarify that our research extends beyond mere methodological innovation; rather, it utilized new methods to explore novel theoretical perspectives. Specifically, our research presents three levels of innovation: methodological, empirical, and theoretical. First, methodologically, MVPA overcame the drawbacks of traditional EEG analyses based on specific averaged voltage intensities, providing new perspectives on how the brain dynamically encoded particular neural representations over time. Furthermore, RSA aimed to identify which indicators among the decoded were directly related to behavioral representation patterns. Second, in terms of empirical results, using these two methods, we have identified for the first time three EEG markers that modulate the Stroop effect under verbal working memory load: SP, late theta, and beta, with late theta being directly linked to the elimination of the behavioral Stroop effect. Lastly, from a theoretical perspective, we proposed the novel idea that working memory played a crucial role in the late stages of conflict processing, specifically in the stimulus-response mapping stage (the specific theoretical contributions are detailed in the second-to-last paragraph of the Discussion section).

      Comment 5: On page 14, lines 280-287, the authors discuss a specific pattern observed in the alpha band. However, the manuscript does not provide the corresponding results to substantiate this discussion. It is recommended to include these results as supplementary material.

      We thank the reviewer for this suggestion. We added a new figure along with the corresponding statistical results that displayed the specific result patterns for the alpha band (Supplementary Figure 1).

      Comment 6: On page 16, lines 323-328, the authors provide a generalized explanation of the findings. According to load theory, stimuli compete for resources only when represented in the same form. Since the pre-memorized Chinese characters are represented semantically in working memory, this explanation lacks a critical premise: that semantic-response mapping is also represented semantically during processing.

      We thank the reviewer for this insightful suggestion. We fully agree with the reviewer’s perspective. As stated in our revised version, load theory suggests that cognitive resources are limited and dependent on a specific type (in the second paragraph of the Discussion section). The previously memorized Chinese characters are stored in working memory in the form of semantic representations; meanwhile the stimulus-response mapping should also be represented semantically, leading to resource occupancy. We have included this logical premise in the revised version (in the third-to-last paragraph of the Discussion section).

      Comment 7: The classic Stroop task includes both a manual and a vocal version. Since stimulus-response mapping in the vocal version is more automatic than in the manual version, it is unclear whether the findings of this study would generalize to the impact of working memory load on the Stroop effect in the vocal version.

      We fully agree with the reviewer’s point that the verbal version of the Stroop task differs from the manual version in terms of the degree of automation in the stimulus-response mapping. Specifically, the verbal version relies on mappings that are established through daily language use, while the manual version involves arbitrary mappings created in the laboratory. Therefore, the stimulus-response mapping in the verbal response version is more automated and less likely to be suppressed. However, our previous research indicated that the degree of automation in the stimulus-response mapping was influenced by practice (Chen et al., 2013). After approximately 128 practice trials, semantic conflict almost disappears, suggesting that the level of automation in stimulus-response mapping for the verbal Stroop task is comparable to that of the manual version (Chen et al., 2010). Given that participants in our study completed 144 practice trials (in the Procedure section), we believe these findings can be generalized to the verbal version.

      Comment 8: While the discussion section provides a comprehensive analysis of the study’s results, the authors could further elaborate on the theoretical and practical contributions of this work.

      We thank the reviewer for the constructive suggestions. We recognize that the theoretical and practical contributions of the study were not thoroughly elaborated in the original manuscript. Therefore, we have now provided a more detailed discussion. Specifically, the theoretical contributions focus on advancing load theory and highlighting the critical role of working memory in conflict processing. The practical contributions emphasize the application of load theory and the development of intervention strategies for enhancing inhibitory control. A more detailed discussion can be found in the revised version (in the second-to-last paragraph of the Discussion section).

      Reviewer #2 (Public review):

      Comment 1: As the researchers mentioned, a previous study reported a diminished Stroop effect with concurrent working memory tasks to memorize meaningless visual shapes rather than memorize Chinese characters as in the study. My main concern is that lower-level graphic processing when memorizing visual shapes also influences the Stroop effect. The stage of Stroop conflict processing affected by the working memory load may depend on the specific content of the concurrent working memory task. If that’s the case, I sense that the generalization of this finding may be limited.

      We thank the reviewer for this insightful concern. As mentioned in the manuscript, this may be attributed to the inherent characteristics of Chinese characters. In contrast to English words, the processing of Chinese characters relies more on graphemic encoding and memory (Chen, 1993). Therefore, the processing of line patterns essentially occupies some of the resources needed for character processing, which aligns with our study’s hypothesis based on dimensional overlap. Additionally, regarding the results, even though the previous study presented lower-level line patterns, the results still showed that the working memory load modulated the later theta band. We hypothesize that, regardless of the specific content of the pre-presented working memory load, once the stimulus disappears from view, these loads are maintained as representations in the working memory platform. Therefore, they do not influence early perceptual processing, and resource competition only occurs once the distractors reach the working memory platform. Lastly, a previous study has shown that spatial loads, which do not overlap with either the target or distractor dimensions, do not influence the conflict effect (Zhao et al., 2010). Taken together, we believe that regardless of the specific content of the concurrent working memory tasks, as long as they occupy resources related to irrelevant stimulus dimensions, they can influence the late-stage processing of the conflict effect. Perhaps our original manuscript did not convey this clearly, so we have rephrased it in a more straightforward manner (in the second paragraph of the Discussion section).

      Comment 2: The P1 and N450 components are sensitive to congruency in previous studies as mentioned by the researchers, but the results in the present study did not replicate them. This raised concerns about data quality and needs to be explained.

      We thank the reviewer for this insightful concern. For P1, we aimed to convey that the early perceptual processing represented by P1 is part of the conflict processing process. Therefore, we included it in our analysis. Additionally, as mentioned in the discussion, most studies find P1 to be insensitive to congruency. However, we inappropriately cited a study in the introduction that suggested P1 shows differences in congruency, which is among the few studies that hold this perspective. To prevent confusion for readers, we have removed this citation from the introduction.

      As for N450, most studies have indeed found it to be influenced by congruency. In our manuscript, we did not observe a congruency effect at our chosen electrodes and time window. However, significant congruency effects were detected at other central-parietal electrodes (CP3, CP4, P5, P6) during the 350-500 ms interval. The interaction between task type and congruency remained non-significant, consistent with previous results. Furthermore, with respect to the location of the electrodes chosen, existing studies on N450 vary widely, including central-parietal electrodes and frontal-central electrodes (for a review, see Heidlmayr et al., 2020). We speculate that this phenomenon may be related to the extent of practice. With fewer total trials, the task may involve more stimulus conflicts, engaging more frontal brain areas. On the other hand, with more total trials, the task may involve more response conflicts, engaging more central-parietal brain areas (Chen et al., 2013; van Veen & Carter, 2005). Due to the extensive practice required in our study, we identified a congruency N450 effect in the central-parietal region. We apologize for not thoroughly exploring other potential electrodes in the previous manuscript, and we have revised the results and interpretations regarding N450 accordingly in the revised version (in the N450 section of the ERP results and the third paragraph of the Discussion section).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment 1: In the Introduction, line 108 states: “Second, alpha oscillations (8-13 Hz) can serve as a neural inverse index of mental activity or alertness, while a decrease in alpha power reflects increased alertness or enhanced attentional inhibition of distractors (Arakaki et al., 2022; Tafuro et al., 2019; Zhou et al., 2023; Zhu et al., 2023).” Please clarify which specific psychological process related to conflict processing is reflected by alpha oscillations.

      We appreciate your suggestion and we have clearly highlighted the role of alpha oscillations in attentional engagement during conflict processing in the revised version (in the third-to-last paragraph of the introduction).

      Comment 2: In Figures 3C and 3E, a space is needed between “amplitude” and the preceding parenthesis. Similar adjustments are required in Figures 4A, 4B, 4C, 5C, and 6C. Additionally, in Figures 3B and 3D, a space should be added between the numbers and “ms.” This issue also appears in Figure 8. Please review all figures for these formatting inconsistencies.

      We apologize for the inconsistency in formatting and have corrected them throughout the revised version.

      Comment 3: There are some clerical errors in the manuscript that need correction. For instance, on page 19, line 403: “Participants were asked to answer by pressing one of two response buttons (“S” with the left ring finger and “L” with the left ring finger).” This should be corrected to: “L” with the right ring finger. I recommend that the authors carefully proofread the manuscript to identify and correct such errors.

      We sincerely apologize for the errors present in the manuscript and have now carefully proofread it (in the Procedure section).

      Comment 4: On page 13, line 254, the elimination of the Stroop effect should not be interpreted as an improvement in processing.

      We greatly appreciate your suggestion. We agree that the elimination of the Stroop effect should not be confused with improvements in processing. We have corrected this in the revised version (the second paragraph of the Discussion section).

      Reviewer #3 (Recommendations for the authors):

      Comment 1: In the introduction section, the N450 was introduced as “a frontal-central negative deflection”, but in the methods part the N450 was computed using central-parietal electrodes. This inconsistency is confusing and needs to be clarified.

      We apologize for this confusion. We have provided a detailed explanation regarding the differences in electrodes and the rationale behind choosing central-parietal electrodes in our response to Reviewer 2’s second comment. To clarify, we have updated the introduction to consistently label them as central-parietal deflections (in the third paragraph of the Introduction section).

      Comment 2: I speculate the “beta” was mistakenly written as “theta” in line 212.

      We sincerely apologize for this mistake. We have corrected this error (in the RSA results section).

      Comment 3: The speculation that “changes in beta bands may be influenced by theta bands, thereby indirectly influencing the behavioral Stroop effect” needs to be rationalized.

      We appreciate your suggestion. What we intended to convey is that we found an interaction effect in the beta bands; however, the RSA results did not show a correlation with the behavioral interaction effect. We speculate that beta activity might be influenced by the theta bands. On the one hand, we realize that the idea of beta bands indirectly influencing the behavioral Stroop effect was inappropriate, and we have removed this point in the revised version. On the other hand, we have provided rational evidence for the idea that beta bands may be influenced by theta bands. This is based on the biological properties of theta oscillations, which support communication between different cortical neural signals, and their functional role in integrating and transmitting task-relevant information to response execution (in the third-to-last paragraph of the Discussion section).

      Comment 4: Typo in line 479: [10,10].

      We sincerely apologize for this mistake. We have corrected this error: [-10,10] (in the Multivariate pattern analysis section).

      References

      Cavanagh, J. F., & Frank, M. J. (2014). Frontal theta as a mechanism for cognitive control. Trends in Cognitive Sciences, 18(8), 414–421. https://doi.org/10.1016/j.tics.2014.04.012

      Chen, M. J. (1993). A Comparison of Chinese and English Language Processing. In Advances in Psychology (Vol. 103, pp. 97–117). North-Holland. https://doi.org/10.1016/S0166-4115(08)61659-3

      Chen, X. F., Jiang, J., Zhao, X., & Chen, A. (2010). Effects of practice on semantic conflict and response conflict in the Stroop task. Psychol. Sci., 33, 869–871.

      Chen, Z., Lei, X., Ding, C., Li, H., & Chen, A. (2013). The neural mechanisms of semantic and response conflicts: An fMRI study of practice-related effects in the Stroop task. NeuroImage, 66, 577–584. https://doi.org/10.1016/j.neuroimage.2012.10.028

      Cohen, M. X. (2014). Analyzing Neural Time Series Data: Theory and Practice. The MIT Press. https://doi.org/10.7551/mitpress/9609.001.0001

      Duprez, J., Gulbinaite, R., & Cohen, M. X. (2020). Midfrontal theta phase coordinates behaviorally relevant brain computations during cognitive control. NeuroImage, 207, 116340. https://doi.org/10.1016/j.neuroimage.2019.116340

      Duque, J., Greenhouse, I., Labruna, L., & Ivry, R. B. (2017). Physiological Markers of Motor Inhibition during Human Behavior. Trends in Neurosciences, 40(4), 219–236. https://doi.org/10.1016/j.tins.2017.02.006

      Engel, A. K., & Fries, P. (2010). Beta-band oscillations—Signalling the status quo? Current Opinion in Neurobiology, 20(2), 156–165. https://doi.org/10.1016/j.conb.2010.02.015

      Heidlmayr, K., Kihlstedt, M., & Isel, F. (2020). A review on the electroencephalography markers of Stroop executive control processes. Brain and Cognition, 146, 105637. https://doi.org/10.1016/j.bandc.2020.105637

      Little, S., Bonaiuto, J., Barnes, G., & Bestmann, S. (2019). Human motor cortical beta bursts relate to movement planning and response errors. PLOS Biology, 17(10), e3000479. https://doi.org/10.1371/journal.pbio.3000479

      Morales, S., & Bowers, M. E. (2022). Time-frequency analysis methods and their application in developmental EEG data. Developmental Cognitive Neuroscience, 54, 101067. https://doi.org/10.1016/j.dcn.2022.101067

      Senoussi, M., Verbeke, P., Desender, K., De Loof, E., Talsma, D., & Verguts, T. (2022). Theta oscillations shift towards optimal frequency for cognitive control. Nature Human Behaviour, 6(7), Article 7. https://doi.org/10.1038/s41562-022-01335-5

      van Veen, V., & Carter, C. S. (2005). Separating semantic conflict and response conflict in the Stroop task: A functional MRI study. NeuroImage, 27(3), 497–504. https://doi.org/10.1016/j.neuroimage.2005.04.042

      Zhao, X., Chen, A., & West, R. (2010). The influence of working memory load on the Simon effect. Psychonomic Bulletin & Review, 17(5), 687–692. https://doi.org/10.3758/PBR.17.5.687

    1. eLife Assessment

      This study uses carefully designed experiments to generate a useful behavioural and neuroimaging dataset on visual cognition. The results provide solid evidence for the involvement of higher-order visual cortex in processing visual oddballs and asymmetry. However, the evidence provided for the very strong claims of homogeneity as a novel concept in vision science, separable from existing concepts such as target saliency, is incomplete. The authors and the reviewers do not agree on several points, which are explained in the reviews and author response.

    2. Reviewer #1 (Public review):

      Summary:

      The authors define a new metric for visual displays, derived from psychophysical response times, called visual homogeneity (VH). They attempt to show that VH is explanatory of response times across multiple visual tasks. They use fMRI to find visual cortex regions with VH-correlated activity. On this basis, they declare a new visual region in human brain, area VH, whose purpose is to represent VH for the purpose of visual search and symmetry tasks.

      Link to original review: https://elifesciences.org/reviewed-preprints/93033v2/reviews#peer-review-0

      Comments on latest version:

      Authors rebuttal: We agree that visual homogeneity is similar to existing concepts such as target saliency, memorability etc. We have proposed it as a separate concept because visual homogeneity has an independent empirical measure (the reciprocal of target-absent search time in oddball search, or the reciprocal of same response time in a same-different task, etc) that may or may not be the same as other empirical measures such as saliency and memorability. Investigating these possibilities is beyond the scope of our study but would be interesting for future work. We have now clarified this in the revised manuscript (Discussion, p. 42).

      Reviewer response to rebuttal: Neither the original ms nor the comments on that ms pretended that "visual homogeneity" was entirely separate from target saliency etc. So this is a response to a criticism that was never made. What the authors do claim, and what the comments question, is that they have successfully subsumed long-recognized psychophysical concepts like target saliency etc. under a new, uber-concept, "visual homogeneity" that explains psychophysical experimental results in a more unified and satisfying way. This subsumption of several well-established psychophysical concepts under a new, unified category is what reviewers objected to.

      Authors rebuttal: However, we'd like to emphasize that the question of whether visual homogeneity is novel or related to existing concepts misses entirely the key contribution of our study.

      Reviewer response to rebuttal: Sorry, but the claim of a new uber-concept in psychophysics, "visual homogeneity", is a major claim of the paper. The fact that it is not the only claim made does not absolve the authors from having to prove it satisfactorily.

      Authors rebuttal: "In addition, the large regions of VH correlations identified in Experiments 1 and 2 vs. Experiments 3 and 4 are barely overlapping. This undermines the claim that VH is a universal quantity, represented in a newly discovered area of visual cortex, that underlies a wide variety of visual tasks and functions."

      • We respectfully disagree with your assertion. First of all, there is partial overlap between the VH regions, for which there are several other obvious explanations that must be considered first before dismissing VH outright as a flawed construct. We acknowledge these alternatives in the Results (p. 27), and the relevant text is reproduced below.

      "We note that it is not straightforward to interpret the overlap between the VH regions identified in Experiments 2 & 4. The lack of overlap could be due to stimulus differences (natural images in Experiment 2 vs silhouettes in Experiment 4), visual field differences (items in the periphery in Experiment 2 vs items at the fovea in Experiment 4) and even due to different participants in the two experiments. There is evidence supporting all these possibilities: stimulus differences (Yue et al., 2014), visual field differences (Kravitz et al., 2013) as well as individual differences can all change the locus of neural activations in object-selective cortex (Weiner and Grill-Spector, 2012a; Glezer and Riesenhuber, 2013). We speculate that testing the same participants on search and symmetry tasks using similar stimuli and display properties would reveal even larger overlap in the VH regions that drive behavior."

      Reviewer response to rebuttal: The authors are saying that their results merely look unconvincing (weak overlap between VH regions defined in different experiments) because there were confounding differences between their experiments, in subject population, stimuli, etc. That is possible, but in that case it is up to the authors to show that their definition of a new "area VH" is convincing when the confounding differences are resolved, e.g. by using the same stimuli in the different experiments they attempt to agglomerate here. That would require new experiments, and none are offered in this revision.

      Authors rebuttal: • Thank you for carefully thinking through our logic. We agree that a distance-to-centre calculation is entirely unnecessary as an explanation for target-present visual search. The similarity between target and distractor already explains it, so there is nothing new to explain here. However, this is a narrow and selective interpretation of our findings because you are focusing only on our results on target-present searches, which are only half of all our data. The other half is the target-absent responses, which previously have had no clear explanation. You are also missing the fact that we are explaining same-different and symmetry tasks as well using the same visual homogeneity computation. We urge you to think more deeply about the problem of how to decide whether an oddball is present or not in the first place. How do we actually solve this task?

      Reviewer response to rebuttal: It is the role of the authors to think deeply about their paper and on that basis present a clear and compelling case that readers can understand quickly and agree with. That is not done here.

      Authors rebuttal: There must be some underlying representation and decision process. Our study shows that a distance-to-centre computation can actually serve as a decision variable to solve disparate property-based visual tasks. These tasks pose a major challenge to standard models of decision-making because the underlying representation and decision variable have been unclear. Our study resolves this challenge by proposing a novel computation that can be used by the brain to solve all these disparate tasks, and bring these tasks into the ambit of standard theories of decision-making.

      Reviewer response to rebuttal: There is only a "challenge" if you accept the authors' a priori assumption that all of these tasks must have a common explanation and rely on a single neural mechanism. I do not accept that assumption, and I don't think the authors provide evidence to support the assumption. There is nothing "unclear" about how search, oddball, etc. have been thoroughly explained, separately, in the psychophysical literature that spans more than a century.

      Authors rebuttal: • You are indeed correct in noting that both Experiment 1 & 2 involve oddball search, and so at the superficial level, it looks circular that the oddball search data of Experiment 1 is being used to explain the oddball search data of Experiment 2.

      However, deeper scrutiny reveals more fundamental differences: Experiment 1 consisted of only oddball search with the target appearing on the left or right, whereas Experiment 2 consisted of oddball search with the target either present or completely absent. In fact, we were merely using the search dissimilarities from Experiment 1 to reconstruct the underlying object representation, because it is well-known that neural dissimilarities are predicted well by search dissimilarities (Sripati & Olson, 2009; Zhivago et al, 2014).

      Reviewer response to rebuttal: Here again the authors cite differences between their multiple experiments as a virtue that supports their conclusions. Instead, the experiments should have been designed for maximum similarity if the authors intended to explain them with the same theory.

      Authors rebuttal: To thoroughly refute any lingering concern about circularity, we reasoned that the model predictions for Experiment 2 could have been obtained by a distance-to-center computation on any brain-like object representation. To this end, we used object representations from deep neural networks pretrained on object categorization, whose representations are known to match well with the brain, and asked if a distance-to-centre computation on these representations could predict the search data in Experiment 2. This was indeed the case, and these results are now included as an additional section in Supplementary Material (Section S1).

      Reviewer response to rebuttal: The authors' claims are about human performance and how it is based on the human brain. Their claims are not well supported by the human experiments that they performed. It serves no purpose to redo the same experiments in silico, which cannot provide stronger evidence that compensates for what was lacking in the human data.

      Authors rebuttal: "Confirming the generality of visual homogeneity<br /> We performed several additional analyses to confirm the generality of our results, and to reject alternate explanations.

      First, it could be argued that our results are circular because they involve taking oddball search times from Experiment 1 and using them to explain search response times in Experiment 2. This is a superficial concern since we are using the search dissimilarities from Experiment 1 only as a proxy for the underlying neural representation, based on previous reports that neural dissimilarities closely match oddball search dissimilarities (Sripati and Olson, 2010; Zhivago and Arun, 2014). Nonetheless, to thoroughly refute this possibility, we reasoned that we would get similar predictions of the target present/absent responses in Experiment 2 using any other brain-like object representation. To confirm this, we replaced the object representations derived from Experiment 1 with object representations derived from deep neural networks pretrained for object categorization, and asked if distance-to-center computations could predict the target present/absent responses in Experiment 2. This was indeed the case (Section S1).

      Second, we wondered whether the nonlinear optimization process of finding the best-fitting center could be yielding disparate optimal centres each time. To investigate this, we repeated the optimization procedure with many randomly initialized starting points, and obtained the same best-fitting center each time (see Methods).

      Third, to confirm that the above model fits are not due to overfitting, we performed a leave-one-out cross validation analysis. We left out all target-present and target-absent searches involving a particular image, and then predicted these searches by calculating visual homogeneity estimated from all other images. This too yielded similar positive and negative correlations (r = 0.63, p < 0.0001 for target-present, r = -0.63, p < 0.001 for target-absent).

      Fourth, if heterogeneous displays indeed elicit similar neural responses due to mixing, then their average distance to other objects must be related to their visual homogeneity. We confirmed that this was indeed the case, suggesting that the average distance of an object from all other objects in visual search can predict visual homogeneity (Section S1).

      Fifth, the above results are based on taking the neural response to oddball arrays to be the average of the target and distractor responses. To confirm that averaging was indeed the optimal choice, we repeated the above analysis by assuming a range of relative weights between the target and distractor. The best correlation was obtained for almost equal weights in the lateral occipital (LO) region, consistent with averaging and its role in the underlying perceptual representation (Section S1).

      Finally, we performed several additional experiments on a larger set of natural objects as well as on silhouette shapes. In all cases, present/absent responses were explained using visual homogeneity (Section S2)."

      Reviewer response to rebuttal: The authors can experiment on side questions for as long as they please, but none of the results described above answer the concern about how center-fitting undercuts the evidentiary value of their main results.

      Authors rebuttal: • While it is true that the optimal center needs to be found by fitting to the data, there is no particular mystery to the algorithm: we are simply performing a standard gradient descent to maximize the fit to the data. We have described the algorithm clearly and are making our code public. We find the algorithm to yield stable optimal centers despite many randomly initialized starting points. We find the optimal center to be able to predict responses to entirely novel images that were excluded during model training. We are making no assumption about the location of the centre with respect to individual points. Therefore, we see no cause for concern regarding the center-finding algorithm.

      Reviewer response to rebuttal: The point of the original comment was that center-fitting should not be done in the first place because it introduces unknowable effects.
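
      The center-fitting procedure debated in this exchange can be illustrated with a minimal sketch. It is not the authors' code. The rebuttal describes gradient descent; the sketch below swaps in a derivative-free random local search to stay dependency-light. It also assumes that "fit to the data" means the difference between the target-present and target-absent correlations. The data are synthetic, and all names (`fit_score`, `local_search`, `true_center`) are hypothetical.

```python
import numpy as np

def fit_score(center, present, rt_present, absent, rt_absent):
    """Assumed objective: r(present distances, RT) minus r(absent distances, RT)."""
    d_p = np.linalg.norm(present - center, axis=1)
    d_a = np.linalg.norm(absent - center, axis=1)
    r_p = np.corrcoef(d_p, rt_present)[0, 1]
    r_a = np.corrcoef(d_a, rt_absent)[0, 1]
    score = r_p - r_a
    # Guard against undefined correlations (e.g. constant distances).
    return score if np.isfinite(score) else -np.inf

def local_search(start, score, rng, steps=500, step_size=0.2):
    """Random hill climbing: keep a perturbed center only if the score improves."""
    best, best_score = start.copy(), score(start)
    for _ in range(steps):
        candidate = best + rng.normal(scale=step_size, size=best.shape)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Synthetic 2-D responses and response times (arbitrary units), for illustration only.
rng = np.random.default_rng(0)
true_center = np.array([1.0, -1.0])
present = rng.normal(size=(40, 2)) * 2
absent = rng.normal(size=(40, 2)) * 2
rt_present = 1.0 + np.linalg.norm(present - true_center, axis=1) + rng.normal(scale=0.1, size=40)
rt_absent = 10.0 - np.linalg.norm(absent - true_center, axis=1) + rng.normal(scale=0.1, size=40)

score = lambda c: fit_score(c, present, rt_present, absent, rt_absent)

# Several random starting points; comparing the fitted centers shows how stable the fit is.
starts = rng.normal(scale=3.0, size=(5, 2))
results = [local_search(s, score, rng) for s in starts]
for start, (center, s) in zip(starts, results):
    print(np.round(start, 2), "->", np.round(center, 2), "score", round(s, 3))
```

      Because the center is chosen to maximize these correlations, high in-sample correlations are expected by construction. This is why the thread turns on held-out predictions and on reference points chosen without fitting.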

      Authors rebuttal: • Most visual tasks, such as finding an animal, are thought to involve building a decision boundary on some underlying neural representation. Even visual search has been portrayed as a signal-detection problem where a particular target is to be discriminated from a distractor. However, none of these formulations work in the case of property-based visual tasks, where there is no unique feature to look for.

      We are proposing that, when we view a search array, the neural response to the search array can be deduced from the neural responses to the individual elements using well-known rules, and that decisions about an oddball target being present or absent can be made by computing the distance of this neural response from some canonical mean firing rate of a population of neurons. This distance-to-center computation is what we denote as visual homogeneity. We have revised our manuscript throughout to make this clearer and we hope that this helps you understand the logic better.

      • You are absolutely correct that the stimulus complexity should matter, but there are no good empirically derived measures for stimulus complexity, other than subjective ratings which are complex on their own and could be based on any number of other cognitive and semantic factors. But considering what factors are correlated with target-absent response times is entirely different from asking what decision variable or template is being used by participants to solve the task.

      Reviewer response to rebuttal: If stimulus complexity is what matters, as the authors agree here, then it is incumbent on them to measure stimulus complexity. The difficulty of measuring stimulus complexity does not justify avoiding the problem with an analysis that ignores complexity.
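
      The distance-to-center computation that the authors describe in this exchange can be written down compactly. The sketch below is illustrative only. It assumes the array response is a weighted mix of the item responses (equal weights correspond to the averaging the authors mention) and that visual homogeneity is the Euclidean distance from a reference point. The 2-D item responses, the center and the function names are all hypothetical.

```python
import numpy as np

def array_response(target, distractor, w_target=0.5):
    """Response to a search array, modelled as a weighted mix of item responses."""
    target = np.asarray(target, dtype=float)
    distractor = np.asarray(distractor, dtype=float)
    return w_target * target + (1.0 - w_target) * distractor

def visual_homogeneity(response, center):
    """Euclidean distance of an array response from the reference point."""
    diff = np.asarray(response, dtype=float) - np.asarray(center, dtype=float)
    return float(np.linalg.norm(diff))

# Hypothetical item responses and reference point, for illustration only.
items = {"A": np.array([0.0, 0.0]), "B": np.array([4.0, 0.0])}
center = np.array([2.0, 0.0])

# Target-absent array: every item is A, so the mix equals A itself.
vh_absent = visual_homogeneity(array_response(items["A"], items["A"]), center)
# Target-present array: B among A, so the mix lies between the two items.
vh_present = visual_homogeneity(array_response(items["B"], items["A"]), center)

print(vh_absent, vh_present)
```

      In this toy case the mixed (target-present) array lands nearer the reference point than the homogeneous (target-absent) array. Whether real neural responses behave this way is the empirical question under dispute.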

      Authors rebuttal: • We have provided empirical proof for our claims, by showing that target-present response times in a visual search task are correlated with "different" responses in the same-different task, and that target-absent response times in the visual search task are correlated with "same" responses in the same-different task (Section S4).

      Reviewer response to rebuttal: Sorry, but there is still no reason to think that same-different judgments are based on a mythical boundary halfway between the two. If there is a boundary, it will be close to the same end of the continuum, where subjects might conceivably miss some tiny difference between two stimuli. The vast majority of "different" stimuli will be entirely different from the same stimulus, producing no confusability, and certainly not a decision boundary halfway between two extremes.

      Authors rebuttal: • Again, the opposite correlations between target present/absent search times with VH are the crucial empirical validation of our claims that a distance-to-center calculation explains how we perform these property-based tasks. The VH predictions do not fully explain the data. We have explicitly acknowledged this shortcoming, so we are hardly dismissing it as a problem.

      Reviewer response to rebuttal: The authors' acknowledgement of flaws in the ms does not argue in favor of publication, but rather just the opposite.

      Authors rebuttal: • Finding an oddball, deciding if two items are same or different and symmetry tasks are disparate visual tasks that do not fit neatly into standard models of decision-making. The key conceptual advance of our study is that we propose a plausible neural representation and decision variable that allows all three property-based visual tasks to be reconciled with standard models of decision-making.

      Reviewer response to rebuttal: The original comment stands as written. Same/different will have a boundary very close to the "same" end of the continuum. The boundary is only halfway between two choices if the stimulus design forces the boundary to be there, as in the motion and cat/dog experiments.

      Authors rebuttal: "There is no inherent middle point boundary between target present and target absent. Instead, in both types of trial, maximum information is present when target and distractors are most dissimilar, and minimum information is present when target and distractors are most similar. The point of greatest similarity occurs at the limit of any metric for similarity. Correspondingly, there is no middle point dip in information that would produce greater difficulty and higher response times. Instead, task difficulty and response times increase monotonically with similarity between targets and distractors, for both target present and target absent decisions. Thus, in Figs. 2F and 2G, response times appear to be highest for animals, which share the largest numbers of closely similar distractors."

      • Your alternative explanation rests on vague factors like "maximum information" which cannot be quantified. By contrast we are proposing a concrete, falsifiable model for three property-based tasks - same/different, oddball present/absent and object symmetry. Any argument based solely on item similarity to explain visual search or symmetry responses cannot explain systematic variations observed for target-absent arrays and for symmetric objects, for the reasons explained earlier.

      Reviewer response to rebuttal: There is nothing vague about this comment. The authors use an analysis that assumes a decision boundary at the centerpoint of their arbitrarily defined stimulus space. This assumption is not supported, and it is unlikely, considering that subjects are likely to notice all but the smallest variations between same and different stimuli, putting the boundary nearly at the same end of the continuum, not the very middle.

      Authors rebuttal: "(1) The area VH boundaries from different experiments are nearly completely non-overlapping.

      In line with their theory that VH is a single continuum with a decision boundary somewhere in the middle, the authors use fMRI searchlight to find an area whose responses positively correlate with homogeneity, as calculated across all of their target present and target absent arrays. They report VH-correlated activity in regions anterior to LO. However, the VH defined by symmetry Experiments 3 and 4 (VHsymmetry) is substantially anterior to LO, while the VH defined by target detection Experiments 1 and 2 (VHdetection) is almost immediately adjacent to LO. Fig. S13 shows that VHsymmetry and VHdetection are nearly non-overlapping. This is a fundamental problem with the claim of discovering a new area that represents a new quantity that explains response times across multiple visual tasks. In addition, it is hard to understand why VHsymmetry does not show up in a straightforward subtraction between symmetric and asymmetric objects, which should show a clear difference in homogeneity."

      • We respectfully disagree. The partial overlap between the VH regions identified in Experiments 1 & 2 can hardly be taken as evidence against the quantity VH itself, because there are several other obvious alternate explanations for this partial overlap, as summarized earlier as well. The VH region does show up in a straightforward subtraction between symmetric and asymmetric objects (Section S7), so we are not sure what the Reviewer is referring to here.

      Reviewer response to rebuttal: In disagreeing with the comment quoted above, the authors are maintaining that a new functional area of cerebral cortex can be declared even if that area changes location on the cortical map from one experiment to another. That position is patently absurd.

      Authors rebuttal: "(3) Definition of the boundaries and purpose of a new visual area in the brain requires circumspection, abundant and convergent evidence, and careful controls.

      Even if the VH metric, as defined and calculated by the authors here, is a meaningful quantity, it is a bold claim that a large cortical area just anterior to LO is devoted to calculating this metric as its major task. Vision involves much more than target detection and symmetry detection. Cortex anterior to LO is bound to perform a much wider range of visual functionalities. If the reported correlations can be clarified and supported, it would be more circumspect to treat them as one byproduct of unknown visual processing in cortex anterior to LO, rather than treating them as the defining purpose for a large area of visual cortex."

      • We totally agree with you that reporting a new brain region would require careful interpretation and abundant and converging evidence. However, this requires many studies worth of work, and historically category-selective regions like the FFA have achieved consensus only after they were replicated and confirmed across many studies. We believe our proposal for the computation of a quantity like visual homogeneity is conceptually novel, and our study represents a first step that provides some converging evidence (through replicable results across different experiments) for such a region. We have reworked our manuscript to make this point clearer (Discussion, p 32).

      Reviewer response to rebuttal: Indeed, declaring a new brain area depends on much more work than is done here. Thus, the appropriate course here is to wait before claiming to have identified a new cortical area.

    3. Reviewer #2 (Public review):

      Summary:

      This study proposes visual homogeneity as a novel visual property that enables observers to perform several seemingly disparate visual tasks, such as finding an odd item, deciding if two items are the same, or judging if an object is symmetric. In Exp 1, the reaction times on several objects were measured in human subjects. In Exp 2, the visual homogeneity of each object was calculated based on the reaction time data. The visual homogeneity scores predicted reaction times. This value was also correlated with the BOLD signals in a specific region anterior to LO. Similar methods were used to analyze reaction time and fMRI data in a symmetry detection task. It is concluded that visual homogeneity is an important feature that enables observers to solve these two tasks.

      Strengths:

      (1) The writing is very clear. The presentation of the study is informative.

      (2) This study includes several behavioral and fMRI experiments. I appreciate the scientific rigor of the authors.

      Weaknesses:

      Before addressing the manuscript itself, I would like to comment on the review process. Having read the latest revised manuscript, I share many of the concerns raised by the two reviewers in the last two rounds of review. It appears that the authors have disagreed with the majority of comments made by the two reviewers. If so, I strongly recommend that the authors proceed to make this revision a Version of Record and conclude this review process. According to eLife's policy, the authors have the right to make a Version of Record at any time during the review process, and I fully respect that right. However, I also ask that the authors respect the reviewers' right to retain their comments regarding this paper.

      Besides that, I still have several further questions about this study.

      (1) My main concern with this paper is the way visual homogeneity is computed. On page 10, lines 188-192, it says: "we then asked if there is any point in this multidimensional representation such that distances from this point to the target-present and target-absent response vectors can accurately predict the target-present and target-absent response times with a positive and negative correlation respectively (see Methods)". This is also true for the symmetry detection task. If I understand correctly, the reference point in this perceptual space was found by deliberately satisfying the negative and positive correlations in response times. And then on page 10, lines 200-205, it shows that the positive and negative correlations actually exist. This logic is confusing. The positive and negative correlations emerge only because this method is optimized to produce them. It seems more reasonable to identify the reference point of this perceptual space independently, without using the reaction time data. Otherwise, the inference process sounds circular. A simple way is to just use the mean point of all objects in Exp 1, without any optimization towards reaction time data.

      I raised this question in my initial review. However, the authors did not address whether the positive and negative correlations still hold if the mean point is defined as the reference point without any optimization. The authors also argue that it is similar to a case of fitting a straight line. It is fine that the authors insist on the straight line (e.g., correlation). However, I would not call "straight line correlations" a "quantitative model" in a high-profile journal like eLife. Please remove all related arguments of a novel quantitative model.

      (2) Visual homogeneity (at least in its current form) is an unnecessary term. It is similar to distractor heterogeneity/distractor variability/distractor saliency in the literature. However, the authors attempt to claim it as a novel concept. Both R1 and I raised this question in the very first review. However, the authors refused to revise the manuscript. In the last review, I mentioned this and provided some example sentences claiming novelty. The authors only revised the last sentence of the abstract, and did not even bother to revise the last sentence of the significance statement: "we show that these tasks can be solved using a simple property WE DEFINE as visual homogeneity". Also, line 851 still shows "we have defined a NOVEL image property, visual homogeneity...". I am confused about whether the authors agree or disagree that "visual homogeneity is an unnecessary term". If the authors agree, they should completely remove the related phrase throughout the paper. If not, they should keep all these and state the reasons. I don't think this is a correct approach to revising a manuscript.

      (3) If the authors agree that visual homogeneity is not new, I suggest a complete rewrite of the title, abstract, significance, and introduction. Let me ask a simple question, can we remove "visual homogeneity" and use some more well-established term like "image feature similarity"? If yes, visual homogeneity is unnecessary.

      (4) If I understand it correctly, one of the key findings of this paper is "the response times for target-present searches were positively correlated with visual homogeneity. By contrast, the response times for target-absent searches were negatively correlated with visual homogeneity" (lines 204-207). I think the authors have already acknowledged that this positive correlation is not surprising at all because it reflects the classic target-distractor similarity effect. If this is the case, please completely remove the positive correlation as a novel prediction and finding.

      (5) In my last review, I mentioned that the seminal paper by Duncan and Humphreys (1989) clearly states that "difficulty increases with increased similarity of targets to nontargets and decreased similarity between nontargets" (the sentence in their abstract). Here, "similarity between nontargets" is the same as the visual homogeneity defined here. Similar effects have been shown in Duncan (1989) and Nagy, Neriani, and Young (2005). See also the inconsistent results in Nagy & Thomas (2003) and Vincent, Baddeley, Troscianko & Gilchrist (2009). More recently, Wei Ji Ma has systematically investigated the effects of heterogeneous distractors in visual search. I think the introduction of Mihali & Ma (2020) provides a nice summary of this line of research.

      Thanks to the authors' revision, I now better understand the negative correlation. The between-distractor similarity mentioned above describes the heterogeneity of distractors WITHIN an image. However, if I understand it correctly, this study aims to address the negative correlation between reaction time and visual homogeneity for target-absent stimuli ACROSS images. In other words, why do humans show a shorter reaction time to an image of four pigeons than to an image of four dogs (as shown in Figure 2C)? On the authors' account, it is simply because the latter image is closer to the reference point of the image space. In this sense, this negative correlation is indeed not the same as distractor heterogeneity. However, this is known as the saliency effect or oddball effect. For example, it seems quite natural to me that humans respond faster to a fish image if the image set contains many images of four-legged dogs that look very different from fish. If this is indeed a saliency effect, why should we define a new term "visual homogeneity"?

      (6) The section "key predictions" is quite straightforward. I understand the logic of positive and negative correlations. However, what is the physical meaning of "decision boundary" (Fig. 1G) here? How does the "decision boundary" map on the image space?

      (7) In my opinion, one of the advantages of this study is the fMRI dataset, which is valuable because previous studies did not collect fMRI data. The key contribution may be the novel brain region associated with display heterogeneity. If this is the case, I would suggest using a more parametric way to measure this region. For example, one can use Gabor stimuli and systematically manipulate the variation among multiple Gabor stimuli; the same logic also applies to motion direction. If this study used static Gabors, random dot motion, and object images that span from low-level to high-level visual stimuli, and consistently showed that stimulus heterogeneity is encoded in one brain region, I would say this finding is valuable. But this sounds like another experiment. In other words, it is insufficient to claim a new brain region given the current form of the manuscript.
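
      The control proposed in point (1), which fixes the reference point at the mean of all object responses instead of fitting it to response times, could be checked along the following lines. This is a sketch under stated assumptions, not an analysis from the manuscript. The object coordinates are assumed to come from some embedding of the Exp 1 dissimilarities, and the response times below are made-up numbers used only to show the mechanics. `centroid_vh` and `pearson_r` are hypothetical helper names.

```python
import numpy as np

def centroid_vh(responses):
    """Distance of each object's response from the mean of all responses (no fitting)."""
    responses = np.asarray(responses, dtype=float)
    center = responses.mean(axis=0)
    return np.linalg.norm(responses - center, axis=1)

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical 2-D object coordinates and target-absent response times (seconds).
objects = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
rt_absent = np.array([1.2, 1.4, 1.5, 0.8])

vh = centroid_vh(objects)
r = pearson_r(vh, rt_absent)
print(np.round(vh, 4), round(r, 3))
```

      If the sign pattern of the correlations survives with this unfitted reference point on the real data, the circularity concern is weakened. If it does not, the fitted center is doing the work.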

      References:

      * Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96(3), 433-458. doi: 10.1037/0033-295x.96.3.433
      * Duncan, J. (1989). Boundary conditions on parallel processing in human vision. Perception, 18(4), 457-469. doi: 10.1068/p180457
      * Nagy, A. L., Neriani, K. E., & Young, T. L. (2005). Effects of target and distractor heterogeneity on search for a color target. Vision Research, 45(14), 1885-1899. doi: 10.1016/j.visres.2005.01.007
      * Nagy, A. L., & Thomas, G. (2003). Distractor heterogeneity, attention, and color in visual search. Vision Research, 43(14), 1541-1552. doi: 10.1016/s0042-6989(03)00234-7
      * Vincent, B., Baddeley, R., Troscianko, T., & Gilchrist, I. (2009). Optimal feature integration in visual search. Journal of Vision, 9(5), 15-15. doi: 10.1167/9.5.15
      * Singh, A., Mihali, A., Chou, W. C., & Ma, W. J. (2023). A Computational Approach to Search in Visual Working Memory.
      * Mihali, A., & Ma, W. J. (2020). The psychophysics of visual search with heterogeneous distractors. BioRxiv, 2020-08.
      * Calder-Travis, J., & Ma, W. J. (2020). Explaining the effects of distractor statistics in visual search. Journal of Vision, 20(13), 11-11.

    4. Reviewer #3 (Public review):

      Summary of the review process from the Reviewing Editor:

      The authors and the reviewers did not agree on several important points made in this paper. The reviewers were critical of the operationalisation of the concept of visual homogeneity (VH), and questioned its validity. For instance, they found it unsatisfying that VH was not calculated on the basis of images themselves, but on the basis of reaction times instead. The authors responded by providing further explanation and argumentation for the importance of this novel concept, but the reviewers were not persuaded. The reviewers also pointed out some data features that did not fit the theory (e.g., overlapping VH between present and absent stimuli), which the authors acknowledge as a point that needs further refining. Finally, the reviewers pointed out that the new so-called visual homogeneity brain region does not overlap very much in the two studies, to which the authors have responded that it is remarkable that there is even partial overlap, given the many confounding differences between the two studies. Altogether, the authors have greatly elaborated their case for VH as an important concept, but the reviewers were not persuaded, and we conclude that the current evidence does not yet meet the high bar for declaring that a novel image property, visual homogeneity, is computed in a localised brain region.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      We are grateful to the editors and reviewers for their careful reading and constructive comments. We have now done our best to respond to them fully through additional analyses and text revisions. In the sections below, the original reviewer comments are in black, and our responses are in red.

      To summarize, the major changes in this round of review are as follows:

      (1) We have included a new introductory figure (Figure 1) to explain the distinction between feature-based tasks and property-based tasks.

      (2) We have included a section on “key predictions” and a section on “overview of this study” in the Introduction to clearly delineate our key predictions and provide an overview of our study.

      (3) We have included additional analyses to address the reviewers’ concerns about circularity in Experiments 1 & 2. We show that distance-to-center or visual homogeneity computations performed on object representations obtained from deep networks (instead of the perceptual dissimilarities from Experiment 1) also yield comparable predictions of target-present and target-absent responses in Experiment 2.

      (4) We have extensively reworked the manuscript wherever possible to address the specific concerns raised by the reviewers.

      We hope that the revised manuscript adequately addresses the concerns raised in this round of review, and we look forward to a positive assessment.

      eLife Assessment

      This study uses carefully designed experiments to generate a useful behavioural and neuroimaging dataset on visual cognition. The results provide solid evidence for the involvement of higher-order visual cortex in processing visual oddballs and asymmetry. However, the evidence provided for the very strong claims of homogeneity as a novel concept in vision science, separable from existing concepts such as target saliency, is inadequate.

      Thank you for your positive assessment. We agree that visual homogeneity is similar to existing concepts such as target saliency, memorability etc. We have proposed it as a separate concept because visual homogeneity has an independent empirical measure (the reciprocal of target-absent search time in oddball search, or the reciprocal of same response time in a same-different task, etc) that may or may not be the same as other empirical measures such as saliency and memorability. Investigating these possibilities is beyond the scope of our study but would be interesting for future work. We have now clarified this in the revised manuscript (Discussion, p. 42).

      However, we’d like to emphasize that the question of whether visual homogeneity is novel or related to existing concepts misses entirely the key contribution of our study.

      Our key contribution is a quantitative, falsifiable model for how the brain could be solving property-based tasks like same-different, oddball or symmetry. Most theories of decision making consider feature-based tasks where there is a well-defined feature space and decision variable. Property-based tasks pose a significant challenge to standard theories since it is not clear how these tasks could be solved. In fact, oddball search, same-different and symmetry tasks have been considered so different that they are rarely even mentioned in the same study. Our study represents a unifying framework showing that all three tasks can be understood as solving the same underlying fundamental problem, and presents evidence in favor of this solution.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors define a new metric for visual displays, derived from psychophysical response times, called visual homogeneity (VH). They attempt to show that VH is explanatory of response times across multiple visual tasks. They use fMRI to find visual cortex regions with VH-correlated activity. On this basis, they declare a new visual region in human brain, area VH, whose purpose is to represent VH for the purpose of visual search and symmetry tasks.

      Thank you for your accurate and positive assessment.

      Strengths:

      The authors present carefully designed experiments, combining multiple types of visual judgments and multiple types of visual stimuli with concurrent fMRI measurements. This is a rich dataset with many possibilities for analysis and interpretation.

      Thank you for your accurate and positive assessment.

      Weaknesses:

      The datasets presented here should provide a rich basis for analysis. However, in this version of the manuscript, I believe that there are major problems with the logic underlying the authors' new theory of visual homogeneity (VH), with the specific methods they used to calculate VH, and with their interpretation of psychophysical results using these methods. These problems with the coherency of VH as a theoretical construct and metric value make it hard to interpret the fMRI results based on searchlight analysis of neural activity correlated with VH.

      We respectfully disagree with your concerns, and have done our best to respond to them fully below.

      In addition, the large regions of VH correlations identified in Experiments 1 and 2 vs. Experiments 3 and 4 are barely overlapping. This undermines the claim that VH is a universal quantity, represented in a newly discovered area of visual cortex, that underlies a wide variety of visual tasks and functions.

      We respectfully disagree with your assertion. First of all, there is partial overlap between the VH regions, for which there are several other obvious explanations that must be considered first before dismissing VH outright as a flawed construct. We acknowledge these alternatives in the Results (p. 27), and the relevant text is reproduced below.

      “We note that it is not straightforward to interpret the overlap between the VH regions identified in Experiments 2 & 4. The lack of overlap could be due to stimulus differences (natural images in Experiment 2 vs silhouettes in Experiment 4), visual field differences (items in the periphery in Experiment 2 vs items at the fovea in Experiment 4) and even due to different participants in the two experiments. There is evidence supporting all these possibilities: stimulus differences (Yue et al., 2014), visual field differences (Kravitz et al., 2013) as well as individual differences can all change the locus of neural activations in object-selective cortex (Weiner and Grill-Spector, 2012a; Glezer and Riesenhuber, 2013). We speculate that testing the same participants on search and symmetry tasks using similar stimuli and display properties would reveal even larger overlap in the VH regions that drive behavior.”

      Maybe I have missed something, or there is some flaw in my logic. But, absent that, I think the authors should radically reconsider their theory, analyses, and interpretations, in light of detailed comments below, in order to make the best use of their extensive and valuable datasets combining behavior and fMRI. I think doing so could lead to a much more coherent and convincing paper, albeit possibly supporting less novel conclusions.

      We respectfully disagree with your assessment, and we hope that our detailed responses below will convince you of the merit of our claims.

      THEORY AND ANALYSIS OF VH

      (1) VH is an unnecessary, complex proxy for response time and target-distractor similarity.

      VH is defined as a novel visual quality, calculable for both arrays of objects (as studied in Experiments 1-3) and individual objects (as studied in Experiment 4). It is derived from a distance-to-center calculation in a perceptual space. That space in turn is derived from multi-dimensional scaling of response times for target-distractor pairs in an oddball detection task (Experiments 1 and 2) or in a same-different task (Experiments 3 and 4). Proximity of objects in the space is inversely proportional to response times for arrays in which they were paired. These response times are higher for more similar objects. Hence, proximity is proportional to similarity. This is visible in Fig. 2B as the close clustering of complex, confusable animal shapes.

      VH, i.e. distance-to-center, for target-present arrays is calculated as shown in Fig. 1C, based on a point on the line connecting target and distractors. The authors justify this idea with previous findings that responses to multiple stimuli are an average of responses to the constituent individual stimuli. The distance of the connecting line to the center is inversely proportional to the distance between the two stimuli in the pair, as shown in Fig. 2D. As a result, VH is inversely proportional to distance between the stimuli and thus to stimulus similarity and response times. But this just makes VH a highly derived, unnecessarily complex proxy for target-distractor similarity and response time. The original response times on which the perceptual space is based are far more simple and direct measures of similarity for predicting response times.

      Thank you for carefully thinking through our logic. We agree that a distance-to-centre calculation is entirely unnecessary as an explanation for target-present visual search. The difficulty of target-present search is already known to be directly proportional to the similarity between target and distractor, so there is nothing new to explain here.

      However, this is a narrow and selective interpretation of our findings because you are focusing only on our results on target-present searches, which are only half of all our data. The other half is the target-absent responses which previously have had no clear explanation. You are also missing the fact that we are explaining same-different and symmetry tasks as well using the same visual homogeneity computation.

      We urge you to think more deeply about the problem of how to decide whether an oddball is present or not in the first place. How do we actually solve this task? There must be some underlying representation and decision process. Our study shows that a distance-to-centre computation can actually serve as a decision variable to solve disparate property-based visual tasks. These tasks pose a major challenge to standard models of decision making, because the underlying representation and decision variable have been unclear. Our study resolves this challenge by proposing a novel computation that can be used by the brain to solve all these disparate tasks, and bring these tasks into the ambit of standard theories of decision making.  
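
      As a purely illustrative sketch of the distance-to-centre idea described above (not the analysis code used in the study), the Python snippet below treats each object as a point in a low-dimensional space, takes the response to a target-present array to be the average of the target and distractor responses, and uses the distance of that response from a given centre as the decision variable. The coordinates, the fixed centre, and the names `array_response` and `visual_homogeneity` are all hypothetical and chosen only for illustration.

```python
import math

def array_response(target, distractor, w_target=0.5):
    """Response to an oddball array, assumed to be a weighted mix of item responses."""
    return [w_target * t + (1.0 - w_target) * d for t, d in zip(target, distractor)]

def visual_homogeneity(response, center):
    """Euclidean distance of a response vector from the centre."""
    return math.dist(response, center)

# Hypothetical 2-D coordinates for three objects and an assumed (not fitted) centre.
objects = {"A": [-3.0, 1.0], "B": [3.0, 1.0], "C": [0.0, -3.0]}
center = [0.0, 0.0]

# Target-absent array: all items are identical, so the array response is the item itself.
vh_absent = {name: visual_homogeneity(xy, center) for name, xy in objects.items()}

# Target-present array (A among B): the response is the average of the two items.
vh_present_AB = visual_homogeneity(array_response(objects["A"], objects["B"]), center)

print(vh_absent, vh_present_AB)
```

      In this toy configuration the mixed (target-present) array lies closer to the centre than either homogeneous (target-absent) array, which is the property a single distance criterion would exploit to separate the two kinds of display.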

      Our results also explain several interesting puzzles in the literature. If oddball search were driven only by target-distractor similarity, the time taken to respond when a target is absent should not vary at all, and should actually be longer than for all target-present searches. But in fact, systematic variations in target-absent times have always been observed in the literature, yet have never been explained by any theoretical model. Our results explain why target-absent times vary systematically – it is due to visual homogeneity.

      Similarly, in same-different tasks, participants are known to take longer to make a “different” response when the two items differ only slightly. By this logic, they should take the longest to make a “same” response, but in fact, paradoxically, participants are actually faster to make “same” responses. This fast-same effect has been noted several times, but never explained using any models. Our results provide an explanation of why “same” responses to an image vary systematically – it is due to visual homogeneity. 

      Finally, in symmetry tasks, symmetric objects evoke fast responses, and this has always been taken as evidence for special symmetry computations in the brain. But we show that the same distance-to-center computation can explain both responses to symmetric and asymmetric objects. Thus there is no need for a special symmetry computation in the brain.

      (2) The use of VH derived from Experiment 1 to predict response times in Experiment 2 is circular and does not validate the VH theory.

      The use of VH, a response time proxy, to predict response times in other, similar tasks, using the same stimuli, is circular. In effect, response times are being used to predict response times across two similar experiments using the same stimuli. Experiment 1 and the target present condition of Experiment 2 involve the same essential task of oddball detection. The results of Experiment 1 are converted into VH values as described above, and these are used to predict response times in Experiment 2 (Fig. 2F). Since VH is a derived proxy for response values in Experiment 1, this prediction is circular, and the observed correlation shows only consistency between two oddball detection tasks in two experiments using the same stimuli.

      You are indeed correct in noting that both Experiment 1 & 2 involve oddball search, and so at the superficial level, it looks circular that the oddball search data of Experiment 1 is being used to explain the oddball search data of Experiment 2.

      However, closer scrutiny reveals more fundamental differences: Experiment 1 consisted of only oddball search with the target appearing on the left or right, whereas Experiment 2 consisted of oddball search with the target either present or completely absent. In fact, we were merely using the search dissimilarities from Experiment 1 to reconstruct the underlying object representation, because it is well known that neural dissimilarities are predicted well by search dissimilarities (Sripati & Olson, 2010; Zhivago & Arun, 2014).

      To thoroughly refute any lingering concern about circularity, we reasoned that the model predictions for Experiment 2 could have been obtained by a distance-to-center computation on any brain-like object representation. To this end, we used object representations from deep neural networks pretrained on object categorization, whose representations are known to match well with the brain, and asked if a distance-to-centre computation on these representations could predict the search data in Experiment 2. This was indeed the case, and these results are now included as an additional section in Supplementary Material (Section S1).

      (3) The negative correlation of target-absent response times with VH as it is defined for target-absent arrays, based on distance of a single stimulus from center, is uninterpretable without understanding the effects of center-fitting. Most likely, center-fitting and the different VH metric for target-absent trials produce an inverse correlation of VH with target-distractor similarity.

      Unfortunately, as we have mentioned above, target-distractor similarity cannot explain how target-absent searches behave, since there is no distractor in such searches.

      We do understand your broader concern about the center-fitting algorithm itself. We performed a number of additional analyses to confirm the generality of our results and reject alternate explanations – these are summarized in a new section titled “Confirming the generality of visual homogeneity” (p. 12), and the section is reproduced below for your convenience.   

      “Confirming the generality of visual homogeneity

      We performed several additional analyses to confirm the generality of our results, and to reject alternate explanations.

      First, it could be argued that our results are circular because they involve taking oddball search times from Experiment 1 and using them to explain search response times in Experiment 2. This is a superficial concern since we are using the search dissimilarities from Experiment 1 only as a proxy for the underlying neural representation, based on previous reports that neural dissimilarities closely match oddball search dissimilarities (Sripati and Olson, 2010; Zhivago and Arun, 2014). Nonetheless, to thoroughly refute this possibility, we reasoned that we would get similar predictions of the target present/absent responses in Experiment 2 using any other brain-like object representation. To confirm this, we replaced the object representations derived from Experiment 1 with object representations derived from deep neural networks pretrained for object categorization, and asked if distance-to-center computations could predict the target present/absent responses in Experiment 2. This was indeed the case (Section S1).

      Second, we wondered whether the nonlinear optimization process of finding the best-fitting center could be yielding disparate optimal centres each time. To investigate this, we repeated the optimization procedure with many randomly initialized starting points, and obtained the same best-fitting center each time (see Methods).

      Third, to confirm that the above model fits are not due to overfitting, we performed a leave-one-out cross validation analysis. We left out all target-present and target-absent searches involving a particular image, and then predicted these searches by calculating visual homogeneity estimated from all other images. This too yielded similar positive and negative correlations (r = 0.63, p < 0.0001 for target-present, r = -0.63, p < 0.001 for target-absent).

      Fourth, if heterogeneous displays indeed elicit similar neural responses due to mixing, then their average distance to other objects must be related to their visual homogeneity. We confirmed that this was indeed the case, suggesting that the average distance of an object from all other objects in visual search can predict visual homogeneity (Section S1).

      Fifth, the above results are based on taking the neural response to oddball arrays to be the average of the target and distractor responses. To confirm that averaging was indeed the optimal choice, we repeated the above analysis by assuming a range of relative weights between the target and distractor. The best correlation was obtained for almost equal weights in the lateral occipital (LO) region, consistent with averaging and its role in the underlying perceptual representation (Section S1).

      Finally, we performed several additional experiments on a larger set of natural objects as well as on silhouette shapes. In all cases, present/absent responses were explained using visual homogeneity (Section S2).”

      The construction of the VH perceptual space also involves fitting a "center" point such that distances to center predict response times as closely as possible. The effect of this fitting process on distance-to-center values for individual objects or clusters of objects is unknowable from what is presented here. These effects would depend on the residual errors after fitting response times with the connecting line distances. The center point location and its effects on distance-to-center of single objects and object clusters are not discussed or reported here.

      While it is true that the optimal center needs to be found by fitting to the data, there is no particular mystery to the algorithm: we are simply performing standard gradient descent to maximize the fit to the data. We have described the algorithm clearly and are making our code public. We find the algorithm to yield stable optimal centers despite many randomly initialized starting points. We find the optimal center to be able to predict responses to entirely novel images that were excluded during model training. We are making no assumption about the location of the centre with respect to individual points. Therefore, we see no cause for concern regarding the center-finding algorithm.
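
      The general procedure of fitting a centre by gradient descent from several random starting points can be sketched as follows. This is a hedged illustration only: the loss (squared error between each point's distance to the centre and a synthetic target value), the numerical gradient, the step-halving rule, and the names `loss` and `fit_center` are assumptions made for this example, and the objective used in the study may differ.

```python
import math
import random

def loss(center, points, observed):
    """Sum of squared errors between distance-to-centre and the observed values."""
    return sum((math.dist(p, center) - o) ** 2 for p, o in zip(points, observed))

def fit_center(points, observed, start, step=0.1, iters=2000, eps=1e-6):
    """Numerical gradient descent that only accepts steps that lower the loss."""
    c = list(start)
    best = loss(c, points, observed)
    for _ in range(iters):
        # Forward-difference estimate of the gradient at the current centre.
        grad = []
        for k in range(len(c)):
            bumped = c[:]
            bumped[k] += eps
            grad.append((loss(bumped, points, observed) - best) / eps)
        trial = [ck - step * g for ck, g in zip(c, grad)]
        trial_loss = loss(trial, points, observed)
        if trial_loss < best:
            c, best = trial, trial_loss
        else:
            step *= 0.5  # shrink the step when the trial does not improve the fit
    return c, best

# Synthetic data (hypothetical): distances generated from a known centre.
points = [[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
true_center = [1.0, 1.0]
observed = [math.dist(p, true_center) for p in points]

# Repeat the fit from several random starting points.
rng = random.Random(0)
starts = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(5)]
fits = [fit_center(points, observed, s) for s in starts]

for start, (center, final_loss) in zip(starts, fits):
    print(start, center, final_loss)
```

      Comparing the fitted centres and final losses across restarts is one simple way to check whether the optimization is sensitive to initialization.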

      Yet, this uninterpretable distance-to-center of single objects is chosen as the metric for VH of target-absent displays (VHabsent). This is justified by the idea that arrays of a single stimulus will produce an average response equal to one stimulus of the same kind. But it is not logically clear why response strength to a stimulus should be a metric for homogeneity of arrays constructed from that stimulus, or even what homogeneity could mean for a single stimulus from this set. And it is not clear how this VHabsent metric based on single stimuli can be equated to the connecting line VH metric for stimulus pairs, i.e. VHpresent, or how both could be plotted on a single continuum.

      Most visual tasks, such as finding an animal, are thought to involve building a decision boundary on some underlying neural representation. Even visual search has been portrayed as a signal-detection problem where a particular target is to be discriminated from a distractor. However none of these formulations work in the case of property-based visual tasks, where there is no unique feature to look for.

      We are proposing that, when we view a search array, the neural response to the search array can be deduced from the neural responses to the individual elements using well known rules, and that decisions about an oddball target being present or absent can be made by computing the distance of this neural response from some canonical mean firing rate of a population of neurons. This distance to center computation is what we denote as visual homogeneity. We have revised our manuscript throughout to make this clearer and we hope that this helps you understand the logic better. 

      It is clear, however, what *should* be correlated with difficulty and response time in the target-absent trials, and that is the complexity of the stimuli and the numerosity of similar distractors in the overall stimulus set. Complexity of the target, similarity with potential distractors, and number of such similar distractors all make ruling out distractor presence more difficult. The correlation seen in Fig. 2G must reflect these kinds of effects, with higher response times for complex animal shapes with lots of similar distractors and lower response times for simpler round shapes with fewer similar distractors.

      You are absolutely correct that the stimulus complexity should matter, but there are no good empirically derived measures for stimulus complexity, other than subjective ratings which are complex on their own and could be based on any number of other cognitive and semantic factors. But considering what factors are correlated with target-absent response times is entirely different from asking what decision variable or template is being used by participants to solve the task.

      The example points in Fig. 2G seem to bear this out, with higher response times for the deer stimulus (complex, many close distractors in the Fig. 2B perceptual space) and lower response times for the coffee cup (simple, few close distractors in the perceptual space). While the meaning of the VH scale in Fig. 2G, and its relationship to the scale in Fig. 2F, are unknown, it seems like the Fig. 2G scale has an inverse relationship to stimulus complexity, in contrast to the expected positive relationship for Fig. 2F. This is presumably what creates the observed negative correlation in Fig. 2G.

      Taken together, points 1-3 suggest that VHpresent and VHabsent are complex, unnecessary, and disconnected metrics for understanding target detection response times. The standard, simple explanation should stand. Task difficulty and response time in target detection tasks, in both present and absent trials, are positively correlated with target-distractor similarity.

      We strongly disagree. Your assessment seems to be based on only considering target-present searches, which are of course driven by target-distractor similarity. Your argument is flawed because systematic variations in target-absent trials cannot be linked to any target-distractor similarity since there are no targets in the first place in such trials.

      We have shown that target-absent response times are in fact, independent of experimental context, which means that they index an image property that is independent of any reference target (Results, p. 15; Section S4). This property is what we define as visual homogeneity.

      I think my interpretations apply to Experiments 3 and 4 as well, although I find the analysis in Fig. 4 especially hard to understand. The VH space in this case is based on Experiment 3 oddball detection in a stimulus set that included both symmetric and asymmetric objects. But the response times for a very different task in Experiment 4, a symmetric/asymmetric judgment, are plotted against the axes derived from Experiment 3 (Fig. 4F and 4G). It is not clear to me why a measure based on oddball detection that requires no use of symmetry information should be predictive of within-stimulus symmetry detection response times. If it is, that requires a theoretical explanation not provided here.

      We were simply using an oddball detection task to construct the underlying object representation, on the basis of observations that search dissimilarities are strongly correlated with neural dissimilarities. In Section S1, we show that similar results could have been obtained using other object representations such as deep networks, as long as the representation is brain-like.

      (4) Contrary to the VH theory, same/different tasks are unlikely to depend on a decision boundary in the middle of a similarity or homogeneity continuum.

      We have provided empirical proof for our claims, by showing that target-present response times in a visual search task are correlated with “different” responses in the same-different task, and that target-absent response times in the visual search task are correlated with “same” responses in the same-different task (Section S4).

      The authors interpret the inverse relationship of response times with VHpresent and VHabsent, described above, as evidence for their theory. They hypothesize, in Fig. 1G, that VHpresent and VHabsent occupy a single scale, with maximum VHpresent falling at the same point as minimum VHabsent. This is not borne out by their analysis, since the VHpresent and VHabsent value scales are mainly overlapping, not only in Experiments 1 and 2 but also in Experiments 3 and 4. The authors dismiss this problem by saying that their analyses are a first pass that will require future refinement. Instead, the failure to conform to this basic part of the theory should be a red flag calling for revision of the theory.

      Again, the opposite correlations between target present/absent search times with VH are the crucial empirical validation of our claim that a distance-to-center calculation explains how we perform these property-based tasks. The VH predictions do not fully explain the data. We have explicitly acknowledged this shortcoming, so we are hardly dismissing it as a problem.

      The reason for this single scale is that the authors think of target detection as a boundary decision task, along a single scale, with a decision boundary somewhere in the middle, separating present and absent. This model makes sense for decision dimensions or spaces where there are two categories (right/left motion; cats vs. dogs), separated by an inherent boundary (equal left/right motion; training-defined cat/dog boundary). In these cases, there is less information near the boundary, leading to reduced speed/accuracy and producing a pattern like that shown in Fig. 1G.

      Finding an oddball, deciding if two items are same or different and symmetry tasks are disparate visual tasks that do not fit neatly into standard models of decision making. The key conceptual advance of our study is that we propose a plausible neural representation and decision variable that allow all three property-based visual tasks to be reconciled with standard models of decision making.

      This logic does not hold for target detection tasks. There is no inherent middle point boundary between target present and target absent. Instead, in both types of trial, maximum information is present when target and distractors are most dissimilar, and minimum information is present when target and distractors are most similar. The point of greatest similarity occurs at the limit of any metric for similarity. Correspondingly, there is no middle point dip in information that would produce greater difficulty and higher response times. Instead, task difficulty and response times increase monotonically with similarity between targets and distractors, for both target present and target absent decisions. Thus, in Figs. 2F and 2G, response times appear to be highest for animals, which share the largest numbers of closely similar distractors.

      Your alternative explanation rests on vague factors like “maximum information” which cannot be quantified. By contrast we are proposing a concrete, falsifiable model for three property-based tasks – same/different, oddball present/absent and object symmetry. Any argument based solely on item similarity to explain visual search or symmetry responses cannot explain systematic variations observed for target-absent arrays and for symmetric objects, for the reasons explained earlier.

      DEFINITION OF AREA VH USING fMRI

      (1) The area VH boundaries from different experiments are nearly completely non-overlapping.

      In line with their theory that VH is a single continuum with a decision boundary somewhere in the middle, the authors use fMRI searchlight to find an area whose responses positively correlate with homogeneity, as calculated across all of their target present and target absent arrays. They report VH-correlated activity in regions anterior to LO. However, the VH defined by symmetry Experiments 3 and 4 (VHsymmetry) is substantially anterior to LO, while the VH defined by target detection Experiments 1 and 2 (VHdetection) is almost immediately adjacent to LO. Fig. S13 shows that VHsymmetry and VHdetection are nearly non-overlapping. This is a fundamental problem with the claim of discovering a new area that represents a new quantity that explains response times across multiple visual tasks. In addition, it is hard to understand why VHsymmetry does not show up in a straightforward subtraction between symmetric and asymmetric objects, which should show a clear difference in homogeneity.

      We respectfully disagree. The partial overlap between the VH regions identified in Experiments 2 & 4 can hardly be taken as evidence against the quantity VH itself, because there are several other obvious alternate explanations for this partial overlap, as summarized earlier as well. The VH region does show up in a straightforward subtraction between symmetric and asymmetric objects (Section S7), so we are not sure what the Reviewer is referring to here.

      (2) It is hard to understand how neural responses can be correlated with both VHpresent and VHabsent.

      The main paper results for VHdetection are based on both target-present and target-absent trials, considered together. It is hard to interpret the observed correlations, since the VHpresent and VHabsent metrics are calculated in such different ways and have opposite correlations with target similarity, task difficulty, and response times (see above). It may be that one or the other dominates the observed correlations. It would be clarifying to analyze correlations for target-present and target-absent trials separately, to see if they are both positive and correlated with each other.

      Thanks for raising this point. We have now confirmed that the positive correlation between VH and neural response holds even when we do the analysis separately for target-present and target-absent searches (correlation between neural response in the VH region and visual homogeneity: n = 32, r = 0.66, p < 0.0005 for target-present searches; n = 32, r = 0.56, p < 0.005 for target-absent searches).

      (3) Definition of the boundaries and purpose of a new visual area in the brain requires circumspection, abundant and convergent evidence, and careful controls.

      Even if the VH metric, as defined and calculated by the authors here, is a meaningful quantity, it is a bold claim that a large cortical area just anterior to LO is devoted to calculating this metric as its major task. Vision involves much more than target detection and symmetry detection. Cortex anterior to LO is bound to perform a much wider range of visual functionalities. If the reported correlations can be clarified and supported, it would be more circumspect to treat them as one byproduct of unknown visual processing in cortex anterior to LO, rather than treating them as the defining purpose for a large area of visual cortex.

      We totally agree with you that reporting a new brain region would require careful interpretation and abundant and converging evidence. However, this requires many studies' worth of work, and historically category-selective regions like the FFA have achieved consensus only after they were replicated and confirmed across many studies. We believe our proposal for the computation of a quantity like visual homogeneity is conceptually novel, and our study represents a first step that provides some converging evidence (through replicable results across different experiments) for such a region. We have reworked our manuscript to make this point clearer (Discussion, p. 32).

      Reviewer #3 (Public Review):

      Summary:

      This study proposes visual homogeneity as a novel visual property that enables observers to perform several seemingly disparate visual tasks, such as finding an odd item, deciding if two items are same, or judging if an object is symmetric. In Exp 1, the reaction times on several objects were measured in human subjects. In Exp 2, visual homogeneity of each object was calculated based on the reaction time data. The visual homogeneity scores predicted reaction times. This value was also correlated with the BOLD signals in a specific region anterior to LO. Similar methods were used to analyze reaction time and fMRI data in a symmetry detection task. It is concluded that visual homogeneity is an important feature that enables observers to solve these two tasks.

      Thank you for your accurate and positive assessment.

      Strengths:

      (1) The writing is very clear. The presentation of the study is informative.

      (2) This study includes several behavioral and fMRI experiments. I appreciate the scientific rigor of the authors.

      We are grateful to you for your balanced assessment and constructive comments.

      Weaknesses:

      (1) My main concern with this paper is the way visual homogeneity is computed. On page 10, lines 188-192, it says: "we then asked if there is any point in this multidimensional representation such that distances from this point to the target-present and target-absent response vectors can accurately predict the target-present and target-absent response times with a positive and negative correlation respectively (see Methods)". This is also true for the symmetry detection task. If I understand correctly, the reference point in this perceptual space was found by deliberately satisfying the negative and positive correlations in response times. And then on page 10, lines 200-205, it shows that the positive and negative correlations actually exist. This logic is confusing. The positive and negative correlations emerge only because this method is optimized to do so. It seems more reasonable to identify the reference point of this perceptual space independently, without using the reaction time data. Otherwise, the inference process sounds circular. A simple way is to just use the mean point of all objects in Exp 1, without any optimization towards reaction time data.

      We disagree with you since the same logic applies to any curve-fitting procedure. When we fit data to a straight line, we are finding the slope and intercept that minimizes the error between the data and the straight line, but we would hardly consider the process circular when a good fit is achieved – in fact we take it as a confirmation that the data can be fit linearly. In the same vein, we would not have observed a good fit to the data, if there did not exist any good reference point relative to which the distances of the target-present and target-absent search arrays predicted these response times.

      In Section S2, we show that the visual homogeneity estimates for each object are strongly correlated with the average distance of each object to all other objects (r = 0.84, p<0.0005, Figure S1).

      We have performed several additional analyses to confirm the generality of our results and to reject alternate explanations (see Results, p. 12, Section titled “Confirming the generality of visual homogeneity”). In particular, to confirm that the results we obtained are not due to overfitting, we performed a cross-validation analysis, where we removed all searches involving a particular image and predicted these response times using visual homogeneity. This too revealed a significant model correlation confirming that our results are not due to overfitting.
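
      As a rough illustration of the Section S2 comparison mentioned above, the following Python sketch computes, for a handful of objects, the distance to a reference point and the average distance to all other objects, and then correlates the two. The coordinates are hypothetical, the centroid stands in for the reference point (this is the reviewer's suggested alternative, not the fitted point used in the study), and all function names are illustrative rather than taken from the manuscript.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Mean point of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical 2-D perceptual coordinates for six objects (illustrative only).
objects = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.1), (2.0, 1.5), (-1.2, 0.4), (0.5, -0.9)]

# Assumption: the centroid is used as the reference point.
center = centroid(objects)
dist_to_center = [euclidean(p, center) for p in objects]

# Average distance of each object to every other object.
mean_dist_to_others = [
    sum(euclidean(p, q) for j, q in enumerate(objects) if j != i) / (len(objects) - 1)
    for i, p in enumerate(objects)
]

r = pearson(dist_to_center, mean_dist_to_others)
print(f"correlation between the two measures: r = {r:.2f}")
```

      In this toy configuration the two measures rank the objects in the same order, which is the kind of agreement the Section S2 analysis reports; it says nothing about the actual stimuli or the fitted reference point.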

      (2) Visual homogeneity (at least given the current form) is an unnecessary term. It is similar to distractor heterogeneity/distractor variability/distractor statistics in the literature. However, the authors attempt to claim it as a novel concept. The title is "visual homogeneity computations in the brain enable solving generic visual tasks". The last sentence of the abstract is "a NOVEL IMAGE PROPERTY, visual homogeneity, is encoded in a localized brain region, to solve generic visual tasks". In the significance, it is mentioned that "we show that these tasks can be solved using a simple property WE DEFINE as visual homogeneity". If the authors agree that visual homogeneity is not new, I suggest a complete rewrite of the title, abstract, significance, and introduction.

      We respectfully disagree that visual homogeneity is an unnecessary term. Please see our comments to Reviewer 1 above. Just like saliency and memorability can be measured empirically, we propose that visual homogeneity can be empirically measured as the reciprocal of the target-absent search time in a search task, or as the reciprocal of the “same” response time in a same-different task. Understanding how these three quantities interact will require measuring them empirically for an identical set of images, which is beyond the scope of this study but an interesting possibility for future work.

      (3) Also, "solving generic tasks" is another overstatement. The oddball search tasks, same-different tasks, and symmetric tasks are only a small subset of many visual tasks. Can this "quantitative model" solve motion direction judgment tasks, visual working memory tasks? Perhaps so, but at least this manuscript provides no such evidence. On line 291, it says "we have proposed that visual homogeneity can be used to solve any task that requires discriminating between homogeneous and heterogeneous displays". I think this is a good statement. A title that says "XXXX enable solving discrimination tasks with multi-component displays" is more acceptable. The phrase "generic tasks" is certainly an exaggeration.

      Thank you for your suggestion. We have now replaced the term “generic tasks” with the term “property-based tasks”, which we feel is more appropriate and reflects the fact that oddball search, same-different and symmetry tasks all involve looking for a specific image property.

      (4) If I understand it correctly, one of the key findings of this paper is "the response times for target-present searches were positively correlated with visual homogeneity. By contrast, the response times for target-absent searches were negatively correlated with visual homogeneity" (lines 204-207). I think the authors have already acknowledged that the positive correlation is not surprising at all because it reflects the classic target-distractor similarity effect. But the authors claim that the negative correlation in target-absent searches is the true novel finding.

      (5) I would like to make it clear that this negative correlation is not new either. The seminal paper by Duncan and Humphreys (1989) has clearly stated that "difficulty increases with increased similarity of targets to nontargets and decreased similarity between nontargets" (the sentence in their abstract). Here, "similarity between nontargets" is the same as the visual homogeneity defined here. Similar effects have been shown in Duncan (1989) and Nagy, Neriani, and Young (2005). See also the inconsistent results in Nagy & Thomas (2003) and Vincent, Baddeley, Troscianko & Gilchrist (2009). More recently, Wei Ji Ma has systematically investigated the effects of heterogeneous distractors in visual search. I think the introduction part of Wei Ji Ma's paper (2020) provides a nice summary of this line of research. I am surprised that these references are not mentioned at all in this manuscript (except Duncan and Humphreys, 1989).

      You are right in noting that Duncan and Humphreys (1989) propose that searches are more difficult when nontargets are dissimilar. However, since our searches have identical distractors, the similarity between nontargets is always constant across target-absent searches, and therefore this cannot predict any systematic variation in target-absent search that is observed in our data. By contrast, our results explain both target-absent searches and target-present searches.

      Thank you for pointing us to previous work. These studies show that it is not just the average distractor similarity but the statistics of the distractor similarity that drive visual search. However, these studies do not explain why target-absent searches should vary systematically.

      (6) If the key contribution is the quantitative model, the study should be organized in a different way. Although the findings of positive and negative correlations are not novel, it is still good to propose new models to explain classic phenomena. I would like to mention the three studies by Wei Ji Ma (see below). In these studies, Bayesian observer models were established to account for trial-by-trial behavioral responses. These computational models can also account for the set-size effect, behavior in both localization and detection tasks. I see much more scientific rigor in their studies. Going back to the quantitative model in this paper, I am wondering whether the model can provide any qualitative prediction beyond the positive and negative correlations? Can the model make qualitative predictions that differ from those of Wei Ji's model? If not, can the authors show that the model can quantitatively better account for the data than existing Bayesian models? We should evaluate a model either qualitatively or quantitatively.

      Thank you for pointing us to prior work by Wei Ji Ma. These studies systematically examined visual search for a target among heterogeneous distractors using simple parametric stimuli and a Bayesian modeling framework. By contrast, our experiments involve searching for single oddball targets among multiple identical distractors, so it is not clear to us that the Wei Ji Ma models can be easily used to generate predictions about the searches used in our study.

      We are not sure what you mean by offering quantitative predictions beyond positive and negative correlations. We have tried to explain systematic variation in target-present and target-absent response times using a model of how these decisions are being made. Our model explains a lot of systematic variation in the data for both types of decisions.

      (7) In my opinion, one of the advantages of this study is the fMRI dataset, which is valuable because previous studies did not collect fMRI data. The key contribution may be the novel brain region associated with display heterogeneity. If this is the case, I would suggest using a more parametric way to measure this region. For example, one can use Gabor stimuli and systematically manipulate the variations of multiple Gabor stimuli, the same logic also applies to motion direction. If this study uses static Gabor, random dot motion, object images that span from low-level to high-level visual stimuli, and consistently shows that the stimulus heterogeneity is encoded in one brain region, I would say this finding is valuable. But this sounds like another experiment. In other words, it is insufficient to claim a new brain region given the current form of the manuscript.

      We agree that parametric stimulus manipulations are important for studying early visual areas where stimulus dimensions are known (e.g. orientation, spatial frequency). Using parametric stimulus manipulations for more complex stimuli is fraught with issues because the underlying representation may not be encoding the dimensions being manipulated. This is the reason why we attempted to recover the underlying neural representation using dissimilarities measured using visual search, and then asked whether a decision making process operating on this underlying representation can explain how decisions are made. Therefore we disagree that parametric stimulus manipulations are the only way to obtain insight into such tasks.

      We have proposed a quantitative model that explains how decisions about target present and absent can be made through distance-to-center computations on an underlying object representation. We feel that the behavioural and the brain imaging results strongly point to a novel computation that is being performed in a localized region in the brain. These results represent an important first step in understanding how complex, property-based tasks are performed by the brain. We have revised our manuscript to make this point clearer.

      REFERENCES

      - Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96(3), 433-458. doi: 10.1037/0033-295x.96.3.433

      - Duncan, J. (1989). Boundary conditions on parallel processing in human vision. Perception, 18(4), 457-469. doi: 10.1068/p180457

      - Nagy, A. L., Neriani, K. E., & Young, T. L. (2005). Effects of target and distractor heterogeneity on search for a color target. Vision Research, 45(14), 1885-1899. doi: 10.1016/j.visres.2005.01.007

      - Nagy, A. L., & Thomas, G. (2003). Distractor heterogeneity, attention, and color in visual search. Vision Research, 43(14), 1541-1552. doi: 10.1016/s0042-6989(03)00234-7

      - Vincent, B., Baddeley, R., Troscianko, T., & Gilchrist, I. (2009). Optimal feature integration in visual search. Journal of Vision, 9(5), 15-15. doi: 10.1167/9.5.15

      - Singh, A., Mihali, A., Chou, W. C., & Ma, W. J. (2023). A Computational Approach to Search in Visual Working Memory.

      - Mihali, A., & Ma, W. J. (2020). The psychophysics of visual search with heterogeneous distractors. BioRxiv, 2020-08.

      - Calder-Travis, J., & Ma, W. J. (2020). Explaining the effects of distractor statistics in visual search. Journal of Vision, 20(13), 11-11.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have not made substantive changes to address my major concerns. Instead, they have responded with arguments about why their original manuscript was good as written. I did not find these arguments persuasive. Given that, I've left my public review the same, since it still represents my opinions about the paper. Readers can judge which viewpoints are more persuasive.

      We respectfully disagree: we have tried our best to address your concerns with additional analysis wherever feasible, and by acknowledging any limitations.

      Reviewer #3 (Recommendations For The Authors):

      (1) As I mentioned above, please consider rewriting the title, abstract, introduction, and significance. Please remove the word "visual homogeneity" and instead use distractor heterogeneity/distractor variability/distractor statistics as often used in the literature.

      To clarify, visual homogeneity is NOT the same as distractor homogeneity. Visual homogeneity refers to a distance-to-center computation and represents an image-computable property that can vary systematically even when all distractors are identical. By contrast distractor heterogeneity varies only when distractors are different from each other.
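
      The distinction drawn above can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that a target-absent array of identical items is represented by the coordinates of that item, that distractor heterogeneity is the mean pairwise distance among the items in the array, and that the reference point sits at the origin; the coordinates and function names are hypothetical and not taken from the manuscript.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distractor_heterogeneity(array):
    """Mean pairwise distance among the items of a search array.

    Returns 0.0 when every item in the array is identical.
    """
    pairs = list(combinations(array, 2))
    if not pairs:
        return 0.0
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)


# Hypothetical reference point and item coordinates (illustrative only).
center = (0.0, 0.0)
items = {"A": (0.2, 0.1), "B": (1.5, -0.7), "C": (-2.0, 1.0)}

for name, coords in items.items():
    target_absent_array = [coords] * 8  # eight identical items
    heterogeneity = distractor_heterogeneity(target_absent_array)
    dist_to_center = euclidean(coords, center)
    print(f"{name}: heterogeneity = {heterogeneity:.3f}, "
          f"distance to center = {dist_to_center:.3f}")
```

      Under these assumptions the heterogeneity measure is zero for every target-absent array, while the distance-to-center value differs from item to item, which is the sense in which the two quantities are not interchangeable.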

      (2) Better to remove the phrase "generic tasks".

      Thanks for your suggestions. We now refer to these tasks as property-based tasks. 

      (3) Better to explicitly specify the predictions made by the quantitative model beyond positive and negative correlations.

      The predictions of the quantitative model are to explain systematic variation in the response times. We are not sure what else there is to predict in the response times.

      (4) If the quantitative model is the key contribution, better to highlight the details and algorithmic contribution of the model, and show the advantage of this model either qualitatively or quantitatively.

      Please see our responses above. Our quantitative model explains behavior and brain imaging data on three disparate tasks – the same/different, oddball visual search and symmetry tasks. 

      (5) If the new brain region is the key contribution, better to downplay the quantitative model.

      Please see our responses above. Our quantitative model explains behavior and brain imaging data on three disparate tasks – the same/different, oddball visual search and symmetry tasks.

    1. eLife Assessment

      This important study enhances our understanding of ephaptic interactions by utilizing earthworm recordings to refine a general model and use it to predict ephaptic influences across various synaptic configurations. The integration of experimental evidence, a robust mathematical framework and computer simulations convincingly demonstrates the effects of action potential propagation and collision properties on nearby membranes. The study will interest both computational neuroscientists and physiologists.

    2. Reviewer #1 (Public review):

      The authors explain that an action potential that reaches an axon terminal emits a small electrical field as it "annihilates". This happens even though there is no gap junction, at chemical synapses. The generated electrical field is simulated to show that it can affect a nearby, disconnected target membrane by tens of microvolts for tenths of a microsecond. Longer effects are simulated for target locations a few microns away.

      To simulate action potentials (APs), the paper does not use the standard Hodgkin-Huxley formalism because it fails to explain AP collision. Instead, it uses the Tasaki and Matsumoto (TM) model which is simplified to only model APs with three parameters and as a membrane transition between two states of resting versus excited. The authors expand the strictly binary, discrete TM method to a Relaxing Tasaki Model (RTM) that models the relaxation of the membrane potential after an AP. They find that the membrane leak can be neglected in determining AP propagation and that the capacitive currents dominate the process.

      The strength of the work is that the authors identified an important interaction between neurons that is neglected by the standard models. A weakness of the proposed approach is the assumptions that it makes. For instance, the external medium is modeled as a homogeneous conductive medium, which may be further explored to properly account for biological processes. To the authors' credit, the external medium can vary widely and could be left out of the general model, to be modeled only in specific instances.

      The authors provide convincing evidence by performing experiments to record action potential propagation and collision properties and then developing a theoretical framework to simulate the effect of their annihilation on nearby membranes. They provide both experimental evidence and rigorous mathematical and computer simulation findings to support their claims. The work has the potential of explaining significant electrical interaction between nerve centers that are connected via a large number of parallel fibers.

      Comments on revisions:

      The authors responded to all of my previous concerns and significantly improved the manuscript.

    3. Reviewer #2 (Public review):

      In this study, the authors measured extracellular electrical features of colliding APs travelling in different directions down an isolated earthworm axon. They then used these features to build a model of the potential ephaptic effects of AP annihilation, i.e. the electrical signals produced by colliding/annihilating APs that may influence neighbouring tissue. The model was then applied to some different hypothetical scenarios involving synaptic connections. In a revised version of the manuscript, it was also applied, with success, to published experimental data on the cerebellar basket cell-to-Purkinje cell pinceau connection. The conclusion is that an annihilating AP at a presynaptic terminal can ephaptically influence the voltage of a postsynaptic cell (this is, presumably, the 'electrical coupling between neurons' of the title), and that the nature of this influence depends on the physical configuration of the synapse.

      As an experimental neuroscientist who has never used computational approaches, I am unable to comment on the rigour of the analytical approaches that form the bulk of this paper. The experimental approaches appear very well carried out, and the data showing equal conduction velocity of anti- and orthodromically propagating APs in every preparation is now convincing.

      The conclusions drawn from the synaptic modelling have been considerably strengthened by the new Figure 5. Here, the authors' model - including AP annihilation at a synaptic terminal - is used to predict the amplitude and direction of experimentally observed effects at the cerebellar basket cell-to-Purkinje cell synapse (Blot & Barbour 2014). One particular form of the model (RTM with tau=0.5ms and realistic non-excitability of the terminal) matches the experimental data extremely well. This is a much more convincing demonstration that the authors' model of ephaptic effects can quantitatively explain key features of experimental data pertaining to synaptic function. As such, the implications for the relevance of ephaptic coupling at different synaptic contacts may be widespread and important.

      However, it appears that all of the models in the new Fig. 5 involve annihilating APs, yet only one fits the data closely. A key question, which should be addressed if at all possible, is what happens to the predictive power of the best-fitting model in Fig. 5 if the annihilation, and only the annihilation, is removed? In other words, can the authors show that it is specifically the ephaptic effects of AP annihilation, rather than other ephaptic effects of, say, AP waveform/amplitude/propagation, that explain the synaptic effects measured in Blot & Barbour (2014)? This would appear to be a necessary demonstration to fully support the claims of the title.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      The authors explain that an action potential that reaches an axon terminal emits a small electrical field as it ”annihilates”. This happens even though there is no gap junction, at chemical synapses. The generated electrical field is simulated to show that it can affect a nearby, disconnected target membrane by tens of microvolts for tenths of a microsecond. Longer effects are simulated for target locations a few microns away.

      To simulate action potentials (APs), the paper does not use the standard Hodgkin-Huxley formalism because it fails to explain AP collision. Instead, it uses the Tasaki and Matsumoto (TM) model which is simplified to only model APs with three parameters and as a membrane transition between two states of resting versus excited. The authors expand the strictly binary, discrete TM method to a Relaxing Tasaki Model (RTM) that models the relaxation of the membrane potential after an AP. They find that the membrane leak can be neglected in determining AP propagation and that the capacitive currents dominate the process.

      The strength of the work is that the authors identified an important interaction between neurons that is neglected by the standard models. A weakness of the proposed approach is the assumptions that it makes. For instance, the external medium is modeled as a homogeneous conductive medium, which may be further explored to properly account for biological processes.

      The authors provide convincing evidence by performing experiments to record action potential propagation and collision properties and then developing a theoretical framework to simulate the effect of their annihilation on nearby membranes. They provide both experimental evidence and rigorous mathematical and computer simulation findings to support their claims. The work has the potential of explaining significant electrical interaction between nerve centers that are connected via a large number of parallel fibers.

      We thank the reviewer for the distinct analysis of our work and the assessment that we ’identified an important interaction between neurons that is neglected by standard models’.

      Indeed, we modeled the external (extracellular) medium as a homogeneous conductive medium and, compared to real biological systems, this is a simplification. Our intention is to keep our formal model as general as possible; however, it can be extended to account for specific properties. Accessory structures at axon terminals (such as the pinceau at Purkinje cells) most likely evolved to shape ephaptic coupling. In addition, the extracellular medium is neither homogeneous nor isotropic, and to fully mimic a particular neural connection this has to be implemented in a model as well. We agree and look forward to seeing how specific modification of the external medium in biological systems will affect ephaptic coupling. We hope to facilitate progress on this question by providing our source code for further exploration. Using the tools that have been developed by the BRIAN community, one can generate or import arbitrarily complex cell morphologies (e.g. NeuroML files). Our source code adds the TM and RTM models, which allow exploring the direct impact of extracellular properties on target neurons.

      Reviewer 2 (Public Review):

      In this study, the authors measured extracellular electrical features of colliding APs travelling in different directions down an isolated earthworm axon. They then used these features to build a model of the potential ephaptic effects of AP annihilation, i.e. the electrical signals produced by colliding/annihilating APs that may influence neighbouring tissue. The model was then applied to some different hypothetical scenarios involving synaptic connections. The conclusion was that an annihilating AP at a presynaptic terminal can ephaptically influence the voltage of a postsynaptic cell (this is, presumably, the ’electrical coupling between neurons’ of the title), and that the nature of this influence depends on the physical configuration of the synapse.

      As an experimental neuroscientist who has never used computational approaches, I am unable to comment on the rigour of the analytical approaches that form the bulk of this paper. The experimental approaches appear very well carried out, and here I just have one query - an important assumption made is that the conduction velocity of anti- and orthodromically propagating APs is identical in every preparation, but this is never empirically/statistically demonstrated.

      My major concern is with the conclusions drawn from the synaptic modelling, which, disappointingly, is never benchmarked against any synaptic data. The authors state in their Introduction that a ’quantitative physical description’ of ephaptic coupling is ’missing’, however, they do not provide such a description in this manuscript. Instead, modelled predictions are presented of possible ephaptic interactions at different types of synapses, and these are then partially and qualitatively compared to previous published results in the Discussion. To support the authors’ assertion that AP annihilation induces electrical coupling between neurons, I think they need to show that their model of ephaptic effects can quantitatively explain key features of experimental data pertaining to synaptic function. Without this, the paper contains some useful high-precision quantitative measurements of axonal AP collisions, some (I assume) high-quality modelling of these collisions, and some interesting theoretical predictions pertaining to synaptic interactions, but it does not support the highly significant implications suggested for synaptic function.

      We thank the reviewer for highlighting the potential and the limitation of our model. We demonstrated with empirical data that the measured conduction velocities of anti- and orthodromically propagating APs are indeed very similar, and values are provided in Appendix 3 – table 1.

      In order to address how our model ’of ephaptic effects can quantitatively explain key features of experimental data’, we used the modulation of AP rates in Purkinje cells measured by Blot and Barbour (2014); our results are now included in the manuscript. In our model, we implemented the ephaptic coupling of the Basket cell (with an annihilating AP) and predicted the modulation of AP rate in the Purkinje cell. Our model predictions are compared to the measured modulation of AP rates in Purkinje cells, and the comparison is added as Fig. 5 to the main manuscript (lines 264 to 284). With this example, we show that ephaptic coupling as described with our RTM model can quantitatively describe key features of experimental data. Both the rapid inhibition and the rebound activity are described by our model when non-excitable parts at the pinceau of the Basket cell are implemented. Future experimental research can use the provided formalism to investigate in more detail the ephaptic coupling in systems like the Mauthner cell and the Purkinje cell by exploring how accessory structures and concomitant physical parameters, e.g. the extracellular properties, impact ephaptic coupling.

      Reviewer 3 (Public Review):

      This manuscript aims to exploit experimental measurements of the extracellular voltages produced by colliding action potentials to adjust a simplified model of action potential propagation that is then used to predict the extracellular fields at axon terminals. The overall rationale is that when solving the cable equation (which forms the substrate for models of action potential propagation in axons), the solution for a cable with a closed end can be obtained by a technique of superposition: a spatially reflected solution is added to that for an infinite cable and this ensures by symmetry that no axial current flows at the closed boundary. By this method, the authors calculate the expected extracellular fields for axon terminals in different situations. These fields are of potential interest because, according to the authors, their magnitude can be larger than that of a propagating action potential and may be involved in ephaptic signalling. The authors perform direct measurements of colliding action potentials, in the earthworm giant axon, to parameterise and test their model.
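
      The symmetry argument behind the superposition technique described above can be illustrated with a short Python sketch. It uses a travelling Gaussian pulse as a stand-in voltage profile; this waveform is an assumption chosen for simplicity and is not a solution of the TM, RTM or Hodgkin-Huxley models. The function names and parameter values are hypothetical. The sketch only shows that adding the spatially reflected profile makes the spatial voltage gradient, and hence the axial current, vanish at the reflection point.

```python
import math


def pulse(x, t, v=1.0, width=0.5, x0=-5.0):
    """Stand-in voltage profile: a Gaussian travelling in the +x direction.

    v is the propagation speed, width the spatial spread, and x0 the
    position of the peak at t = 0 (all in arbitrary units).
    """
    return math.exp(-((x - x0 - v * t) ** 2) / (2.0 * width ** 2))


def mirrored_sum(x, t):
    """Profile on an infinite cable plus its reflection about x = 0."""
    return pulse(x, t) + pulse(-x, t)


def axial_gradient(profile, x, t, h=1e-5):
    """Central-difference estimate of dV/dx.

    The axial current is proportional to -dV/dx, so a zero gradient
    means no axial current at that position.
    """
    return (profile(x + h, t) - profile(x - h, t)) / (2.0 * h)


for t in (3.0, 4.5, 5.0, 5.5):
    single = axial_gradient(pulse, 0.0, t)
    summed = axial_gradient(mirrored_sum, 0.0, t)
    print(f"t = {t}: dV/dx at x = 0 is {single:+.4f} for one pulse, "
          f"{summed:+.4f} for the mirrored sum")
```

      Because the summed profile is even in x, its gradient at x = 0 is zero at every time point, whereas the single pulse carries a non-zero gradient through that point as it passes.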

      Although simplified models can be useful and the trick of exploiting the collision condition is interesting, I believe there are several significant problems with the rationale, presentation, and application, such that the validity and potential utility of the approach is not established.

      Simplified model vs. Hodgkin and Huxley

      The authors employ a simplified model that incorporates a two-state membrane (in essence resting and excited states) and adds a recovery mechanism. This generates a propagating wave of excitation and key observables such as propagation speed and action potential width (in space) can be adjusted using a small number of parameters. However, even if a Hodgkin-Huxley model does contain a much larger number of parameters that may be less easy to adjust directly, the basic formalism is known to be accurate and typical modifications of the kinetic parameters are very well understood, even if no direct characterisations already exist or cannot be obtained. I am therefore unconvinced by the utility of abandoning the Hodgkin-Huxley version.

      In several places in the manuscript, the simplified model fits the data well whereas the Hodgkin-Huxley model deviates strongly (e.g. Fig. 3CD). This is unsatisfying because it seems unlikely that the phenomenon could not be modelled accurately using the HH formulation. If the authors really wish to assert that it is ”not suitable to predict the effects caused by AP [collision]” (p9) they need to provide a good deal more analysis to establish the mechanism of failure.

      We are not as convinced as the reviewer that, at the current state of parameter estimation, the HH model is suited for predicting ephaptic coupling after ’adjusting’ parameters. There are strong arguments against such an approach. A major function of a model is to make testable predictions rather than to just mimic a biological phenomenon. The predictive power of a model heavily depends on how reasonably model parameters can be estimated or measured. As the reviewer correctly points out in the specific comments (”... the parameters adjusted to fit the model are the membrane capacitance and intracellular resistance. These have a physical reality and could easily be measured or estimated quite accurately...”), our model contains only parameters that can be assessed experimentally, thus it has better predictive power than the HH model with a multitude of parameters for which ”no direct characterisations already exist or cannot be obtained” (citing reviewer from above).

      Already the founders of the HH model were well aware of the limitations, as stated by Hodgkin and Huxley in 1952 (J Physiol 117:500–544):

      An equally satisfactory description of the voltage clamp data could no doubt have been achieved with equations of very different form ... The success of the equations is no evidence in favour of the mechanism of permeability change that we tentatively had in mind when formulating them.

      A catchy but sloppy description for the problem of overfitting with too many parameters is given by the quote of John von Neumann: With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.

      We do not rule out the possibility that the HH model eventually can be used to predict ephaptic coupling. However, at the moment, parameter estimation for the HH model prevents its usability for predicting ephaptic coupling.

      (In)applicability of the superposition principle

      The reflecting boundary at the terminal is implemented using the symmetry of the collision of action potentials. However, at a closed cable there is no reflecting boundary in the extracellular space and this implied assumption is particularly inappropriate where the extracellular field is one objective of the modelling, as here. I believe this assumption is not problematic for the calculation of the intracellular voltage, because extracellular voltage gradients can usually be neglected1, but the authors need to explain how the issue was dealt with for the calculation of the extracellular fields of terminals. I assume they were calculated from the membrane currents of one-half of the collision solution, but this does not seem to be explained. It might be worth showing a spatial profile of the calculated field.

      We disagree with the reviewer’s statement ’...at a closed cable there is no reflecting boundary in the extracellular space and this implied assumption is particularly inappropriate...’. We do not imply this assumption in our model! We do not assume any symmetry or boundary condition in the extracellular space. Instead, the extracellular field is calculated for an infinite homogeneous volume conductor (Eq. 6).

      We conduct separate calculations for (1) source membrane current, (2) resulting extracellular field, and (3) impact upon a target neuron. The boundary condition used for our calculations only refers to the axial current being zero at the axon terminal. Consequently all the internal current that enters the last compartment must leave the last compartment as membrane current and contributes to the extracellular current and field.
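      Step (2) above can be illustrated with a minimal sketch. It assumes that each compartment's membrane current acts as a point source in an infinite homogeneous volume conductor, which gives the standard expression φ(r) = ρe/(4π) · Σ Ik/|r − rk|. Whether the manuscript's Eq. 6 has exactly this form is an assumption, and the function name, argument names and the default of 100 Ohm m are illustrative only.

```python
import math


def extracellular_potential(point, sources, rho_e=100.0):
    """Potential (V) at `point` from point current sources in an infinite
    homogeneous volume conductor: phi = rho_e / (4*pi) * sum(I_k / r_k).

    point   -- (x, y, z) in metres
    sources -- iterable of ((x, y, z), current in amperes); hypothetical
               stand-ins for the membrane currents of the compartments
    rho_e   -- extracellular resistivity in Ohm m (assumed value)
    """
    total = 0.0
    for position, current in sources:
        distance = math.dist(point, position)
        if distance == 0.0:
            raise ValueError("field point coincides with a source")
        total += current / distance
    return rho_e / (4.0 * math.pi) * total
```

      No symmetry or boundary is imposed on the extracellular space in this sketch: the field follows from the membrane currents alone, so an asymmetric current distribution at a terminal yields an asymmetric field.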

      The extracellular field around the axon terminal is not symmetric, as can be seen by its impact upon a target in Figure 4—figure supplement 1, which is also not symmetric. The symmetry of the extracellular field when APs are colliding (cf. symmetry in Fig 1C) is merely the result of the symmetric stimulation and counterpropagation of two APs. We now describe the boundary condition for colliding and terminating APs more specifically, starting in the introduction: ’A suitable boundary condition (intracellular, axial current equals zero) can be generated experimentally by a collision of two counter-propagating APs ... Within any cable model, the very same boundary condition also exists within the axon at the synaptic terminal due to the broken translation symmetry for the current loops ...’ Later, in the Results section (Discharge of colliding APs), we continue with ’AP propagation is blocked when the axial current is shut down at a boundary condition, e.g. by reaching the axon terminal or by AP collision....’ and implement this condition in our calculations for the axon terminals.

      Missing demonstrations

      Central analytical results are stated rather brusquely, notably equations (3) and (4) and the relation between them. These merit an expanded explanation at the least. A better explanation of the need for the collision measurements in parameterising the models should also be provided.

      We thank the reviewer for pointing out the insufficient explanation of equations 3 and 4. We rephrased the paragraph ’Discharge of colliding APs’ in order to clarify the origin and the function of the two equations (eq. 3: how much charge is expelled and eq. 4: the resulting extracellular potential that is used for model validation).

      Later, in the Discussion, we rephrased the paragraph where we describe the annihilation process and explain further that one term of eq. 4 is sometimes referred to as the ’activating function’ when using microelectrodes for stimulation.

      With respect to the ’explanation of the need for the collision measurement’, we think that the explanations we give at several locations in the manuscript are sufficient as is. We explain and elaborate in the introduction: ’We explore the behaviour of APs at boundaries ... In this study, we first focus on collisions of APs. Our experimental observation of colliding APs provides unique access to the spatial profile of the extracellular potential around APs that are blocked by collisions and thus annihilate..... Recording propagating APs allows to determine both the propagation velocity and the amplitude of the extracellular electric potentials. The collision experiment provides additional information ...’ In the results we recall: ’The width of the collision is a measure of the characteristic length λ⋆ of the AP and is uniquely revealed by a collision sweep experiment.’

      Adjusted parameters

      I am uncomfortable that the parameters adjusted to fit the model are the membrane capacitance and intracellular resistance. These have a physical reality and could easily be measured or estimated quite accurately. With a variation of more than 20-fold reported between the different models in Appendix 2 we can be sure that some of the models are based upon quite unrealistic physical assumptions, which in turn undermines confidence in their generality.

      The fact that the parameters of our model have physical realities is clearly in favor of our models. We rephrased the legend of the table, now explaining the procedure for the model fitting and the rationale behind it. Although the values of g⋆ can differ by a factor of 15 and the resulting amplitude is very different, the relationship ri cm = vpλ⋆ is very similar, independently of the model used, and this confirms our analytical framework.

      p8 - the values of both the extracellular (100 Ohm m) and intracellular resistivity (1 Ohm m) appear to be in error, especially the former.

      We have the following justification for the resistivity values we used. For the intracellular resistivity, literature values range from 0.4 to 1.5 Ohm m, and therefore we selected 1 Ohm m. See: Carpenter et al (1975) doi: 10.1085/jgp.66.2.139; Cole et al (1975) doi: 10.1085/jgp.66.2.133; Bekkers (2014) doi: 10.1007/978-1-4614-7320-6_35-2.

      Estimating extracellular resistivity is less straightforward, since it depends crucially on the structure around the synapse, which consists of conducting saline and insulating fatty tissue. Values ranging from 3 to 600 Ohm m are reported (Linden et al (2011) doi: 10.1016/j.neuron.2011.11.006; Bakiri et al (2011) doi: 10.1113/jphysiol.2010.201376). Weiss et al (2008; doi: 10.1073/pnas.0806145105) report extracellular resistivities in the Mauthner Cap between 50 and 600 Ohm m in SI. Since the pinceau is structurally similar to the Mauthner cell’s axon cap, we argue that a value of 100 Ohm m is a reasonable choice for our calculations. Additionally, we derived a value from Blot and Barbour (doi: 10.1038/nn.3624), rephrased the paragraph in the main text and added our calculation to the supplementary material (Appendix 1).

      (In)applicability to axon terminals

      The rationale of the application of the collision formalism to axon terminals is somewhat undermined by the fact that they tend not to be excitable. There is experimental evidence for this in the Calyx of Held and the cerebellar pinceau.

      The solution found via collision is therefore not directly applicable in these cases.

      We do not agree with the reviewer’s statement that ’the solution found via collision is (therefore) not directly applicable...’. Our model is well suited for application to axon terminals that are not excitable, e.g. the pinceau of the basket cell, as the reviewer points out. We have included a calculation for this case and present the results in the new Fig. 5 (main text lines 264 to 284).

      Comparison with experimental data

      More effort should be made to compare the modelling with the extracellular terminal fields that have been reported in the literature.

      As outlined above (see: Response to reviewer 2), we now directly compare the predictions of our models with the measured modulation of AP rates in Purkinje fibers (Blot and Barbour 2014), and our results are included in the manuscript (Fig. 5 and main text lines 264 to 284). See also our response to reviewer 2, in which we address how our model ’of ephaptic effects can quantitatively explain key features of experimental data’.

      Choice of term ”annihilation”

      The term annihilation does not seem wholly appropriate to me. The dictionary definitions are something along the lines of complete destruction by an external force or mutual destruction, for example of an electron and a positron. I don’t think either applies exactly here. I suggest retaining the notion of collision which is well understood in this context.

      Experimentally, we generated a collision of APs and showed that colliding APs disappear and do not pass each other. For this process the term annihilation is used in our and in other studies (see e.g. Berg et al (2017) doi: 10.1103/PhysRevX.7.028001; Johnson et al (2018) doi: 10.3389/fphys.2018.00779; Follmann (2015) doi: 10.1103/PhysRevE.92.032707; Shrivastava et al (2018) doi: 10.1098/rsif.2017.0803). The physical processes involved in the termination of an AP at a closed end are essentially identical to those of two colliding APs. We think this justifies using the term annihilation for those processes.

      Recommendations for the authors:

      We believe the work is of high quality and should motivate future experimental work. We are including the review comments here for your information. The main piece of feedback we are offering is that the broad claims need to be adjusted to the strength of evidence provided: as is, the manuscript provides compelling predictions but the claim that these predictions are in full agreement with data remains to be substantiated. A technical concern raised by the reviewers is that the reflecting boundary condition may need further justification. The authors may wish to respond to this issue in a rebuttal and/or adjust the manuscript as necessary.

      We substantiated our claim that our predictions are in full agreement with experimental data. We added to the manuscript a section in which we compare our models’ predictions to published experimental data. To this end, we extracted data from the publication of Blot and Barbour (2014), elaborated on the parameters used and ran our model accordingly. We added to the Results/Model of ephaptic coupling a paragraph on ’The modulation of activity in Purkinje cells...’ (line 264), where we describe our results, and we also included another figure in the main text for illustration (Fig. 5).

      We clarified the term ’boundary condition’ by rephrasing parts of the introduction, and we explain the rationale behind it in ’Discharge of colliding APs’ (...AP propagation is blocked when axial current is shut down...) and in ’Model of ephaptic coupling’ (Within any cable model, the same boundary...). See also our response to the general comments of reviewer 3 above.

      Reviewer 1 (Recommendations For The Authors):

      Major:

      Accessing data and code requires signing in, which should not be required. The link provided also seems to be not accessible yet - could be pending review.

      The repository is now publicly available. We provided an access code in the letter to the editor; this code is no longer required.

      Line 74: how about morphology? Authors should clarify and emphasize in the introduction that the TM model is a spatially continuous model with partial differential equations as opposed to discrete morphological models to simulate HH equations.

      The reviewer is correct that the TM model is continuous. However, so is the HH model. The difference between HH and TM is only that the TM model can be solved analytically, which yields a spatially homogeneous analytical solution. It should be noted that this analytical solution can only be valid for a homogeneous (therefore infinite) nerve. Every numerical computation, be it HH or TM, requires a finite number of discrete compartments. In our calculations, we used identical compartment models for the HH, TM and RTM models. In each compartment, the differential equations are solved numerically. Since there is no fundamental difference between these models, we refrain from changing the text.
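      The point about discretisation can be made concrete with a minimal sketch. It is not the code used in the manuscript: it assumes a purely passive cable (the nonlinear ionic currents that distinguish HH, TM and RTM are omitted), an explicit Euler time step, and sealed ends at which the axial current is zero. The function name and all parameter names are illustrative.

```python
def step_passive_cable(v, dt, dx, r_i, c_m, g_m=0.0):
    """Advance the membrane potentials `v` (one value per compartment) by
    one explicit Euler step of the passive cable equation
        c_m * dV/dt = (1 / r_i) * d2V/dx2 - g_m * V
    with sealed ends (axial current = 0 at both boundaries).

    r_i -- axial resistance per unit length (Ohm/m)
    c_m -- membrane capacitance per unit length (F/m)
    g_m -- membrane leak conductance per unit length (S/m)

    The scheme is stable only if dt / (r_i * c_m * dx**2) <= 0.5.
    """
    n = len(v)
    new_v = []
    for k in range(n):
        # Potential differences driving axial current into compartment k;
        # a missing neighbour (sealed end) contributes no current.
        left = (v[k - 1] - v[k]) if k > 0 else 0.0
        right = (v[k + 1] - v[k]) if k < n - 1 else 0.0
        axial = (left + right) / (r_i * dx * dx)
        new_v.append(v[k] + dt * (axial - g_m * v[k]) / c_m)
    return new_v
```

      With the leak set to zero, the sealed ends make the summed potential a conserved quantity: charge that reaches the last compartment cannot leave axially, which is the same zero-axial-current condition invoked for terminals and collisions.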

      Minor:

      Major typo: ventral nerve cord, not ”chord”. Repeated in several places.

      Thank you for indicating this typo to us.

      Line 25: inhibition, excitation, and modulation?

      We changed the line to: ... leads to modulation, e.g. excitation or inhibition

      Line 70: better term for ”length” of AP would be ”duration”. Also, the sentence could be simplified to use either ”its” or ”of the AP”

      Space and time are not interchangeable. Thus, the term length cannot be replaced by duration. We simplified the structure of the sentence as suggested.

      Fig 1A/B: it’s strange that panel B precedes panel A.

      Exchanged

      Fig 1C: don’t see the ”horizontal line”; also regarding ”The recording was at a medial position”, the caption is not clear until one reads the main text.

      We changed the legend to: ... The collision is captured in the recording line at y-position 0 mm, while orthodromic propagation is at the top and antidromic propagation is at the bottom. (D) The peak amplitude as a function of the distance to the collision. Examples of four sweeps at three positions along the nerve cord....

      Line 127: the per distance measures could be named as ”specific” conductivity, etc.

      We explicitly provide the units, thereby defining the quantities unambiguously.

      Line 176: typo ”ad-hoc”.

      Thank you.

      Fig 4B: should clarify that the circle in the schematic is not the soma but a synaptic bouton.

      We rephrased to ’...(B,C) when the AP is annihilating at a bouton of a neuron terminal (upper neuron in end-to-shaft geometry, similar to the Basket cell–Purkinje cell synapse)...’, and we added a label to Fig 4B.

      Reviewer 2 (Recommendations For The Authors):

      Can the authors’ model be quantitatively compared with experimental data of ephaptic interactions at synapses (e.g. the Blot & Barbour study described in the Discussion)?

      We did so as outlined in our response to the reviewer above.

      Can statistical evidence be provided that the velocities of anti- and orthodromic APs are indeed identical in the earthworm nerve recordings?

      These data and statistics are available in Appendix 2 (now Appendix 3), table 1.

      Why not reorder ABCD in Fig1 so the subpanels run from left to right?

      We adjusted the labels accordingly.

    1. eLife Assessment

      This paper represents a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology and immunocytochemistry of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi", which enhance other canonical tastes, increasing their hedonic attributes; the mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model. This work is valuable but incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      This paper contains what could be described as a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology, and immunocytochemistry of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi", which are stimuli that enhance other canonical tastes, essentially increasing the hedonic attributes of these other stimuli; the mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model.

      Strengths:

      The data show the effects of ornithine on taste: in two-bottle and briefer intake tests, adding ornithine results in a higher intake of most, but not all, stimuli tested. Bilateral nerve cuts or the addition of GPRC6A antagonists decrease this effect. Small effects of ornithine are shown in whole-nerve recordings.

      Weaknesses:

      The conclusion seems to be that the authors have found evidence for ornithine acting as a taste modifier through the GPRC6A receptor expressed on the anterior tongue. It is hard to separate their conclusions from the possibility that any effects are additive rather than modulatory. Animals did prefer ornithine to water when presented by itself. Additionally, the authors refer to evidence that ornithine is activating the T1R1-T1R3 amino acid taste receptor, possibly at higher concentrations than they use for most of the study, although this seems speculative. It is striking that the largest effects on taste are found with the other amino acid (umami) stimuli, leading to the possibility that these are largely synergistic effects taking place at the tas1r receptor heterodimer.

    3. Reviewer #2 (Public review):

      Summary:

      The authors used rats to determine the receptor for a food-related perception (kokumi) that has been characterized in humans. They employ a combination of behavioral, electrophysiological, and immunohistochemical results to support their conclusion that ornithine-mediated kokumi effects are mediated by the GPRC6A receptor. They complemented the rat data with some human psychophysical data. I find the results intriguing, but believe that the authors overinterpret their data.

      Strengths:

      The authors examined a new and exciting taste enhancer (ornithine). They used a variety of experimental approaches in rats to document the impact of ornithine on taste preference and peripheral taste nerve recordings. Further, they provided evidence pointing to a potential receptor for ornithine.

      Weaknesses:

      The authors have not established that the rat is an appropriate model system for studying kokumi. Their measurements do not provide insight into any of the established effects of kokumi on human flavor perception. The small study on humans is difficult to compare to the rat study because the authors made completely different types of measurements. Thus, I think that the authors need to substantially scale back the scope of their interpretations. These weaknesses diminish the likely impact of the work on the field of flavor perception.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to investigate whether GPRC6A mediates kokumi taste initiated by the amino acid L-ornithine. They used Wistar rats, a standard laboratory strain, as the primary model and also performed an informative taste test in humans, in which miso soup was supplemented with various concentrations of L-ornithine. The findings are valuable and overall the evidence is solid. L-Ornithine should be considered to be a useful test substance in future studies of kokumi taste and the class C G protein-coupled receptor known as GPRC6A (C6A) along with its homolog, the calcium-sensing receptor (CaSR) should be considered candidate mediators of kokumi taste.

      Strengths:

      The overall experimental design is solid based on two-bottle preference tests in rats. After determining the optimal concentration for L-Ornithine (1 mM) in the presence of MSG, it was added to various tastants, including inosine 5'-monophosphate; monosodium glutamate (MSG); mono-potassium glutamate (MPG); intralipos (a soybean oil emulsion); sucrose; sodium chloride (NaCl); citric acid and quinine hydrochloride. Robust effects of ornithine were observed in the cases of IMP, MSG, MPG, and sucrose, and little or no effects were observed in the cases of sodium chloride, citric acid, and quinine HCl. The researchers then focused on the preference for Ornithine-containing MSG solutions. The inclusion of the C6A inhibitors Calindol (0.3 mM but not 0.06 mM) or the gallate derivative EGCG (0.1 mM but not 0.03 mM) eliminated the preference for solutions that contained Ornithine in addition to MSG. The researchers next performed transections of the chorda tympani nerves (with sham operation controls) in anesthetized rats to identify the role of the chorda tympani branches of the facial nerves (cranial nerve VII) in the preference for Ornithine-containing MSG solutions. This finding implicates the anterior half-two thirds of the tongue in ornithine-induced kokumi taste. They then used electrical recordings from intact chorda tympani nerves in anesthetized rats to demonstrate that ornithine enhanced MSG-induced responses following the application of tastants to the anterior surface of the tongue. They went on to show that this enhanced response was insensitive to amiloride, selected to inhibit 'salt tastant' responses mediated by the epithelial Na+ channel, but eliminated by Calindol. Finally, they performed immunohistochemistry on sections of rat tongue demonstrating C6A positive spindle-shaped cells in fungiform papillae that partially overlapped in their distribution with the IP3 type-3 receptor, used as a marker of Type-II cells, but not with (i) gustducin, the G protein partner of Tas1 receptors (T1Rs), used as a marker of a subset of type-II cells; or (ii) 5-HT (serotonin) and Synaptosome-associated protein 25 kDa (SNAP-25) used as markers of Type-III cells.

      Weaknesses:

      The researchers undertook what turned out to be largely confirmatory studies in rats with respect to their previously published work on Ornithine and C6A in mice (Mizuta et al Nutrients 2021).

      The authors point out that animal models pose some difficulties of interpretation in studies of taste and raise the possibility in the Discussion that umami substances may enhance the taste response to ornithine (Line 271, Page 9). They miss an opportunity to outline the experimental results from the study that favor their preferred interpretation that ornithine is a taste enhancer rather than a tastant.

      At least two other receptors in addition to C6A might mediate taste responses to ornithine: (i) the CaSR, which binds and responds to multiple L-amino acids (Conigrave et al, PNAS 2000), and which has been previously reported to mediate kokumi taste (Ohsu et al., JBC 2010) as well as responses to Ornithine (Shin et al., Cell Signaling 2020); and (ii) T1R1/T1R3 heterodimers which also respond to L-amino acids and exhibit enhanced responses to IMP (Nelson et al., Nature 2001). While the experimental results as a whole favor the authors' interpretation that C6A mediates the Ornithine responses, they do not make clear either the nature of the 'receptor identification problem' in the Introduction or the way in which they approached that problem in the Results and Discussion sections. It would be helpful to show that a specific inhibitor of the CaSR failed to block the ornithine response. In addition, while they showed that C6A-positive cells were clearly distinct from gustducin-positive, and thus T1R-positive cells, they missed an opportunity to clearly differentiate C6A-expressing taste cells and CaSR-expressing taste cells in the rat tongue sections.

      It would have been helpful to include a positive control kokumi substance in the two-bottle preference experiment (e.g., one of the known gamma-glutamyl peptides such as gamma-glu-Val-Gly or glutathione), to compare the relative potencies of the control kokumi compound and Ornithine, and to compare the sensitivities of the two responses to C6A and CaSR inhibitors.

      The results demonstrate that enhancement of the chorda tympani nerve response to MSG occurs at substantially greater Ornithine concentrations (10 and 30 mM) than were required to observe differences in the two bottle preference experiments (1.0 mM; Figure 2). The discrepancy requires careful discussion and if necessary further experiments using the two-bottle preference format.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This paper contains what could be described as a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology, and immunocytochemistry of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi", which are stimuli that enhance other canonical tastes, essentially increasing the hedonic attributes of these other stimuli; the mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model.

      Strengths:

      The data show the effects of ornithine on taste: in two-bottle and briefer intake tests, adding ornithine results in a higher intake of most, but not all, stimuli tested. Bilateral nerve cuts or the addition of GPRC6A antagonists decrease this effect. Small effects of ornithine are shown in whole-nerve recordings.

      Weaknesses:

      The conclusion seems to be that the authors have found evidence for ornithine acting as a taste modifier through the GPRC6A receptor expressed on the anterior tongue. It is hard to separate their conclusions from the possibility that any effects are additive rather than modulatory. Animals did prefer ornithine to water when presented by itself. Additionally, the authors refer to evidence that ornithine is activating the T1R1-T1R3 amino acid taste receptor, possibly at higher concentrations than they use for most of the study, although this seems speculative. It is striking that the largest effects on taste are found with the other amino acid (umami) stimuli, leading to the possibility that these are largely synergistic effects taking place at the tas1r receptor heterodimer.

      We would like to thank Reviewer #1 for the valuable comments. Our basis for considering ornithine as a taste modifier stems from our observation that a low concentration of ornithine (1 mM), which does not elicit a preference on its own, enhances the preference for umami substances, sucrose, and soybean oil through the activation of the GPRC6A receptor. Notably, this receptor is not typically considered a taste receptor. The reviewer suggested that the enhancement of umami taste might be due to potentiation occurring at the TAS1R receptor heterodimer. However, we propose that a different mechanism may be at play, as an antagonist of GPRC6A almost completely abolished this enhancement. In the revised manuscript, we will endeavor to provide additional information on the role of ornithine as a taste modifier acting through the GPRC6A receptor.

      Reviewer #2 (Public review):

      Summary:

      The authors used rats to determine the receptor for a food-related perception (kokumi) that has been characterized in humans. They employ a combination of behavioral, electrophysiological, and immunohistochemical results to support their conclusion that ornithine-mediated kokumi effects are mediated by the GPRC6A receptor. They complemented the rat data with some human psychophysical data. I find the results intriguing, but believe that the authors overinterpret their data.

      Strengths:

      The authors examined a new and exciting taste enhancer (ornithine). They used a variety of experimental approaches in rats to document the impact of ornithine on taste preference and peripheral taste nerve recordings. Further, they provided evidence pointing to a potential receptor for ornithine.

      Weaknesses:

      The authors have not established that the rat is an appropriate model system for studying kokumi. Their measurements do not provide insight into any of the established effects of kokumi on human flavor perception. The small study on humans is difficult to compare to the rat study because the authors made completely different types of measurements. Thus, I think that the authors need to substantially scale back the scope of their interpretations. These weaknesses diminish the likely impact of the work on the field of flavor perception.

      We would like to thank Reviewer #2 for the valuable comments and suggestions. Regarding the question of whether the rat is an appropriate model system for studying kokumi, we have chosen this species for several reasons: it is readily available as a conventional experimental model for gustatory research; the calcium-sensing receptor (CaSR), known as the kokumi receptor, is expressed in taste bud cells; and prior research has demonstrated the use of rats in kokumi studies involving gamma Glu-Val-Gly (Yamamoto and Mizuta, Chem. Senses, 2022). We acknowledge that fundamentally different types of measurements were conducted in the human psychophysical study and the rat study. Kokumi can indeed be assessed and expressed in humans; however, we do not currently have the means to confirm that animals experience kokumi in the same way that humans do. Therefore, human studies are necessary to evaluate kokumi, a conceptual term denoting enhanced flavor, while animal studies are needed to explore the potential underlying mechanisms of kokumi. We believe that a combination of both human and animal studies is essential, as is the case with research on sugars. While sugars are known to elicit sweetness, it is unclear whether animals perceive sweetness identically to humans, even though they exhibit a strong preference for sugars. In the revised manuscript, we will incorporate additional information to address the comments raised by the reviewer. We will also carefully review and revise our previous statements to ensure accuracy and clarity.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to investigate whether GPRC6A mediates kokumi taste initiated by the amino acid L-ornithine. They used Wistar rats, a standard laboratory strain, as the primary model and also performed an informative taste test in humans, in which miso soup was supplemented with various concentrations of L-ornithine. The findings are valuable and overall the evidence is solid. L-Ornithine should be considered to be a useful test substance in future studies of kokumi taste and the class C G protein-coupled receptor known as GPRC6A (C6A) along with its homolog, the calcium-sensing receptor (CaSR) should be considered candidate mediators of kokumi taste.

      Strengths:

      The overall experimental design is solid based on two-bottle preference tests in rats. After determining the optimal concentration for L-Ornithine (1 mM) in the presence of MSG, it was added to various tastants, including inosine 5'-monophosphate; monosodium glutamate (MSG); mono-potassium glutamate (MPG); intralipos (a soybean oil emulsion); sucrose; sodium chloride (NaCl); citric acid and quinine hydrochloride. Robust effects of ornithine were observed in the cases of IMP, MSG, MPG, and sucrose, and little or no effects were observed in the cases of sodium chloride, citric acid, and quinine HCl. The researchers then focused on the preference for Ornithine-containing MSG solutions. The inclusion of the C6A inhibitors Calindol (0.3 mM but not 0.06 mM) or the gallate derivative EGCG (0.1 mM but not 0.03 mM) eliminated the preference for solutions that contained Ornithine in addition to MSG. The researchers next performed transections of the chorda tympani nerves (with sham operation controls) in anesthetized rats to identify the role of the chorda tympani branches of the facial nerves (cranial nerve VII) in the preference for Ornithine-containing MSG solutions. This finding implicates the anterior half-two thirds of the tongue in ornithine-induced kokumi taste. They then used electrical recordings from intact chorda tympani nerves in anesthetized rats to demonstrate that ornithine enhanced MSG-induced responses following the application of tastants to the anterior surface of the tongue. They went on to show that this enhanced response was insensitive to amiloride, selected to inhibit 'salt tastant' responses mediated by the epithelial Na+ channel, but eliminated by Calindol. Finally, they performed immunohistochemistry on sections of rat tongue demonstrating C6A positive spindle-shaped cells in fungiform papillae that partially overlapped in their distribution with the IP3 type-3 receptor, used as a marker of Type-II cells, but not with (i) gustducin, the G protein partner of Tas1 receptors (T1Rs), used as a marker of a subset of type-II cells; or (ii) 5-HT (serotonin) and Synaptosome-associated protein 25 kDa (SNAP-25) used as markers of Type-III cells.

      Weaknesses:

      The researchers undertook what turned out to be largely confirmatory studies in rats with respect to their previously published work on Ornithine and C6A in mice (Mizuta et al Nutrients 2021).

      The authors point out that animal models pose some difficulties of interpretation in studies of taste and raise the possibility in the Discussion that umami substances may enhance the taste response to ornithine (Line 271, Page 9). They miss an opportunity to outline the experimental results from the study that favor their preferred interpretation that ornithine is a taste enhancer rather than a tastant.

      At least two other receptors in addition to C6A might mediate taste responses to ornithine: (i) the CaSR, which binds and responds to multiple L-amino acids (Conigrave et al, PNAS 2000), and which has been previously reported to mediate kokumi taste (Ohsu et al., JBC 2010) as well as responses to Ornithine (Shin et al., Cell Signaling 2020); and (ii) T1R1/T1R3 heterodimers which also respond to L-amino acids and exhibit enhanced responses to IMP (Nelson et al., Nature 2001). While the experimental results as a whole favor the authors' interpretation that C6A mediates the Ornithine responses, they do not make clear either the nature of the 'receptor identification problem' in the Introduction or the way in which they approached that problem in the Results and Discussion sections. It would be helpful to show that a specific inhibitor of the CaSR failed to block the ornithine response. In addition, while they showed that C6A-positive cells were clearly distinct from gustducin-positive, and thus T1R-positive cells, they missed an opportunity to clearly differentiate C6A-expressing taste cells and CaSR-expressing taste cells in the rat tongue sections.

      It would have been helpful to include a positive control kokumi substance in the two-bottle preference experiment (e.g., one of the known gamma-glutamyl peptides such as gamma-glu-Val-Gly or glutathione), to compare the relative potencies of the control kokumi compound and Ornithine, and to compare the sensitivities of the two responses to C6A and CaSR inhibitors.

      The results demonstrate that enhancement of the chorda tympani nerve response to MSG occurs at substantially greater Ornithine concentrations (10 and 30 mM) than were required to observe differences in the two bottle preference experiments (1.0 mM; Figure 2). The discrepancy requires careful discussion and if necessary further experiments using the two-bottle preference format.

      We would like to thank Reviewer #3 for the valuable comments and helpful suggestions. We propose that ornithine has two stimulatory actions: one acting on GPRC6A, particularly at lower concentrations, and another on amino acid receptors such as T1R1/T1R3 at higher concentrations. Consequently, ornithine is not preferable at lower concentrations but becomes preferable at higher concentrations. For our study on kokumi, we used a low concentration (1 mM) of ornithine. The possibility mentioned in the Discussion that 'the umami substances may enhance the taste response to ornithine' is entirely speculative. We will reconsider including this description in the revised version. As the reviewer suggested, in addition to GPRC6A, ornithine may bind to CaSR and/or T1R1/T1R3 heterodimers. However, we believe that ornithine mainly binds to GPRC6A, as a specific inhibitor of this receptor almost completely abolished the enhanced response to umami substances, and our immunohistochemical study indicated that GPRC6A-expressing taste cells are distinct from CaSR-expressing taste cells (see Supplemental Fig. 3). We conducted essentially the same experiments using gamma-Glu-Val-Gly in Wistar rats (Yamamoto and Mizuta, Chem. Senses, 2022) and compared the results in the Discussion. The reviewer may have misunderstood the chorda tympani results: we added the same concentration (1 mM) used in the two-bottle preference test to MSG (Fig. 5-B). Fig. 5-A shows nerve responses to five concentrations of plain ornithine. In the revised manuscript, we will strive to provide more precise information reflecting the reviewer’s comments.

    1. eLife Assessment

      This study proposes an important new approach to analyzing cell-count data that are often undersampled and cannot be correctly assessed with traditional statistical analyses. The presented case studies provide convincing evidence of the superiority of the proposed methodology to existing approaches, which could promote the use of Bayesian statistics among neuroscientists. However, the generalizability of the methodology to other data types is not fully evidenced.

    2. Reviewer #1 (Public review):

      Summary:

      This work proposes a new approach to analyse cell-count data from multiple brain regions. Collecting such data can be expensive and time-intensive, so, more often than not, the dimensionality of the data is larger than the number of samples. The authors argue that Bayesian methods are much better suited to correctly analyse such data compared to classical (frequentist) statistical methods. They define a hierarchical structure, partial pooling, in which each observation contributes to the population estimate to more accurately explain the variance in the data. They present two case studies in which their method proves more sensitive in identifying regions where there are significant differences between conditions, which otherwise would be hidden.

      Strengths:

      The model is presented clearly, and the advantages of the hierarchical structure are strongly justified. Two alternative ways are presented to account for the presence of zero counts. The first involves the use of a horseshoe prior, which is the more flexible option, while the second involves a modified Poisson likelihood, which is better suited to datasets with a large number of zero counts, perhaps due to experimental artifacts. The results show a clear advantage of the Bayesian method for both case studies.
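
      For readers less familiar with the second option, a zero-inflated Poisson (ZIP) likelihood can be sketched as below. This is a minimal illustration of the standard textbook form and is not taken from the authors' repository; the function names, the parameter names (rate, pi_zero), and the example counts are all assumptions made for illustration.

```python
import math


def zip_pmf(k, rate, pi_zero):
    """P(count == k) under a zero-inflated Poisson.

    pi_zero: assumed probability of a structural zero (e.g. an artefact).
    rate:    Poisson mean for counts that are not structural zeros.
    """
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        # A zero can be structural or a genuine Poisson zero.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def zip_log_likelihood(counts, rate, pi_zero):
    """Sum of log-probabilities over independent counts."""
    return sum(math.log(zip_pmf(k, rate, pi_zero)) for k in counts)


# Hypothetical counts for one region: three animals show zero cells.
counts = [0, 0, 12, 9, 0, 11]

# Plain Poisson (pi_zero = 0) evaluated at the sample mean.
poisson_ll = zip_log_likelihood(counts, rate=sum(counts) / len(counts), pi_zero=0.0)

# ZIP with half of the observations assumed to be structural zeros and the
# rate set to the mean of the non-zero counts.
nonzero = [k for k in counts if k > 0]
zip_ll = zip_log_likelihood(counts, rate=sum(nonzero) / len(nonzero), pi_zero=0.5)

print(f"Poisson log-likelihood: {poisson_ll:.2f}")
print(f"ZIP log-likelihood:     {zip_ll:.2f}")
```

      With these made-up counts the ZIP form assigns a higher log-likelihood than a plain Poisson, which is the intuition behind preferring it for datasets with many zero counts. In a full Bayesian treatment rate and pi_zero would be given priors and inferred, rather than fixed as here.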

      The code is freely available, and it does not require a high-performance cluster to execute for smaller datasets. As Bayesian statistical methods become more accessible in various scientific fields, the whole scientific community will benefit from the transition away from p-values. Hierarchical Bayesian models are an especially useful tool that can be applied to many different experimental designs. However, while conceptually intuitive, their implementation can be difficult. The authors provide a good framework with room for improvement.

      Weaknesses:

      Alternative possibilities are discussed regarding the prior and likelihood of the model. Given that the second case study inspired the introduction of the zero-inflation likelihood, it is not clear how applicable the general methodology is to various datasets. If every unique dataset requires a tailored prior or likelihood to produce the best results, the methodology will not easily replace more traditional statistical analyses that can be applied in a straightforward manner. Furthermore, the differences between the results produced by the two Bayesian models in case study 2 are not discussed. In specific regions, the models provide conflicting results (e.g., regions MH, VPMpc, RCH, SCH, etc.), which are not addressed by the authors. A third case study would have provided further evidence for the generalizability of the methodology.

    3. Reviewer #2 (Public review):

      Summary:

      This is a well-written methodology paper applying a Bayesian framework to the statistics of cell counts in brain slices. A sharpening of the bounds on measured quantities is demonstrated over existing frequentist methods and therefore the work is a contribution to the field.

      Strengths:

      As well as a mathematical description of the approach, the code used is provided in a linked repository.

      Weaknesses:

      A clearer link between the experimental data and model-structure terminology would be a benefit to the non-expert reader.

    4. Author response:

      We thank both reviewers for their considerate reviews. In this provisional response we would like to make a few key points.

      Given that we introduced a bespoke likelihood model for the second dataset, Reviewer 1 asks whether "every unique dataset requires a tailored prior or likelihood to produce the best results". Our intention is to advocate for the horseshoe prior model as a 'standard' first analysis for any cell count dataset. If extra knowledge about the data is available, or if any data artefacts are detected, more elaborate likelihoods could be introduced as needed in a follow-up analysis. Our introduction of the zero-inflated Poisson likelihood for the second dataset was one such example, but many alternatives could exist. This iterative approach to model building, sometimes referred to as a 'Bayesian workflow', is seen as good practice in the Bayesian data analysis literature. In the revised version of the paper, we will try to explain the recommendations and modelling philosophy behind this method while emphasising that tailoring or bespoke modelling is not required for our 'standard analysis', which we would regard as the Bayesian replacement for a t-test on counts.

      Reviewer 1 notes that "the differences between the results produced by the two Bayesian models in case study 2 are not discussed". We agree that this discrepancy, arising from the specific assumptions of each model, is an interesting issue which we should better explore in the paper. In Figure 6 we plotted the actual data values alongside posterior and confidence intervals to explain how the results from the ZIP likelihood and Horseshoe prior compare with those from a t-test. However, our example regions did not highlight cases where differences could be noted between the two Bayesian models. In the revised version of the paper, we will extend Figure 6 to include further brain regions, such as those mentioned by the referee, and will use that as an opportunity to discuss the broader issue of what to do when the Bayesian models give conflicting results.

      We agree with reviewer 2's point that the model description terminology could be made clearer for the target eLife audience. We tried to strike a balance between introducing the reader to the conventional technical terminology used in the Bayesian data analysis necessary for understanding the model while avoiding exhaustive statistical terminology. We erred too much on the side of the latter instead of providing clear links between the model construction and experimental data. In the revised version of the paper, we will augment any technical terms with more biological language and provide a Glossary for reader reference.

    1. eLife Assessment

      This study utilizes an elegant approach to examine valence encoding of the mesolimbic dopamine system. The findings are valuable, demonstrating differential responses of dopamine to the same taste stimulus according to its valence (i.e., appetitive or aversive) and in alignment with distinct behavioral responses. The evidence supporting the claims is convincing, resulting from a well-controlled experimental design with minimal confounds and thorough reporting of the data.

    2. Reviewer #1 (Public review):

      Summary:

      Loh and colleagues investigate valence encoding in the mesolimbic dopamine system. Using an elegant approach, they show that when sucrose, which normally evokes strong dopamine neuron activity and release in the nucleus accumbens, is made aversive via conditioned taste aversion, the same sucrose stimulus later evokes much less dopamine neuron activity and release. Thus, dopamine activity can dynamically track the changing valence of an unconditioned stimulus. These results are important for helping clarify valence- and value-related questions that are the matter of ongoing debate regarding dopamine functions in the field.

      Strengths:

      This is an elegant way to ask this question. The within-subjects design and the continuity of the stimulus are a strong way to remove many of the common confounds that make it difficult to interpret valence-related questions. I think these are valuable studies that help tie up questions in the field while also setting up a number of interesting future directions. There are a number of control experiments and tweaks to the design that help eliminate competing hypotheses regarding the results. The data are clearly presented and contextualized.

      Weaknesses for consideration:

      The focus on one relatively understudied region of the rat striatum for dopamine recordings could potentially limit generalization of the findings. While this can be determined in future studies, the implications should be further discussed in the current manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      Loh et al. report an interesting manuscript studying dopamine binding in the lateral accumbens shell of rats across the course of conditioned taste aversion. The question being asked here is: how does the dopamine system respond to aversion? The authors take advantage of unique properties of taste aversion learning (notably, within-subjects remapping of valence to the same physical stimulus) to address this.

      Combining a well-controlled behavioural design (including key unpaired controls) with fibre photometry of dopamine binding via GrabDA and of dopamine neuron activity via GCaMP, and careful analyses of behaviour (e.g., head movements; home cage ingestion), the authors show that: 1) conditioned taste aversion of sucrose suppresses the activity of VTA dopamine neurons and lateral shell dopamine binding to subsequent presentations of the sucrose tastant; 2) this pattern of activity was similar to that evoked by the innately aversive tastant quinine; 3) dopamine responses were negatively correlated with behavioural reactivity (inferred taste reactivity); and 4) dopamine responses tracked the contingency between sucrose and illness, because these responses recovered across extinction of the conditioned taste aversion.

      Strengths:

      There are important strengths here: the use of a well-controlled design; the measurement of both dopamine binding and VTA dopamine neuron activity; the inclusion of an extinction manipulation; and the thorough reporting of the data. I was not especially surprised by these results, but these data are a potentially important piece of the dopamine puzzle (e.g., as the authors note, a salience-based argument struggles to explain these data).

      Weaknesses for consideration:

      (1) The focus here is on the lateral shell. This is a poorly investigated region in the context of the questions being asked here. Indeed, I suspect many readers might expect a focus on the medial shell. So, I think this focus is important. But I think it does warrant greater attention in both the introduction and discussion. We do know from past work that there can be extensive compartmentalisation of dopamine responses to appetitive and aversive events, and many of the inconsistent findings in the literature can be reconciled by careful examination of where dopamine is assessed. I do think readers would benefit from acknowledgement of this: for example, it is entirely reasonable to suppose that the findings here may be specific to the lateral shell.

      (2) Relatedly, I think readers would benefit from an explicit rationale for studying the lateral shell as well as consideration of this in the discussion. We know that there are anatomical (PMID: 17574681), functional (PMID: 10357457), and cellular (PMID: 7906426) differences between the lateral shell and the rest of the ventral striatum. Critically, we know that profiles of dopamine binding during ingestive behaviours there can be highly dissimilar to the rest of ventral striatum (PMID: 32669355). I do think these points are worth considering.

      (3) I found the data to be very thoughtfully analysed. But in places I was somewhat unsure:

      (a) Please indicate clearly in the text when photometry data show averages across trials versus when they show averages across animals.

      (b) I did struggle with the correlation analyses, for two reasons.

      (i) First, the key finding here is that the dopamine response to intraoral sucrose is suppressed by taste aversion. So, this will significantly restrict the range of dopamine transients, making interpretation of the correlations difficult.

      (ii) Second, the authors report correlations by combining data across groups/conditions. I understand why the authors have done this, but it does risk obscuring differences between the groups. So, my question is: what happens to this trend when the correlations are computed separately for each group? I suspect other readers will share the same question. I think reporting these separate correlations would be very helpful for the field - regardless of the outcome.

      (4) Figure 1A is not as helpful as it might be. I do think readers would expect a more precise reporting of GCaMP expression in TH+ and TH- neurons. I also note that many of the nuances in terms of compartmentalisation of dopamine signalling discussed above apply to ventral tegmental area dopamine neurons (e.g. medial v lateral) and this is worth acknowledging when interpreting.
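
      The concern in point (3)(b)(ii) about pooling groups can be illustrated with a short sketch. The numbers below are entirely hypothetical and chosen only to show the mechanism; they are not the study's data, and the group labels and function name are assumptions for illustration. Two groups that each show no within-group relationship can still produce a strong pooled correlation when the groups differ in their means.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (behavioural reactivity, dopamine response) values per rat.
groups = {
    "paired": ([8.0, 9.0, 10.0, 11.0], [0.2, 0.4, 0.1, 0.3]),
    "unpaired": ([2.0, 3.0, 4.0, 5.0], [2.1, 2.3, 2.0, 2.2]),
}

# Correlation computed separately within each group.
per_group_r = {name: pearson_r(x, y) for name, (x, y) in groups.items()}

# Correlation computed after combining all rats into one sample.
all_x = [v for x, _ in groups.values() for v in x]
all_y = [v for _, y in groups.values() for v in y]
pooled_r = pearson_r(all_x, all_y)

for name, r in per_group_r.items():
    print(f"{name}: r = {r:.2f}")
print(f"pooled: r = {pooled_r:.2f}")
```

      In this constructed example the pooled coefficient is strongly negative even though neither group shows a within-group trend, which is why reporting the per-group coefficients alongside the pooled one would be informative.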

    4. Reviewer #3 (Public review):

      Summary:

      This study helps to clarify the mixed literature on dopamine responses to aversive stimuli. While it is well accepted that dopamine in the ventral striatum increases in response to various rewarding and appetitive stimuli, aversive stimuli have been shown to evoke phasic increases or decreases depending on the exact aversive stimulus, behavioral paradigm, and/or dopamine recording method and location examined. Here the authors use a well-designed set of experiments to show differential responses to an appetitive primary reward (sucrose) that later becomes a conditioned aversive stimulus (sucrose previously paired with lithium chloride in a conditioned taste aversion paradigm). The results are interesting and add valuable data to the question of how the mesolimbic dopamine system encodes aversive stimuli. However, the conclusions are strongly stated given that the current data do not necessarily align with prior conflicting data in terms of recording location, and it is not clear exactly how to interpret the generally biphasic dopamine response to the CTA-sucrose, which also evolves over exposures within a single session.

      Strengths:

      • The authors nicely demonstrate that their two aversive stimuli examined, quinine and sucrose following CTA, evoked aversive facial expressions and paw movements that differed from those following rewarding sucrose to support that the stimuli experienced by the rats differ in valence.

      • Examined dopamine responses to the exact same sensory stimuli conditioned to have opposing valences, avoiding standard confounds of appetitive and aversive stimuli being sensed by different sensory modalities (i.e., sweet taste vs. electric shock).

      • The authors examined multiple measurements of dopamine activity - cell body calcium (GCaMP6f) in midbrain and release in NAc (Grab-DA2h), which is useful as the prior mixed literature on aversive dopamine responses comes from a variety of recording methods.

      • Correlations between sucrose preference and dopamine signals demonstrate behavioral relevance of the differential dopamine signals.

      • The delayed testing experiment in Figure 7 nicely controls for the effect of time to demonstrate that the "rewarding" dopamine response to sucrose only recovers after multiple extinction sucrose exposures to extinguish the CTA.

      Weaknesses for consideration:

      • Regional differences in dopamine signaling to aversive stimuli are mentioned in the introduction and discussion. For instance, the idea that dopamine encodes salience is strongly argued against in the discussion, but the paper cited as arguing for that (Kutlu et al. 2021) is recording from the medial core in mice. Given other papers cited in the text about the regional differences in dopamine signaling in the NAc and from different populations of dopamine neurons in midbrain, it's important to mention this distinction with respect to salience signaling. Relatedly, the text says that the lateral NAc shell was targeted for accumbens recordings, but the histology figure looks like the majority of fibers were in the anterior lateral core of NAc. For the current paper to be a convincing last word on the issue, it would be extremely helpful to have similar recordings done in other parts of the NAc to do a more thorough comparison against other studies.

      • Dopamine release in the NAc never dips below baseline for the conditioned sucrose. Is it possible to really consider this as a signal for valence per se, as opposed to it being a weaker response relative to the original sucrose response?

      • Related to this, the main measure of the dopamine signal here, "mean z-score," obscures the temporal dynamics of the aversive dopamine response across a trial. This measure is used to claim that sucrose after CTA is "suppressing" dopamine neuron activity and release, which is true relative to the positive valence sucrose response. However, both GRAB-DA and cell-body GCaMP measurements show clear increases after onset of sucrose infusion before dipping back to baseline or slightly below in the average of all example experiments displayed. One could point to these data to argue either that aversive stimuli cause phasic increases in dopamine (due to the initial increase) or decreases (due to the delayed dip below baseline) depending on the measurement window. Some discussion of the dynamics of the response and how it relates to the prior literature would be useful.

        - Would this delayed below-baseline dip be visible with a shorter infusion time?

        - Does the max of the increase or the dip of the decrease better correlate with the behavioral measures of aversion (orofacial, paw movements) or sucrose preference than "mean z-score" measure used here?

        - The authors argue strongly in the discussion against the idea that dopamine is encoding "salience." Could this initial peak (also seen in the first few trials of quinine delivery, fig 1c color plot) be a "salience" response?

      • Related to this, the color plots showing individual trials show a reduction in the increases to positive valence sucrose across conditioning day trials and a flip from infusion-onset increases to delayed increases across test day trials. This evolution across days makes it appear that the last few conditioning day trials would be impossible to discriminate from the first few test day trials in the CTA-paired group. Presumably, from the strength of CTA as a paradigm, the sucrose is already aversive to the animals at the first trial of test day. Why do the authors think the response evolves across this session?

      • Given that most of the work is using a conditioned aversive stimulus, the comparison to the primary aversive tastant quinine is useful. However, the authors saw basically no dopamine response to quinine (measured only with GRAB-DA) and saw less noticeable decreases following CTA for NAc recordings with GRAB-DA2h than with cell body GCaMP. Given that they are using the high-affinity version of the GRAB sensor, this calls into question whether this is a true difference in release vs. soma activity or an issue of the high-affinity release sensor making decreases in dopamine levels more difficult to observe.
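
      The point above about the "mean z-score" masking a biphasic response can be made concrete with a small sketch. The trace is invented purely for illustration (it is not photometry data from the study), and the function name and summary keys are assumptions.

```python
def summarise_trace(z_trace):
    """Return three window summaries of a z-scored trace over the infusion period."""
    return {
        "mean": sum(z_trace) / len(z_trace),  # the single summary used as "mean z-score"
        "peak": max(z_trace),                 # largest increase in the window
        "trough": min(z_trace),               # largest decrease in the window
    }


# Hypothetical biphasic response: early increase, then a delayed dip below baseline.
biphasic = [0.0, 1.5, 2.0, 1.0, 0.0, -1.0, -1.5, -1.5, -0.5, 0.0]

summary = summarise_trace(biphasic)
print(summary)
```

      Here the early increase and the delayed dip cancel, so the window mean is zero even though both the peak and the trough are sizeable. Reporting peak and trough (or means over an early and a late sub-window) alongside the overall mean would distinguish a biphasic response from a flat one.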

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      Loh and colleagues investigate valence encoding in the mesolimbic dopamine system. Using an elegant approach, they show that when sucrose, which normally evokes strong dopamine neuron activity and release in the nucleus accumbens, is made aversive via conditioned taste aversion, the same sucrose stimulus later evokes much less dopamine neuron activity and release. Thus, dopamine activity can dynamically track the changing valence of an unconditioned stimulus. These results are important for helping clarify valence- and value-related questions that are the matter of ongoing debate regarding dopamine functions in the field.

      Strengths:

      This is an elegant way to ask this question. The within-subjects design and the continuity of the stimulus are a strong way to remove many of the common confounds that make it difficult to interpret valence-related questions. I think these are valuable studies that help tie up questions in the field while also setting up a number of interesting future directions. There are a number of control experiments and tweaks to the design that help eliminate competing hypotheses regarding the results. The data are clearly presented and contextualized.

      Weaknesses for consideration:

      The focus on one relatively understudied region of the rat striatum for dopamine recordings could potentially limit generalization of the findings. While this can be determined in future studies, the implications should be further discussed in the current manuscript.

      We agree that the manuscript would benefit from providing a stronger rationale for our recording sites and acknowledging the potential for regional differences in dopamine signaling. We have made the following additions to the manuscript:

      Added to the Discussion: “Recordings were targeted to the lateral VTA and the corresponding approximate terminal site in the NAc lateral shell (Lammel et al., 2008). Subregional differences in dopamine activity likely contribute to mixed findings on dopamine and affect. For example, dopamine in the NAc lateral shell differentially encodes cues predictive of rewarding sucrose and aversive footshock, which is distinct from NAc medial shell dopamine responses (de Jong et al., 2019). Our findings are similar to prior work from our group targeting recordings to the NAc dorsomedial shell (Hsu et al., 2020; McCutcheon et al., 2012; Roitman et al., 2008): there, intraoral sucrose increased NAc dopamine release while the response in the same rats to quinine was significantly lower.”

      Reviewer #2 (Public review):

      Summary:

      Loh et al. report an interesting manuscript studying dopamine binding in the lateral accumbens shell of rats across the course of conditioned taste aversion. The question being asked here is: how does the dopamine system respond to aversion? The authors take advantage of unique properties of taste aversion learning (notably, within-subjects remapping of valence to the same physical stimulus) to address this.

      Combining a well-controlled behavioural design (including key unpaired controls) with fibre photometry of dopamine binding via GrabDA and of dopamine neuron activity via GCaMP, and careful analyses of behaviour (e.g., head movements; home cage ingestion), the authors show that: 1) conditioned taste aversion of sucrose suppresses the activity of VTA dopamine neurons and lateral shell dopamine binding to subsequent presentations of the sucrose tastant; 2) this pattern of activity was similar to that evoked by the innately aversive tastant quinine; 3) dopamine responses were negatively correlated with behavioural reactivity (inferred taste reactivity); and 4) dopamine responses tracked the contingency between sucrose and illness, because these responses recovered across extinction of the conditioned taste aversion.

      Strengths:

      There are important strengths here: the use of a well-controlled design; the measurement of both dopamine binding and VTA dopamine neuron activity; the inclusion of an extinction manipulation; and the thorough reporting of the data. I was not especially surprised by these results, but these data are a potentially important piece of the dopamine puzzle (e.g., as the authors note, a salience-based argument struggles to explain these data).

      Weaknesses for consideration:

      (1) The focus here is on the lateral shell. This is a poorly investigated region in the context of the questions being asked here. Indeed, I suspect many readers might expect a focus on the medial shell. So, I think this focus is important. But I think it does warrant greater attention in both the introduction and discussion. We do know from past work that there can be extensive compartmentalisation of dopamine responses to appetitive and aversive events, and many of the inconsistent findings in the literature can be reconciled by careful examination of where dopamine is assessed. I do think readers would benefit from acknowledgement of this: for example, it is entirely reasonable to suppose that the findings here may be specific to the lateral shell.

      As with our response to Reviewer 1, we agree that we should provide further rationale for focusing our recordings on the lateral shell and acknowledge potential differences in dopamine dynamics across NAc subregions. In addition to the changes in the Discussion detailed in our response to Reviewer 1, we have made the following additions to the Introduction:

      Added to the Introduction: “NAc lateral shell dopamine differentially encodes cues predictive of rewarding (i.e., sipper spout with sucrose) and aversive stimuli (i.e., footshock), which is distinct from other subregions (de Jong et al., 2019). It is important to note that other regions of the NAc may serve as hedonic hotspots (e.g. dorsomedial shell) or may more closely align with the signaling of salience (e.g. ventromedial shell; Yuan et al., 2021).”

      (2) Relatedly, I think readers would benefit from an explicit rationale for studying the lateral shell as well as consideration of this in the discussion. We know that there are anatomical (PMID: 17574681), functional (PMID: 10357457), and cellular (PMID: 7906426) differences between the lateral shell and the rest of the ventral striatum. Critically, we know that profiles of dopamine binding during ingestive behaviours there can be highly dissimilar to the rest of ventral striatum (PMID: 32669355). I do think these points are worth considering.

      There are several reasons why dopamine dynamics were recorded in the NAc lateral shell:

      (1) Dopamine neurons in more medial aspects of the VTA preferentially target the NAc medial shell and core whereas dopamine neurons in the lateral VTA – our target for VTA DA recordings – project to the lateral shell of the NAc (Lammel et al., 2008). Thus, our goal was to sample NAc release dynamics in areas that receive projections from our cell body recording sites.

      (2) Cues predictive of reward availability (i.e., sipper spout with sucrose) and aversive stimuli (i.e., footshock) are differentially encoded by NAc lateral shell dopamine, which is distinct from NAc ventromedial shell dopamine responses (de Jong et al., 2019). These findings suggest a role for NAc lateral shell dopamine in the encoding of a stimulus’s valence, which made the subregion an area of interest for further examination.

      (3) With respect to the medial NAc shell specifically, extensive literature had already shown it to be a ‘hedonic hotspot’ (Morales and Berridge, 2020; Yuan et al., 2021) whereas the ventral portion is more mixed with respect to valence (Yuan et al., 2021). We had previously shown that intraoral infusions of primary taste stimuli of opposing valence (i.e., sucrose and quinine) evoke differential responses in dopamine release within the NAc dorsomedial shell (Roitman et al., 2008). We more recently replicated differential dopamine responses from dopamine cell bodies in the lateral VTA (Hsu et al., 2020) and thus endeavored to examine the possibility of changing dopamine responses in the lateral VTA to the same stimulus as its valence changes. As a result of these choices, measuring dopamine release in the lateral shell was a logical choice. The field would greatly benefit from continued future work surveying the entirety of the VTA DA projection terminus.

      We have included these points of justification in the Introduction and Discussion sections.

      (3) I found the data to be very thoughtfully analysed. But in places I was somewhat unsure:

      (a) Please indicate clearly in the text when photometry data show averages across trials versus when they show averages across animals.

      We have now explicitly indicated in the figure legends of Figures 1, 3, 5, 7, and 8:

      (1) In heat maps, each row represents the averaged (across rats) response on that trial.

      (2) Traces below heat maps represent the response to infusion averaged first across trials for each rat and then across all rats.

      (3) Insets represent the average z-score across the infusion period averaged first across all trials for each rat and then across all rats.
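
      As an illustration only (not the analysis code used for the manuscript), the three averaging conventions listed above can be sketched as follows. The data layout (a dictionary mapping each rat to a list of per-trial z-scored traces of equal length) and all function names are assumptions made for this sketch.

```python
from statistics import mean

def mean_trace(traces):
    """Average equal-length traces sample by sample."""
    return [mean(samples) for samples in zip(*traces)]

def heatmap_rows(data):
    """One row per trial: the response on that trial averaged across rats."""
    n_trials = min(len(trials) for trials in data.values())
    return [mean_trace([trials[t] for trials in data.values()])
            for t in range(n_trials)]

def group_trace(data):
    """Average across trials within each rat, then across rats."""
    per_rat = [mean_trace(trials) for trials in data.values()]
    return mean_trace(per_rat)

def inset_value(data, start, stop):
    """Mean z-score in the infusion window: per rat first, then across rats."""
    per_rat = [mean(mean(trial[start:stop]) for trial in trials)
               for trials in data.values()]
    return mean(per_rat)

# Hypothetical z-scored traces: two rats, two trials each, three samples per trial.
data = {
    "rat_a": [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]],
    "rat_b": [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0]],
}
rows = heatmap_rows(data)
trace = group_trace(data)
inset = inset_value(data, 1, 3)
```

      Averaging within each rat before averaging across rats gives every animal equal weight, which matters whenever rats contribute different numbers of usable trials.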

      (b) I did struggle with the correlation analyses, for two reasons.

      (i) First, the key finding here is that the dopamine response to intraoral sucrose is suppressed by taste aversion. So, this will significantly restrict the range of dopamine transients, making interpretation of the correlations difficult.

      The overall hypothesis is that the dopamine response would correlate with the valence of a taste stimulus – even and especially when the stimulus remained constant but its valence changed. We inferred valence from the behavioral reactivity to the stimulus – reasoning that an appetitive taste will evoke minimal movement of the nose and paws (presumably because the animals are primarily engaging in small mouth movements associated with ingestion as shown by the seminal work of Grill and Norgren (1978) and the many studies published by the K.C. Berridge group) whereas an aversive taste will evoke significantly more movement as the rats engage in rejection responses (e.g. forelimb flails, chin rubs, etc.). When we conducted our regression analyses we endeavored to be as transparent as possible and labeled each symbol based on group (Unpaired vs Paired) and day (Conditioning vs Test). Both behavioral reactivity and dopamine responses change – but only for the Paired rats across days. In this sense, we believe the interpretation is clear. However, the Reviewer raises an important criticism that there would essentially be a floor effect with dopamine responses. We believe this is mitigated by data acquired across extinction and especially in Figure 9B. Here, the observations that dopamine responses fall to near zero but return to pre-conditioning levels in the Paired group with strong correlation between dopamine and behavioral reactivity throughout would hopefully partially allay the Reviewer’s concerns. See Part ii below for further support.

      (ii) Second, the authors report correlations by combining data across groups/conditions. I understand why the authors have done this, but it does risk obscuring differences between the groups. So, my question is: what happens to this trend when the correlations are computed separately for each group? I suspect other readers will share the same question. I think reporting these separate correlations would be very helpful for the field, regardless of the outcome.

      To address this concern, we performed separate regression analyses for Paired and Unpaired rats and provide the table below to detail results where data were combined across groups or separated. As expected, all analyses in Paired rats indicated a significant inverse relationship between dopamine and behavioral reactivity. After all, it is only in this group that behavioral reactivity to the taste stimulus changes as a function of conditioning. Perhaps even more striking is that, even when restricting the regression analysis to Unpaired rats, we still observed a significant inverse relationship between dopamine and behavioral reactivity in most experiments. We have outlined the separated correlations below (asterisks denote slopes significantly different from 0; * p<0.05; ** p<0.01; *** p<0.005; **** p<0.001):

      Author response table 1.
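
      For readers who want to see the shape of such an analysis, a minimal sketch of fitting the dopamine-versus-behavioral-reactivity regression separately per group and on the combined data is given here. It uses ordinary least squares written out by hand; the record layout, function names, and numbers are invented for illustration and are not the values in the table, and the significance test on the slope is omitted.

```python
def slope_and_r(x, y):
    """Least-squares slope and Pearson correlation of y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

def regress_by_group(records):
    """records: (group, behavioral_reactivity, dopamine_z) tuples."""
    groups = {}
    for group, behavior, dopamine in records:
        xs, ys = groups.setdefault(group, ([], []))
        xs.append(behavior)
        ys.append(dopamine)
    results = {g: slope_and_r(xs, ys) for g, (xs, ys) in groups.items()}
    # Pooled fit across all groups, for comparison with the per-group fits.
    results["Combined"] = slope_and_r([r[1] for r in records],
                                      [r[2] for r in records])
    return results

# Made-up numbers, for illustration only.
records = [
    ("Paired", 1.0, 2.0), ("Paired", 2.0, 1.0), ("Paired", 3.0, 0.0),
    ("Unpaired", 1.0, 2.5), ("Unpaired", 2.0, 2.0), ("Unpaired", 3.0, 1.5),
]
results = regress_by_group(records)
```

      Reporting the per-group and pooled slopes side by side shows whether an inverse relationship in the combined data is carried by one group alone.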

      (4) Figure 1A is not as helpful as it might be. I do think readers would expect a more precise reporting of GCaMP expression in TH+ and TH- neurons. I also note that many of the nuances in terms of compartmentalisation of dopamine signalling discussed above apply to ventral tegmental area dopamine neurons (e.g. medial v lateral) and this is worth acknowledging when interpreting t

      Others have reported (Choi et al., 2020) and quantified (Hsu et al., 2020) GCaMP6f expression in TH+ neurons. While we didn’t report these quantifications, our observations were very much in line with previous quantifications from our laboratory (Hsu et al. 2020).

      We agree that we should elaborate on VTA subregional differences and have addressed this point above (see responses to Reviewer 1 Weakness #1 and Reviewer 2 Weakness #2).

      Reviewer #3 (Public review):

      Summary:

      This study helps to clarify the mixed literature on dopamine responses to aversive stimuli. While it is well accepted that dopamine in the ventral striatum increases in response to various rewarding and appetitive stimuli, aversive stimuli have been shown to evoke phasic increases or decreases depending on the exact aversive stimuli, behavioral paradigm, and/or dopamine recording method and location examined. Here the authors use a well-designed set of experiments to show differential responses to an appetitive primary reward (sucrose) that later becomes a conditioned aversive stimulus (sucrose previously paired with lithium chloride in a conditioned taste aversion paradigm). The results are interesting and add valuable data to the question of how the mesolimbic dopamine system encodes aversive stimuli; however, the conclusions are strongly stated given that the current data do not necessarily align with prior conflicting data in terms of recording location, and it is not clear exactly how to interpret the generally biphasic dopamine response to the CTA-sucrose, which also evolves over exposures within a single session.

      Strengths:

      • The authors nicely demonstrate that their two aversive stimuli examined, quinine and sucrose following CTA, evoked aversive facial expressions and paw movements that differed from those following rewarding sucrose to support that the stimuli experienced by the rats differ in valence.

      • Examined dopamine responses to the exact same sensory stimuli conditioned to have opposing valences, avoiding standard confounds of appetitive and aversive stimuli being sensed by different sensory modalities (i.e., sweet taste vs. electric shock)

      • The authors examined multiple measurements of dopamine activity - cell body calcium (GCaMP6f) in midbrain and release in NAc (Grab-DA2h), which is useful as the prior mixed literature on aversive dopamine responses comes from a variety of recording methods.

      • Correlations between sucrose preference and dopamine signals demonstrate behavioral relevance of the differential dopamine signals.

      • The delayed testing experiment in Figure 7 nicely controls for the effect of time to demonstrate that the "rewarding" dopamine response to sucrose only recovers after multiple extinction sucrose exposures to extinguish the CTA.

      Weaknesses for consideration:

      (1) Regional differences in dopamine signaling to aversive stimuli are mentioned in the introduction and discussion. For instance, the idea that dopamine encodes salience is strongly argued against in the discussion, but the paper cited as arguing for that (Kutlu et al. 2021) is recording from the medial core in mice. Given other papers cited in the text about the regional differences in dopamine signaling in the NAc and from different populations of dopamine neurons in midbrain, it's important to mention this distinction with respect to salience signaling. Relatedly, the text says that the lateral NAc shell was targeted for accumbens recordings, but the histology figure looks like the majority of fibers were in the anterior lateral core of NAc. For the current paper to be a convincing last word on the issue, it would be extremely helpful to have similar recordings done in other parts of the NAc to do a more thorough comparison against other studies.

      As the Reviewer notes, NAc dopamine recordings were aimed at the lateral NAc shell. It is possible that some dopamine neurons lying within the anterior lateral core were recorded. Fiber photometry and the size of the fiber optics cannot definitively identify the precise location and number of dopamine neurons from which we recorded. Still, recording sites did not systematically differ between groups. Further, the within-subjects design helps to mitigate any potential biases for one subregion over another. The results presented in the manuscript strongly support a valence code. It is difficult to be the ‘last word’ on this topic and we suspect debate will continue. We used taste stimuli for appetitive and aversive stimuli – whereas many in the field will continue to use other noxious stimuli (e.g. foot shock) that likely recruit different circuits en route to the VTA. And there may very well be a different regional profile for dopamine signaling with different noxious stimuli. Moreover, we used intraoral infusion to avoid confounds of stimulus avoidance and competing motivations (e.g. food or fluid deprivation). We believe that this is one of the most important and unique features of our report. Recent work supports a role for phasic increases in dopamine in avoidance of noxious stimuli (Jung et al., 2024) and it will be critical for the field to reflect on the differences between avoidance and aversion. Moreover, in ongoing studies we aspire to fully survey dopamine signaling in conditioned taste aversion across the medial-lateral and dorsal-ventral axes of the VTA and NAc.

      (2) Dopamine release in the NAc never dips below baseline for the conditioned sucrose. Is it possible to really consider this as a signal for valence per se, as opposed to it being a weaker response relative to the original sucrose response?

      Indeed, NAc dopamine release in response to intraoral quinine or aversive sucrose does not dip below baseline; rather, dopamine binding does not change from pre-infusion baseline levels. It should be noted that VTA dopamine cell body activity does indeed dip below baseline in response to aversive sucrose. Moreover, using fast-scan cyclic voltammetry, we showed that dopamine release dips below baseline in the NAc dorsomedial shell in response to intraoral quinine (Roitman et al., 2008). The differences across recording sites may reflect regional differences but they may also reflect differences in recording approaches. GrabDA2h, used here, has relatively slow kinetics that may obscure dips below baseline (see response to Weakness #8 below).

      (3) Related to this, the main measure of the dopamine signal here, "mean z-score," obscures the temporal dynamics of the aversive dopamine response across a trial. This measure is used to claim that sucrose after CTA is "suppressing" dopamine neuron activity and release, which is true relative to the positive valence sucrose response. However, both GRAB-DA and cell-body GCaMP measurements show clear increases after onset of sucrose infusion before dipping back to baseline or slightly below in the average of all example experiments displayed. One could point to these data to argue either that aversive stimuli cause phasic increases in dopamine (due to the initial increase) or decreases (due to the delayed dip below baseline) depending on the measurement window. Some discussion of the dynamics of the response and how it relates to the prior literature would be useful.

      We have used mean z-score to do much of our quantitative analyses but the Reviewer raises the intriguing possibility that we are masking an initial increase in dopamine release and VTA DA activity evoked by aversive taste by doing so. We included the heat maps in the manuscript to be as transparent as possible about the time course of dopamine responses, both within a trial and across trials. The Reviewer’s point prompted us to reflect further on the heat maps and recognize that trials early in the session often showed a brief increase in dopamine for aversive sucrose but this response dissipated (NAc dopamine release) or flipped (VTA DA cell body activity) over trials. We now quantitatively characterize this feature by looking at the time course of dopamine responses in each third of the trials (1-10, 11-20, 21-30; see Author response images 1, 2 and 3). As we infer the valence of the stimulus from nose and paw movements (behavioral reactivity), it is especially striking that we observed a similar time course for changes in behavior. Collectively, the data may reflect an updating process that is relatively slow and requires experience of the stimulus in a new (aversive) state, that is, a model-free process. While our experiments were not designed to test the updating of dopamine responses and discern their participation in model-based versus model-free learning processes – another debate in the dopamine field (Cone et al., 2016; Deserno et al., 2021) – the data reflect a model-free process. This is further supported in the experiment involving multiple conditioning sessions, where dopamine ‘dips’ are observed in trials 1-10 on Conditioning Day 3 and Extinction Day 1 when the new value of sucrose has been established. Finally, the relatively slow updating of the value of sucrose is reflected in older literature using a continuous intraoral infusion. Using this approach, rats began rejecting the saccharin infusion only after ~2 min rather than immediately (Schafe et al., 1998; Schafe and Bernstein, 1996; Wilkins and Bernstein, 2006).
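
      The trial-thirds summary described above amounts to averaging a per-trial measure within consecutive blocks of 10 trials. A minimal sketch, with a hypothetical function name and invented per-trial values (not data from the study):

```python
from statistics import mean

def block_means(trial_values, block_size=10):
    """Mean of a per-trial measure within consecutive blocks of trials."""
    return [mean(trial_values[i:i + block_size])
            for i in range(0, len(trial_values), block_size)]

# Hypothetical mean z-scores for 30 trials: an early increase that fades
# and then reverses, mimicking a slowly updating response.
per_trial_z = [1.0] * 10 + [0.0] * 10 + [-1.0] * 10
thirds = block_means(per_trial_z)  # trials 1-10, 11-20, 21-30
```

      Applying the same function to a per-trial behavioral reactivity score would allow the two time courses to be compared block by block.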

      Author response image 1.

      Author response image 2.

      Author response image 3.

      (4) Would this delayed below-baseline dip be visible with a shorter infusion time?

      While our experiments did not explore this parameter, it would be interesting to parametrically vary infusion duration times and examine differences in dopamine responses. However, we believe the most parsimonious explanation is that the ‘dip’ in VTA cell body activity develops as a function of the slow updating of the value of sucrose reflective of a model-free process. We recognize that this is mere speculation.

      (5) Does the max of the increase or the dip of the decrease better correlate with the behavioral measures of aversion (orofacial, paw movements) or sucrose preference than "mean z-score" measure used here?

      It seems plausible that the most extreme deviation from baseline could correlate better with behavioral measures. However, the time courses to maximum increase and maximum decrease differ. Moreover, with appetitive sucrose, there are often multiple transients that occur throughout a single intraoral infusion. Coupled with a noisy time course for individual components of behavioral reactivity, we determined that averaging data across the whole infusion period (i.e. mean z-score) was the most objective way we could analyze the dopamine and behavioral responses to taste stimuli.

      (6) The authors argue strongly in the discussion against the idea that dopamine is encoding "salience." Could this initial peak (also seen in the first few trials of quinine delivery, fig 1c color plot) be a "salience" response?

      Our response above to the potential for ‘mixed’ dopamine responses to aversive sucrose led to additional analyses that support a slow updating of both behavior and dopamine to the new, aversive value of sucrose. Quinine is innately aversive and thus the Reviewer rightly points out that even here we observe an increase in dopamine release evoked by quinine on the first few trials (as observed in the heat map). We’d like to note, though, that the order of stimulus exposure was counterbalanced across rats. In those rats first receiving a sucrose session, quinine initially caused a modest increase in dopamine release during the first 10 trials (which is more pronounced in the first 2 trials). In the subsequent 2 blocks of 10 trials, no such increase was observed. Interestingly, in rats for which quinine was their first stimulus, we did not see an increase in dopamine release on the first few trials (see Author response image 4). We speculate that the initial sucrose session required the value of intraoral infusions to be updated when quinine was delivered to these rats and that, once more, the updating process may be slow and akin to a model-free process. This analysis, at present, is underpowered but will direct future attention in follow-up work.

      Author response image 4.

      (7) Related to this, the color plots showing individual trials show a reduction in the increases to positive valence sucrose across conditioning day trials and a flip from infusion-onset increase to delayed increases across test day trials. This evolution across days makes it appear that the last few conditioning day trials would be impossible to discriminate from the first few test day trials in the CTA-paired. Presumably, from strength of CTA as a paradigm, the sucrose is already aversive to the animals at the first trial of test day. Why do the authors think the response evolves across this session?

      As the Reviewer noted, Points 3-7 are related. We have speculated that the evolving dopamine response in Paired rats across test day trials reflects a model-free process. Importantly, as in the manuscript, our additional analyses once again show a tight relationship between behavioral reactivity and the dopamine response across the test session trials. It is important to note, though, that these experiments were not designed to test if responses reflect model-free or model-based processes.

      (8) Given that most of the work is using a conditioned aversive stimulus, the comparison to a primary aversive tastant quinine is useful. However, the authors saw basically no dopamine response to a primary aversive tastant quinine (measured only with GRAB-DA) and saw less noticeable decreases following CTA for NAc recordings with GRAB-DA2h than with cell body GCaMP. Given that they are using the high-affinity version of the GRAB sensor, this calls into question whether this is a true difference in release vs. soma activity or issue of high affinity release sensor making decreases in dopamine levels more difficult to observe.

      We share the same speculation as the Reviewer. Using fast-scan cyclic voltammetry, albeit measuring dopamine concentration in the dorsomedial shell, we observed a clear decrease from baseline with intraoral infusions of quinine (Roitman et al., 2008). Using fiber photometry here, the Reviewer and we note that GRAB_DA2h is a high-affinity (i.e., EC50: 7nM) dopamine sensor with relatively long off-kinetics (i.e., t1/2 decay time: 7300ms) (Labouesse et al., 2020). It may therefore be much more difficult to observe decreases (below baseline) using this sensor. The publication of new dopamine sensors - with lower affinity, faster kinetics, and greater dynamic range (Zhuo et al., 2024) – introduces opportunities for comparison and the greater potential for capturing decreases below baseline. Due to the poorer kinetics associated with GRAB_DA2h, we would not assert that direct comparisons between the GCaMP- and GRAB-based signals observed here represent true differences between somatic and terminal activity.

      References

      Choi JY, Jang HJ, Ornelas S, Fleming WT, Fürth D, Au J, Bandi A, Engel EA, Witten IB. 2020. A Comparison of Dopaminergic and Cholinergic Populations Reveals Unique Contributions of VTA Dopamine Neurons to Short-Term Memory. Cell Rep 33. doi:10.1016/j.celrep.2020.108492

      Cone JJ, Fortin SM, McHenry JA, Stuber GD, McCutcheon JE, Roitman MF. 2016. Physiological state gates acquisition and expression of mesolimbic reward prediction signals. Proc Natl Acad Sci U S A 113. doi:10.1073/pnas.1519643113

      de Jong JW, Afjei SA, Pollak Dorocic I, Peck JR, Liu C, Kim CK, Tian L, Deisseroth K, Lammel S. 2019. A Neural Circuit Mechanism for Encoding Aversive Stimuli in the Mesolimbic Dopamine System. Neuron 101. doi:10.1016/j.neuron.2018.11.005

      Deserno L, Moran R, Michely J, Lee Y, Dayan P, Dolan RJ. 2021. Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference. Elife 10. doi:10.7554/eLife.67778

      Hsu TM, Bazzino P, Hurh SJ, Konanur VR, Roitman JD, Roitman MF. 2020. Thirst recruits phasic dopamine signaling through subfornical organ neurons. Proc Natl Acad Sci U S A 117:30744–30754. doi:10.1073/pnas.2009233117

      Jung K, Krüssel S, Yoo S, An M, Burke B, Schappaugh N, Choi Y, Gu Z, Blackshaw S, Costa RM, Kwon HB. 2024. Dopamine-mediated formation of a memory module in the nucleus accumbens for goal-directed navigation. Nat Neurosci. doi:10.1038/s41593-024-01770-9

      Labouesse MA, Cola RB, Patriarchi T. 2020. GPCR-based dopamine sensors—A detailed guide to inform sensor choice for in vivo imaging. Int J Mol Sci. doi:10.3390/ijms21218048

      Lammel S, Hetzel A, Häckel O, Jones I, Liss B, Roeper J. 2008. Unique Properties of Mesoprefrontal Neurons within a Dual Mesocorticolimbic Dopamine System. Neuron 57. doi:10.1016/j.neuron.2008.01.022

      McCutcheon JE, Ebner SR, Loriaux AL, Roitman MF, Tobler PN. 2012. Encoding of aversion by dopamine and the nucleus accumbens. Front Neurosci 6. doi:10.3389/fnins.2012.00137

      Morales I, Berridge KC. 2020. ‘Liking’ and ‘wanting’ in eating and food reward: Brain mechanisms and clinical implications. Physiol Behav. doi:10.1016/j.physbeh.2020.113152

      Roitman MF, Wheeler RA, Wightman RM, Carelli RM. 2008. Real-time chemical responses in the nucleus accumbens differentiate rewarding and aversive stimuli. Nat Neurosci 11:1376–1377. doi:10.1038/nn.2219

      Schafe GE, Bernstein IL. 1996. Forebrain contribution to the induction of a brainstem correlate of conditioned taste aversion: I. The amygdala. Brain Res 741. doi:10.1016/S0006-8993(96)00906-7

      Schafe GE, Thiele TE, Bernstein IL. 1998. Conditioning method dramatically alters the role of amygdala in taste aversion learning. Learning and Memory 5. doi:10.1101/lm.5.6.481

      Wilkins EE, Bernstein IL. 2006. Conditioning method determines patterns of c-fos expression following novel taste-illness pairing. Behavioural Brain Research 169. doi:10.1016/j.bbr.2005.12.006

      Yuan L, Dou YN, Sun YG. 2021. Topography of reward and aversion encoding in the mesolimbic dopaminergic system. Journal of Neuroscience 39. doi:10.1523/JNEUROSCI.0271-19.2019

      Zhuo Y, Luo B, Yi X, Dong H, Miao X, Wan J, Williams JT, Campbell MG, Cai R, Qian T, Li F, Weber SJ, Wang L, Li B, Wei Y, Li G, Wang H, Zheng Y, Zhao Y, Wolf ME, Zhu Y, Watabe-Uchida M, Li Y. 2024. Improved green and red GRAB sensors for monitoring dopaminergic activity in vivo. Nat Methods 21. doi:10.1038/s41592-023-02100-w

    1. Author response:

      Reviewer #1:

      We agree with Reviewer 1 that the flexibility of SPRAWL also makes it difficult to interpret its outputs. We consider SPRAWL to be a hypothesis-generation tool to answer simple questions of subcellular localization in a statistically robust manner. In this paper we include examples of how it can be incorporated with other tools and wetlab experimentation to build biological intuition. Our hope is that the SPRAWL software, or even the underlying simple statistical ideas, will be of use to others in the field.

      Reviewer #2:

      We agree with Reviewer #2 that this manuscript does not demonstrate biological significance of the observed results of applying SPRAWL to massively multiplexed FISH datasets. We agree it would require additional wetlab experiments such as cell-type specific and isoform-resolved fluorescence in-situ hybridization, which we consider beyond the scope of this paper. We believe that the observed correlations of subcellular localization detected by SPRAWL and the differential 3’ UTR usage detected by ReadZS are compelling, although not conclusive, as are the Timp3 experimental studies.

      Our understanding is that Baysor is primarily a cell-segmentation algorithm, which is not what SPRAWL attempts to achieve. The Baysor authors state that “cells of a distinct type will give rise to small molecular neighborhoods with stereotypical transcriptional composition, making it possible to interpret such neighborhoods without performing explicit cell segmentation”, which we understand to mean that Baysor identifies spatial groupings of cells with “stereotypical transcriptional composition” rather than subcellular RNA localization. We do not think that SPRAWL and Baysor are comparable; instead, Baysor could be used as an upstream step to SPRAWL to potentially improve cell segmentation.

      Reviewer #3:

      We thank Reviewer #3 for identifying discrepancies in the paper, which we have addressed to the best of our ability.

    1. Author response:

      Reviewer 1:

      Many thanks for your positive review and clear overview of our paper. We also agree with your interpretation of our results that ‘the information that is decodable and the information that is task-relevant may relate in very different ways’ and we could have emphasised this point more in the paper.

      With regards to the qualitative similarities between our models and our data, because one can achieve any desired level of activity, decoding accuracy, performance, etc. in a model, we focussed on changes over learning of key metrics that are commonly used in the field. Although this can appear qualitative at times because the raw values can differ between the data and our models, our main results are ultimately strongly quantitative (e.g., Fig. 3c,d, and Fig. 5f). We note that we could have fine-tuned the models to have similar activity levels, decoding accuracies, etc. to our data, and on the face of it this may have made the results appear more convincing, but we felt that such trivial fine-tuning does not change any of our key results in any fundamental way and is not the aim of computational modelling. The model one chooses to analyse will always be abstracted from biology in some way, by definition.

      Reviewer 2:

      Thank you very much for your kind comments and clear overview of our paper. We also hope that our paper ‘provides a valuable analysis of the effect of two parameters on representations of irrelevant stimuli in trained RNNs.’

      With regards to our suggested mechanism of suppressing dynamically irrelevant stimuli, we are sorry that we did not provide a sufficient explanation of how color representations are suppressed when they are irrelevant, and we provide a longer explanation here. Our mechanism of suppression of dynamically irrelevant stimuli does not suggest that the stimulus becomes un-suppressed later; only the behaviourally relevant variable (i.e., XOR) should be decodable when it is needed. Although color decodability did increase slightly in the data and some of the models from the color period to the shape period, the increase was typically not significant and was therefore not a result that we emphasise in the paper (although this could be analysed further to see if additional mechanisms might explain it). We emphasise throughout that color decoding is typically similar between color and shape periods (either high or low) and either decreases or increases over time in both periods. We also focus on whether color decodability increases or decreases over learning during the color period when it is irrelevant (which we call ‘early color decoding’). Importantly, decoding of color or shape is not needed to perform the task; only decoding of XOR is needed. For example, in our two-neuron networks, we observe perfect XOR decoding and only 75% decoding of color and shape, and decoding during the shape period is the same as in the network at initialisation before any training. What we try to explain is that color inputs can generate 0 firing rate during the color period, when that input does not need to be used and is therefore irrelevant (and color decoding decreases during the color period over learning), but these inputs can be combined with shape inputs later to create a perfectly decodable XOR response.
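
      To make this logic concrete, here is a deliberately simplified toy, not the trained recurrent networks from the paper: two static rectified-linear units with hand-picked weights and thresholds (all assumptions of this sketch). Color and shape are coded as -1 or +1, and an absent input is 0.

```python
from itertools import product

def relu(x):
    return max(0.0, x)

def rates(color, shape):
    """Firing rates of two hypothetical rectified-linear units."""
    h1 = relu(color + shape - 1.0)    # active only when color = shape = +1
    h2 = relu(-color - shape - 1.0)   # active only when color = shape = -1
    return (h1, h2)

def xor_readout(h):
    """1.0 when color and shape differ, 0.0 when they match."""
    return 1.0 - (h[0] + h[1])

def best_color_accuracy(states):
    """Accuracy of an ideal decoder guessing color from the rates alone."""
    groups = {}
    for (color, _shape), h in states.items():
        groups.setdefault(h, []).append(color)
    correct = sum(max(cs.count(-1.0), cs.count(1.0)) for cs in groups.values())
    return correct / len(states)

# Color period: only the color input is present (shape input is 0).
color_period = {c: rates(c, 0.0) for c in (-1.0, 1.0)}

# Shape period: color and shape inputs are combined.
shape_period = {(c, s): rates(c, s) for c, s in product((-1.0, 1.0), repeat=2)}

color_accuracy = best_color_accuracy(shape_period)
```

      In this toy, both units are silent for either color during the color period, the XOR readout is exact in the shape period, and an ideal decoder recovers color from the shape-period rates on 3 of 4 conditions. That matches the 75% figure quoted above, although the trained networks need not arrive at it in the same way.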

      With regards to interpretation of our results based on metabolic cost constraints, we feel that it is an unnecessarily strong criticism to say that it ‘is not backed up by the presented data/analyses.’ All of our models were trained with only a metabolic cost constraint, a noise strength, and a task performance term. Therefore, the results of the models are directly attributable to the strength of metabolic cost that we use. Additionally, although one could in principle pick any of infinitely many different parameters to change and measure the response in an optimized network, metabolic cost and noise are two of the most fundamental phenomena that neural circuits must contend with, and many studies have analysed the impact they have on neural circuit dynamics. Furthermore, in line with previous studies (Yang et al., 2019, Whittington et al., 2022, Sussillo et al., 2015, Orhan et al., 2019, Kao et al., 2021, Cueva et al., 2020, Driscoll et al., 2022, Song et al., 2016, Masse et al., 2019, Schimel et al., 2023), we operationalized metabolic cost in our models through L2 firing rate regularization. This cost penalizes high overall firing rates. (Such an operationalization of metabolic cost also makes sense for our models because network performance is based on firing rates rather than subthreshold activities.) There are, however, alternative conceivable ways to operationalize a metabolic cost; for example, L1 firing rate regularization has been used previously when optimizing neural networks and promotes sparser neural firing. Interestingly, although L2 regularization is generally considered to be weaker than L1 regularization, we still found that it encouraged the network to use purely sub-threshold activity in our task. The regularization of synaptic weights may also be biologically relevant because synaptic transmission uses the most energy in the brain compared to other processes (Faria-Pereira et al., 2022, Harris et al., 2012). Additionally, even subthreshold activity could be regularized as it also consumes energy (although orders of magnitude less than spiking (Zhu et al., 2019)). Therefore, future work will be needed to examine how different metabolic costs affect the dynamics of task-optimized networks.
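
      As a schematic of what such a penalty looks like, the sketch below adds an L2 firing-rate cost to a task error and contrasts it with an L1 cost. The function names, the use of a mean rather than a sum, and the weighting are assumptions for illustration; the exact objective used for the models is given in the paper, not here.

```python
def l2_rate_cost(rates):
    """Mean squared firing rate: large rates are penalized disproportionately."""
    return sum(r * r for r in rates) / len(rates)

def l1_rate_cost(rates):
    """Mean absolute firing rate: the alternative that favors sparse firing."""
    return sum(abs(r) for r in rates) / len(rates)

def total_loss(task_error, rates, cost_weight):
    """Task performance term plus a weighted metabolic (L2 rate) cost."""
    return task_error + cost_weight * l2_rate_cost(rates)

# Hypothetical firing rates of two units at one time step.
example_rates = [0.0, 2.0]
loss = total_loss(task_error=0.5, rates=example_rates, cost_weight=0.1)
```

      Increasing `cost_weight` shifts the optimum toward solutions with lower firing rates at the possible expense of task error, which is the trade-off the metabolic cost strength controls.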

      With regards to color representations in PFC only qualitatively matching those in our models, in line with the comment from Reviewer 1, because one can achieve any desired level of activity, decoding accuracy, performance, etc. in a model, we focussed on changes over learning of key metrics that are commonly used in the field. Although this can appear qualitative at times because the raw values can differ between the data and our models, our main results are ultimately strongly quantitative (e.g., Fig. 3c,d, and Fig. 5f). We note that we could have fine-tuned the models to have similar activity levels, decoding accuracies, etc. to our data, and on the face of it this may have made the results appear more convincing, but we felt that such trivial fine-tuning does not change any of our key results in any fundamental way and is not the aim of computational modelling. The model one chooses to analyse will always be abstracted from biology in some way, by definition. Finally, we note that changes in color decoding could of course result from other causes, but we focussed on two key phenomena that neural circuits must contend with: noise and metabolic costs. Therefore, it is likely that these two variables play a strong role in stimulus representations in neural circuits.

      Reviewer 3:

      Thank you very much for your thorough and clear overview of our paper. We agree that it is important to investigate phenomena and manipulations in computational models that are almost impossible to do in vivo, and we are pleased that you found our mathematical analyses rigorous and nicely documented.

      Although we agree that it can be useful to study the responses of individual neurons, we focussed on population analyses of all available neurons without omitting or specifically selecting neurons based on their dynamics. We are also not suggesting that the activities of individual ‘neurons’ in the models and data should be similar, since our models are highly abstract firing rate models. Rather, we were interested in comparing the overall computational strategy, which one can access through population decoding and cross-generalised decoding, between the models and the data; this is arguably the correct level of analysis of such models (and data) given our key questions (Vyas et al., 2020, Churchland et al., 2012, Mante et al., 2013, Ebitz et al., 2021).

      We also certainly agree and are more than open to the fact that suppression of irrelevant stimuli may already be happening on the inputs arriving in PFC. Indeed, we actually suggest this as the mechanism in Fig. 5 (together with recurrent circuit dynamics that make use of these inputs).

      With regards to the dynamics of the two-neuron networks not being ‘informative of what happens in brain networks’, we agree that these models are very simplified and may only contain very fundamental similarities with biological neurons. However, we only used them to illustrate the fundamental mechanism of generating 0 firing rate during the color epoch so that it is more easily understandable for readers as they can see the entire 2-dimensional state space and the entire computational strategy can be seen (Fig. 5a-d). We also note that we did this for both rectified linear and tanh networks, thus showing that such a mechanism is preserved across fundamentally different firing rate nonlinearities. Additionally, after illustrating this fundamental mechanism of networks receiving color information but generating 0 firing rate, we show that the exact same mechanism is at play in the large networks we use throughout the paper (Fig. 5e). We also only compare the large networks to our neural recordings. We do agree though that it would be interesting to further compare fundamental similarities and differences between our models and our neural recordings (always at the right level of analysis that makes sense for our chosen models) to show that the mechanisms we uncover in our models are also strongly relevant for our data.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have used full-length single-cell sequencing on a sorted population of human fetal retina to delineate expression patterns associated with the progression of progenitors to rod and cone photoreceptors. They find that rod and cone precursors contain a mix of rod/cone determinants, with a bias in both amounts and isoform balance likely deciding the ultimate cell fate. Markers of early rod/cone hybrids are clarified, and a gradient of lncRNAs is uncovered in maturing cones. Comparison of early rods and cones exposes an enriched MYCN regulon, as well as expression of SYK, which may contribute to tumor initiation in RB1 deficient cone precursors.

      Strengths:

      (1) The insight into how cone and rod transcripts are mixed together at first is important and clarifies a long-standing notion in the field.

      (2) The discovery of distinct active vs inactive mRNA isoforms for rod and cone determinants is crucial to understanding how cells make the decision to form one or the other cell type. This is only really possible with full-length scRNAseq analysis.

      (3) New markers of subpopulations are also uncovered, such as CHRNA1 in rod/cone hybrids that seem to give rise to either rods or cones.

      (4) Regulon analyses provide insight into key transcription factor programs linked to rod or cone fates.

      (5) The gradient of lncRNAs in maturing cones is novel, and while the functional significance is unclear, it opens up a new line of questioning around photoreceptor maturation.

      (6) The finding that SYK mRNA is naturally expressed in cone precursors is novel, as previously it was assumed that SYK expression required epigenetic rewiring in tumors.

      Weaknesses:

      (1) The writing is very difficult to follow. The nomenclature is confusing and there are contradictory statements that need to be clarified.

      (2) The drug data is not enough to conclude that SYK inhibition is sufficient to prevent the division of RB1 null cone precursors. Drugs are never completely specific so validation is critical to make the conclusion drawn in the paper.

      We thank the reviewer for describing the study’s strengths and weaknesses.  In the upcoming revision, we will:

      (1) improve the writing and clarify the nomenclature and contradictory statements, particularly those noted in the Reviewer’s Recommendations for Authors; and

      (2) scale back the claims related to the role of SYK in the cone precursor response to RB1 loss; we agree that genetic perturbation of SYK is required to prove its role and will perform such analyses in a separate study.

      Reviewer #2 (Public review):

      Summary:

      The authors used deep full-length single-cell sequencing to study human photoreceptor development, with a particular emphasis on the characteristics of photoreceptors that may contribute to retinoblastoma.

      Strengths:

      This single-cell study captures gene regulation in photoreceptors across different developmental stages, defining post-mitotic cone and rod populations by highlighting their unique gene expression profiles through analyses such as RNA velocity and SCENIC. By leveraging full-length sequencing data, the study identifies differentially expressed isoforms of NRL and THRB in L/M cone and rod precursors, illustrating the dynamic gene regulation involved in photoreceptor fate commitment. Additionally, the authors performed high-resolution clustering to explore markers defining developing photoreceptors across the fovea and peripheral retina, particularly characterizing SYK's role in the proliferative response of cones in the RB loss background. The study provides an in-depth analysis of developing human photoreceptors, with the authors conducting thorough analyses using full-length single-cell RNA sequencing. The strength of the study lies in its design, which integrates single-cell full-length RNA-seq, long-read RNA-seq, and follow-up histological and functional experiments to provide compelling evidence supporting their conclusions. The model of cell type-dependent splicing for NRL and THRB is particularly intriguing. Moreover, the potential involvement of the SYK and MYC pathways with RB in cone progenitor cells aligns with previous literature, offering additional insights into RB development.

      Weaknesses:

      The manuscript feels somewhat unfocused, with a lack of a strong connection between the analysis of developing photoreceptors, which constitutes the bulk of the manuscript, and the discussion on retinoblastoma. Additionally, given the recent publication of several single-cell studies on the developing human retina, it is important for the authors to cross-validate their findings and adjust their statements where appropriate.

      We thank the reviewer for summarizing the main findings and for noting the compelling support for the conclusions, the intriguing cell type-dependent splicing of rod and cone lineage factors, and the insights into retinoblastoma development. 

      We concur that some studies of developing photoreceptors were not well connected to retinoblastoma, which diminished the focus. However, we suggest that it was valuable to highlight how deep, long-read sequencing provided new insights into retinoblastoma. For example, our demonstration of similar rod- and cone-related gene expression in early cones and RB cells addressed concerns with the proposed cone cell-of-origin, adding disease relevance.

      We will address the Reviewer’s request to cross-validate our findings with those of other single-cell studies of developing human retina and to adjust the related statements in our upcoming revision.

      Reviewer #3 (Public review):

      Summary:

      The authors use high-depth, full-length scRNA-Seq analysis of fetal human retina to identify novel regulators of photoreceptor specification and retinoblastoma progression.

      Strengths:

      The use of high-depth, full-length scRNA-Seq to identify functionally important alternatively spliced variants of transcription factors controlling photoreceptor subtype specification, and identification of SYK as a potential mediator of RB1-dependent cell cycle reentry in immature cone photoreceptors.

      Human developing fetal retinal tissue samples were collected between 13-19 gestational weeks and this provides a substantially higher depth of sequencing coverage, thereby identifying both rare transcripts and alternative splice forms, and thereby representing an important advance over previous droplet-based scRNA-Seq studies of human retinal development.

      Weaknesses:

      The weaknesses identified are relatively minor. This is a technically strong and thorough study, that is broadly useful to investigators studying retinal development and retinoblastoma.

      We thank the reviewer for describing the strengths of the study. Our upcoming revision will address the minor concerns that were raised separately in the Reviewer’s Recommendations for Authors.

    2. eLife Assessment

      In this paper, the authors use single-cell RNA sequencing to understand post-mitotic cone and rod developmental states and identify cone-specific features that contribute to retinoblastoma genesis. The work is important and the evidence is generally convincing. The findings of rod/cone fate determination at a very early stage are intriguing.

    3. Reviewer #1 (Public review):

      Summary:

      The authors have used full-length single-cell sequencing on a sorted population of human fetal retina to delineate expression patterns associated with the progression of progenitors to rod and cone photoreceptors. They find that rod and cone precursors contain a mix of rod/cone determinants, with a bias in both amounts and isoform balance likely deciding the ultimate cell fate. Markers of early rod/cone hybrids are clarified, and a gradient of lncRNAs is uncovered in maturing cones. Comparison of early rods and cones exposes an enriched MYCN regulon, as well as expression of SYK, which may contribute to tumor initiation in RB1 deficient cone precursors.

      Strengths:

      (1) The insight into how cone and rod transcripts are mixed together at first is important and clarifies a long-standing notion in the field.

      (2) The discovery of distinct active vs inactive mRNA isoforms for rod and cone determinants is crucial to understanding how cells make the decision to form one or the other cell type. This is only really possible with full-length scRNAseq analysis.

      (3) New markers of subpopulations are also uncovered, such as CHRNA1 in rod/cone hybrids that seem to give rise to either rods or cones.

      (4) Regulon analyses provide insight into key transcription factor programs linked to rod or cone fates.

      (5) The gradient of lncRNAs in maturing cones is novel, and while the functional significance is unclear, it opens up a new line of questioning around photoreceptor maturation.

      (6) The finding that SYK mRNA is naturally expressed in cone precursors is novel, as previously it was assumed that SYK expression required epigenetic rewiring in tumors.

      Weaknesses:

      (1) The writing is very difficult to follow. The nomenclature is confusing and there are contradictory statements that need to be clarified.

      (2) The drug data is not enough to conclude that SYK inhibition is sufficient to prevent the division of RB1 null cone precursors. Drugs are never completely specific so validation is critical to make the conclusion drawn in the paper.

    4. Reviewer #2 (Public review):

      Summary:

      The authors used deep full-length single-cell sequencing to study human photoreceptor development, with a particular emphasis on the characteristics of photoreceptors that may contribute to retinoblastoma.

      Strengths:

      This single-cell study captures gene regulation in photoreceptors across different developmental stages, defining post-mitotic cone and rod populations by highlighting their unique gene expression profiles through analyses such as RNA velocity and SCENIC. By leveraging full-length sequencing data, the study identifies differentially expressed isoforms of NRL and THRB in L/M cone and rod precursors, illustrating the dynamic gene regulation involved in photoreceptor fate commitment. Additionally, the authors performed high-resolution clustering to explore markers defining developing photoreceptors across the fovea and peripheral retina, particularly characterizing SYK's role in the proliferative response of cones in the RB loss background. The study provides an in-depth analysis of developing human photoreceptors, with the authors conducting thorough analyses using full-length single-cell RNA sequencing. The strength of the study lies in its design, which integrates single-cell full-length RNA-seq, long-read RNA-seq, and follow-up histological and functional experiments to provide compelling evidence supporting their conclusions. The model of cell type-dependent splicing for NRL and THRB is particularly intriguing. Moreover, the potential involvement of the SYK and MYC pathways with RB in cone progenitor cells aligns with previous literature, offering additional insights into RB development.

      Weaknesses:

      The manuscript feels somewhat unfocused, with a lack of a strong connection between the analysis of developing photoreceptors, which constitutes the bulk of the manuscript, and the discussion on retinoblastoma. Additionally, given the recent publication of several single-cell studies on the developing human retina, it is important for the authors to cross-validate their findings and adjust their statements where appropriate.

    5. Reviewer #3 (Public review):

      Summary:

      The authors use high-depth, full-length scRNA-Seq analysis of fetal human retina to identify novel regulators of photoreceptor specification and retinoblastoma progression.

      Strengths:

      The use of high-depth, full-length scRNA-Seq to identify functionally important alternatively spliced variants of transcription factors controlling photoreceptor subtype specification, and identification of SYK as a potential mediator of RB1-dependent cell cycle reentry in immature cone photoreceptors.

      Human developing fetal retinal tissue samples were collected between 13-19 gestational weeks and this provides a substantially higher depth of sequencing coverage, thereby identifying both rare transcripts and alternative splice forms, and thereby representing an important advance over previous droplet-based scRNA-Seq studies of human retinal development.

      Weaknesses:

      The weaknesses identified are relatively minor. This is a technically strong and thorough study, that is broadly useful to investigators studying retinal development and retinoblastoma.

    1. eLife Assessment

      This important study addresses how DNA replication restarts in Escherichia coli in the absence of a functional replication initiator protein DnaA. The authors show that helicase DnaB loading at the replication origin oriC can be executed by PriC under sub-optimal initiation conditions. While the genetic and biochemical evidence is solid, there is so far no direct evidence for PriC acting at oriC in vivo.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript reports the investigation of PriC activity during DNA replication initiation in Escherichia coli. It is reported that PriC is necessary for the growth and control of DNA replication initiation under diverse conditions where helicase loading is perturbed at the chromosome origin oriC. A model is proposed where PriC loads helicase onto ssDNA at the open complex formed by DnaA at oriC. Reconstituted helicase loading assays in vitro support the model. The manuscript is well-written and has a logical narrative.

      Major Questions/Comments:

      An important observation here is that a ΔpriC mutant alone displays under-replication, suggesting that this helicase loading pathway is physiologically relevant. Has this PriC phenotype been reported previously? If not, would it be possible to confirm this result using an independent experimental approach (e.g. marker frequency analysis or fluorescent reporter-operator systems)?

      Is PriA necessary for the observed PriC activity at oriC? Is there evidence that PriC functions independently of PriA in vivo?

      Is PriC helicase loading activity in vivo at the origin direct (the genetic analysis leaves other possibilities tenable)? Could PriC enrichment at oriC be detected using chromatin immunoprecipitation?

    3. Reviewer #2 (Public review):

      This is a great paper. Yoshida et al. convincingly show that DnaA does not exclusively do loading of the replicative helicase at the E. coli oriC, but that PriC can also perform this function. Importantly, PriC seems to contribute to helicase loading even in wt cells, albeit to a much lesser degree than DnaA. On the other hand, PriC takes a larger role in helicase loading during aberrant initiation, i.e. when the origin sequence is truncated or when the properties of initiation proteins are suboptimal, as highlighted here by mutations in dnaA or dnaC.

      This is a major finding because it clearly demonstrates that the two roles of DnaA in the initiation process can be separated into initially forming an open complex at the DUE region by binding/nucleation onto DnaA-boxes and second by loading of the helicase. Whereas these two functions are normally assumed to be coupled, the present data clearly show that they can be separated and that PriC can perform at least part of the helicase loading provided that an area of duplex opening is formed by DnaA.

      This puts into question the interpretation of a large body of previous work on mutagenesis of oriC and dnaA to find a minimal oriC/DnaA complex in many bacteria. In other words, mutants in which oriC is truncated/mutated may support the initiation of replication and cell viability only in the presence of PriC. Such mutants are capable of generating single-strand openings but may fail to load the helicase in the absence of PriC. Similarly, dnaA mutants may generate an aberrant complex on oriC that triggers strand opening but is incapable of loading DnaB unless PriC is present.

      In the present work, the sequence of experiments presented is logical and the manuscript is clearly written and easy to follow. The very last part regarding PriC in cSDR replication does not add much to the story and may be omitted.

    4. Reviewer #3 (Public review):

      Summary:

      At the abandoned replication fork, loading of DnaB helicase requires assistance from PriABC, repA, and other protein partners, but it does not require replication initiator protein, DnaA. In contrast, nucleotide-dependent DnaA binding at the specific functional elements is fundamental for helicase loading, leading to the DUE region's opening. However, the authors questioned in this study that in case of impeding replication at the bacterial chromosomal origins, oriC, a strategy similar to an abandoned replication fork for loading DnaB via bypassing the DnaA interaction step could be functional. The study by Yoshida et al. suggests that PriC could promote DnaB helicase loading on the chromosomal oriC ssDNA without interacting with the DnaA protein. However, the conclusions drawn from the primarily qualitative data presented in the study could be slightly overwhelming and need supportive evidence.

      Strengths:

      Understanding the mechanism of how DNA replication restarts via reloading the replisomes onto abandoned DNA replication forks is crucial. Notably, this knowledge becomes crucial to understanding how bacterial cells maintain DNA replication from a stalled replication fork when challenging or non-permissive conditions prevail. This critical study combines experiments to address a fundamental question of how DnaB helicase loading could occur when replication initiation impedes at the chromosomal origin, leading to replication restart.

      Weaknesses:

      The term colony formation used for a spotting assay could be misleading for apparent reasons. Both assess cell viability and growth; while colony formation is quantitative, spotting is qualitative. Particularly in this study, where differences appear minor but draw significant conclusions, the colony formation assays representing growth versus moderate or severe inhibition are a more precise measure of viability.

      Figure 2
      The reduced number of two oriC copies per cell in the dnaA46priC-deficient strain was considered moderate inhibition. When combined with the data suggested by the dnaAC2priC-deficient strain containing two origins in cells with or without PriC (indicating no inhibition), the conclusion was drawn that PriC rescued blocked replication by assisting the DnaC-dependent DnaB loading step at oriC ssDNA.

      The results provided by Saifi B, Ferat JL. PLoS One. 2012;7(3):e33613 suggest that in an asynchronous DnaA46 ts culture, the rate at which dividing cells start accumulating arrested replication forks might differ (indicated by the two subpopulations, one with a single oriC and the other with two oriC). DnaA46 protein has significantly reduced ATP binding at 42°C, and growing the strain at 42°C for 40-80 minutes before releasing the cells at 30°C for 5 minutes may leave the two subpopulations with different levels of active ATP-DnaA. This could be why only 50% of cells contain two oriC. Releasing cells for more time before adding rifampicin and cephalexin could increase the number of cells with two oriCs. In contrast, DnaC2 cells have an inactive helicase loader at 42°C but an intact DnaA-ATP population (WT DnaA at 42°C or 30°C should not differ in ATP binding). Once released at 30°C, the reduced but active DnaC population could assist in loading DnaB to DnaA, engaged in normal replication initiation, and thus cells should appear with two oriC in a PriC-independent manner.

      Broadly, the evidence provided by the authors may support the primary hypothesis. Still, it could call for an alternative hypothesis: PriC involvement in stabilizing the DnaA-DnaB complex (this possibility could exist here). Proving the conclusions drawn from the set of experiments in Figures 2 and 3, which laid the foundations for supporting the primary hypothesis, requires insights from on/off rates of DnaB loading onto DnaA and the stability of the complexes in the presence or absence of PriC. I have a few other reasons to consider the latter arguments.

      Figure 3
      One should consider the fact that dnaA46 is present in these cells. Overexpressing pdnaAFH could produce mixed multimers containing subunits of DnaA46 (reduced ATP binding) and DnaAFH (reduced DnaB binding). Both have intact DnaA-DnaA oligomerization ability. The cooperativity between the two functions by a subpopulation of two DnaA variants may compensate for the individual deficiencies, making a population of an active protein, which in the presence of PriC could lead to the promotion of stable DnaA: DnaBC complexes able to initiate replication. In light of the results presented in Hayashi et al., J Biol Chem. 2020 Aug 7;295(32):11131-11143, where the mutant DnaBL160A was shown to be impaired in DnaA binding but to contain an active helicase function and still be inhibited for growth, how could one explain the hypothesis presented in this manuscript? If PriC-assisted helicase loading could bypass DnaA interaction, then how should growth inhibition in a strain carrying DnaBL160A be described? However, seeing the results in light of the alternative possibility that PriC assists in stabilizing the DnaA: DnaBC complex is more compatible with the previously published data.

      Figure 4
      Overexpression of DiaA could contribute to removing a higher number of DnaA populations. This could be more aggravated in the absence of PriC (DiaA could titrate out more DnaA): the complex formed between DnaA: DnaBC is not stable, leading to reduced DUE opening and replication initiation and hence growth inhibition (Fig. 4A ∆priC-pNA135). Figure 7C: Again, in the absence of PriC, the reduced stability of DnaA: DnaBC complex leaves more DnaA to titrate out by DiaA, and thus less Form I*. However, adding PriC stabilizes the DnaA: DnaBC hetero-complexes, with reduced DnaA titration by DiaA, producing additional Form I*. Adding a panel with DnaBL160A that does not interact with DnaA but contains helicase activity could be helpful. Would the inclusion of PriC increase the ability of mutant helicase to produce additional Form I*?

      Figure 5
      The interpretation is that colony formation of the Left-oriC ∆priC double mutant was markedly compromised at 37°C (Figure 5B), and the growth defects of the Left-oriC mutant at 25°C and 30°C were aggravated. However, prima facie, the relative differences in the growth of cells containing and lacking PriC are similar. Quantitative colony-forming data are required to support these claims. Otherwise, it is slightly confusing.

      A minor suggestion is to include cells expressing PriC using plasmid DNA to show that adding PriC should reverse the growth defect of dnaA46 and dnaC2 strains at non-permissive temperatures. The same should be added at other appropriate places.

    1. eLife Assessment

      This is a useful report of a spatially-extended model to study the complex interactions between immune cells, fibroblasts, and cancer cells, providing insights into how fibroblast activation can influence tumor progression. The model opens up new possibilities for studying fibroblast-driven effects in diverse settings, which is crucial for understanding potential tumor microenvironment manipulations that could enhance immunotherapy efficacy. While the results presented are solid and follow logically from the model's assumptions, some of these assumptions may require further validation, as they appear to oversimplify certain aspects in light of complex experimental findings, system geometry, and general principles of active matter research.

    2. Reviewer #1 (Public review):

      The authors present an important work where they model some of the complex interactions between immune cells, fibroblasts and cancer cells. The model takes into account the increased ECM production of cancer-associated fibroblasts. These fibres trap the cancer but also protect it from immune system cells. In this way, these fibroblasts' actions both promote and hinder cancer growth. By exploring different scenarios, the authors can model different cancer fates depending on the parameters regulating cancer cells, immune system cells and fibroblasts. In this way, the model explores non-trivial scenarios. An important weakness of this study is that, though it is inspired by NSCLC tumors, it is restricted to modelling circular tumor lesions and does not explore the formation of ramified tumors, as in NSCLC. As a result, it is only a general model and it is not clear how it can be adapted to simulate more realistic tumor morphologies.

    3. Reviewer #2 (Public review):

      Summary:

      The authors develop a computational model (and a simplified version thereof) to treat an extremely important issue regarding tumor growth. Specifically, it has been argued that fibroblasts have the ability to support tumor growth by creating physical conditions in the tumor microenvironment that prevent the relevant immune cells from entering into contact with, and ultimately killing, the cancer cells. This inhibition is referred to as immune exclusion. The computational approach follows standard procedures in the formulation of models for mixtures of different material species, adapted to the problem at hand by making a variety of assumptions as to the activity of different types of fibroblasts, namely "normal" versus "cancer-associated". The model itself is relatively complex, but the authors do a convincing job of analyzing possible behaviors and attempting to relate these to experimental observations.

      Strengths:

      As mentioned, the authors do an excellent job of analyzing the behavior of their model both in its full form (which includes spatial variation of the concentrations of the different cellular species) and in its simplified mean field form. The model itself is formulated based on established physical principles, although the extent to which some of these principles apply to active biological systems is not clear (see Weaknesses). The results of the model do offer some significant insights into the critical factors which determine how fibroblasts might affect tumor growth; these insights could lead to new experimental ways of unraveling these complex sets of issues and enhancing immunotherapy.

      Weaknesses:

      Models of the form being studied here rely on a large number of assumptions regarding cellular behavior. Some of these seemed questionable, based on what we have learned about active systems. The problem of T cell infiltration as well as the patterning of the extracellular matrix (ECM) by fibroblasts necessarily involve understanding cell motion and cell interactions due e.g. to cell signaling. Adopting an approach based purely on physical systems driven by free energies alone does not consider the special role that active processes can play, both in motility itself and in the type of self-organization that can occur due to these cell-cell interactions. This to me is the primary weakness of this paper.

      A separate weakness concerns the assumption that fibroblasts affect T cell behavior primarily by just making a more dense ECM. There are a number of papers in the cancer literature (see, for some examples, Carstens, J., Correa de Sampaio, P., Yang, D. et al. Spatial computation of intratumoral T cells correlates with survival of patients with pancreatic cancer. Nat Commun 8, 15095 (2017); Sun, Xiujie, Bogang Wu, Huai-Chin Chiang, Hui Deng, Xiaowen Zhang, Wei Xiong, Junquan Liu et al. "Tumour DDR1 promotes collagen fibre alignment to instigate immune exclusion." Nature 599, no. 7886 (2021): 673-678) that seem to indicate that density alone is not a sufficient indicator of T cell behavior. Instead, the organization of the ECM (for example, its anisotropy) could be playing a much more essential role than is given credit for here. This possibility is hinted at in the Discussion section but deserves much more emphasis.

      Finally, the mixed version of the model is, from a general perspective, not very different from many other published models treating the ecology of the tumor microenvironment (for a survey, see Arabameri A, Asemani D, Hadjati J (2018), A structural methodology for modeling immune-tumor interactions including pro-and anti-tumor factors for clinical applications. Math Biosci 304:48-61). There are even papers in this literature that specifically investigate effects due to allowing cancer cells to instigate changes in other cells from being tumor-inhibiting to tumor-promoting. This feature occurs not only for fibroblasts but also for example for macrophages which can change their polarization from M1 to M2. There needed to be some more detailed comparison with this existing literature.

    1. eLife Assessment

      This manuscript presents important information as to how adolescent alcohol exposure (AIE) alters pain behavior and relevant neurocircuits, with compelling data. The manuscript focuses on how AIE alters the circuit running from the basolateral amygdala to the PFC (PV interneurons) to the periaqueductal gray, resulting in feed-forward inhibition. The manuscript is a detailed study of the role of alcohol exposure in regulating the circuit and reflexive pain; however, the role of the PV interneurons in mechanistically modulating this feed-forward circuit could be more strongly supported.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript by Obray et al., the authors show that adolescent ethanol exposure increases mechanical allodynia in adulthood. Additionally, they show that BLA-mediated inhibition of the prelimbic cortex is reduced, resulting in increased excitability in neurons that then project to vlPAG. This effect was mediated by BLA inputs onto PV interneurons. The primary finding of the manuscript is that these AIE-induced changes further impact acute pain processing in the BLA-PrL-vlPAG circuit, albeit behavioral readouts after inducing acute pain were not different between AIE rats and controls. These results provide novel insights into how AIE can have long-lasting effects on pain-related behaviors and neurophysiology.

      Strengths:

      The manuscript was very well written and the experiments were rigorously conducted. The inclusion of both behavioral and neurophysiological circuit recordings was appropriate and compelling. The attention to SABV and appropriate controls was well thought out. The Discussion provided novel ideas for how to think about AIE and chronic pain and proposed several interesting mechanisms. This was a very well-executed set of experiments.

      Weaknesses:

      There is a mild disconnect between behavioral readout (reflexive pain) and neural circuits of interest (emotional). Considering that this circuit is likely engaged in the aversiveness of pain, it would have been interesting to see how carrageenan and/or AIE impacted non-reflexive pain measures. Perhaps this would reveal a potentiated or dysregulated phenotype that matches the neurophysiological changes reported. However, this critique does not take away from the value of the paper or its conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Obray et al. entitled "Adolescent alcohol exposure promotes mechanical allodynia and alters synaptic function at inputs from the basolateral amygdala to the prelimbic cortex" investigated how adolescent intermittent ethanol exposure (AIE) affects the BLA -> PL circuit, with an emphasis on PAG projecting PL neurons, and how AIE changes mechanical and thermal nociception. The authors found that AIE increased mechanical, but not thermal nociception, and an injection of an inflammatory agent did not produce changes in an ethanol-dependent manner. Physiologically, a variety of AIE-specific effects were found in PL neuron firing at BLA synapses, suggestive of AIE-induced alterations in neurotransmission at BLA-PVIN synapses.

      Strengths:

      This was a comprehensive examination of the effects of AIE on this neural circuit, with an in-depth dissection of the various neuronal connections within the PL.

      Sex was included as a biological variable, yet there were little to no sex differences in AIE's effects, suggestive of similar adaptations in males and females.

    4. Reviewer #3 (Public review):

      Summary:

      Obray et al. investigate the long-lasting effects of adolescent intermittent ethanol (AIE) in rats, a model of alcohol dependence, on a neural circuit within the prefrontal cortex. The studies are focused on inputs from the basolateral amygdala (BLA) onto parvalbumin (PV) interneurons and pyramidal cells that project to the periaqueductal gray (PAG). The authors found that AIE increased BLA excitatory drive onto parvalbumin interneurons and increased BLA feedforward inhibition onto PAG-projecting neurons.

      Strengths:

      Fully powered cohorts of male and female rodents are used, and the design incorporates both AIE and an acute pain model. The authors used several electrophysiological techniques to assess synaptic strength and excitability from a few complementary angles. The design and statistical analysis are sound, and the strength of evidence supporting synaptic changes following AIE is solid.

      Weaknesses:

      (1) There is incomplete evidence supporting some of the conclusions drawn in this manuscript. The authors claim that the changes in feedforward inhibition onto pyramidal cells are due to the changes in parvalbumin interneurons, but evidence is not provided to support that idea. PV cells do not spontaneously fire action potentials in slices (nor do they receive high levels of BLA activity while at rest in slices). It is possible that spontaneous GABA release from PV cells is increased after AIE, but the authors did not report sIPSC frequency. Second, the authors did not determine that PV cells mediate the feedforward BLA op-IPSCs and changes following AIE (this would require manipulation to reduce/block PV-IN activity). This limitation in results and interpretation is important because prior work shows BLA-PFC feedforward IPSCs can be driven by somatostatin cells. Cholecystokinin cells are also abundant basket cells in PFC and have been recently shown to mediate feedforward inhibition from the thalamus and ventral hippocampus, so it's also possible that CCK cells are involved in the effects observed here.

      (2) The authors conclude that the changes in this circuit likely mediate long-lasting hyperalgesia, but this is not addressed experimentally. In some ways, the focused nature of the study is a benefit in this regard, as there is extensive prior literature linking this circuit with pain behaviors in alternative models (e.g., SNI), but it should be noted that these studies have not assessed hyperalgesia stemming from prior alcohol exposure. While the current studies do not include a causative behavioral manipulation, the strength of the association between BLA-PL-PAG function and hyperalgesia could be bolstered by current data if there were relationships detected between electrophysiological properties and hyperalgesia. Have the authors assessed this? In addition, this study is limited by not addressing the specificity of synaptic adaptations to the BLA-PL-PAG circuit. For instance, PL neurons send reciprocal projections to BLA and send direct projections to the locus coeruleus (which the authors note is an important downstream node of the PAG for regulating pain).

      (3) I have some concerns about methodology. First, 5 ms is a long light pulse for optogenetics and might induce action-potential-independent release. Does TTX alone block op-EPSCs under these conditions? Second, PV cells express a high degree of calcium-permeable AMPA receptors, which display inward rectification at positive holding potentials due to blockade from intracellular polyamines. Typically, this is controlled/promoted by including spermine in the internal solution, but I do not believe the authors did that. Nonetheless, the relatively low A/N ratios for this cell type suggest that CP-AMPA receptors were not sampled with the +40/+40 design of this experiment, raising concerns that the majority of AMPA receptors in these cells were not sampled during this experiment. Finally, it should be noted that asEPSC frequency can also reflect changes in the number of functional/detectable synapses. This measurement is also fairly susceptible to inter-animal differences in ChR2 expression. There are other techniques for assessing presynaptic release probability (e.g., PPR, MK-801 sensitivity) that would improve the interpretation of these studies if that is intended to be a point of emphasis.

      (4) In a few places in the manuscript, results following voluntary drinking experiments (especially Salling et al. and Sicher et al.) are discussed without clear distinction from prior work in vapor models of dependence.

      (5) Discussion (lines 416-420). The authors describe some differing results with the literature and mention that the maximum current injection might be a factor. To me, this does not seem like the most important factor and potentially undercuts the relevance of the findings. Are the cells undergoing a depolarization block? Did the authors observe any changes in the rheobase or AP threshold? On the other hand, a more likely difference between this and previous work is that the proportion of PAG-projecting cells is relatively low, so previous work in L5 likely sampled many types of pyramidal cells that project to other areas. This is a key example where additional studies by the current group assessing a distinct or parallel set of pyramidal cells would aid in the interpretation of these results and help to place them within the existing literature. Along these lines, PAG-projecting neurons are Type A cells with significant hyperpolarization sag. Previous studies showed that adolescent binge drinking stunts the development of HCN channel function and ensuing hyperpolarization sag. Have the authors observed this in PAG-projecting cells? Another interesting membrane property worth exploring with the existing data set is the afterhyperpolarization / SK channel function.

    1. eLife Assessment

      This study provides valuable advances in our understanding of how inputs from multiple sources can impact the physiology of motor neurons during the process of multisensory integration. Specifically, the authors show how streams of auditory and principally visual information modulate the physiology of Mauthner neurons in goldfish, thus allowing the different senses to influence escape behavior. Supporting evidence is generally convincing, although material reporting the direct control of behavior is less representative of the data.

    2. Reviewer #1 (Public Review):

      Otero-Coronel et al. address an important question for neuroscience - how does a premotor neuron capable of directly controlling behavior integrate multiple sources of sensory inputs to inform action selection? For this, they focused on the teleost Mauthner cell, long known to be at the core of a fast escape circuit. What is particularly interesting in this work is the naturalistic approach they took. Classically, the M-cell was characterized, both behaviorally and physiologically, using a unimodal sensory space. Here the authors make the effort (substantial!) to study the physiology of the M-cell taking into account both the visual and auditory inputs. They performed well-informed electrophysiological approaches to decipher how the M-cell integrates the information of two sensory modalities depending on the strength and temporal relation between them.

      The empirical results are convincing and well-supported. The manuscript is well-written and organized. The experimental approaches and the selection of stimulus parameters are clear and informed by the bibliography. The major finding is that multisensory integration increases the certainty of environmental information in an inherently noisy environment.

    3. Reviewer #2 (Public Review):

      In this manuscript, Otero-Coronel and colleagues use a combination of acoustic stimuli and electrical stimulation of the tectum to study MSI in the M-cells of adult goldfish. They first perform a necessary piece of groundwork in calibrating tectal stimulation for maximal M-cell MSI, and then characterize this MSI with slightly varying tectal and acoustic inputs. Next, they quantify the magnitude and timing of FFI that each type of input has on the M-cell, finding that both the tectum and the auditory system drive FFI, but that FFI decays more slowly for auditory signals. These are novel results that would be of interest to a broader sensory neuroscience community. By then providing pairs of stimuli separated by 50ms, they assess the ability of the first stimulus to suppress responses to the second, finding that acoustic stimuli strongly suppress subsequent acoustic responses in the M-cell, that they weakly suppress subsequent tectal stimulation, and that tectal stimulation does not appreciably inhibit subsequent stimuli of either type. Finally, they show that M-cell physiology mirrors previously reported behavioural data in which stronger stimuli underwent less integration.

      The manuscript is generally well-written and clear. The discussion of results is appropriately broad and open-ended. It's a good document. Our major concerns regarding the study's validity are captured in the individual comments below. In terms of impact, the most compelling new observation is the quantification of the FFI from the two sources and the logical extension of these FFI dynamics to M-cell physiology during MSI. It is also nice, but unsurprising, to see that the relationship between stimulus strength and MSI is similar for M-cell physiology to what has previously been shown for behavior. While we find the results interesting, we think that they will be of greatest interest to those specifically interested in M-cell physiology and function.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Minor Concern (Original Comment 1):

      “We think that this is sufficient to address our concern. Some citations may be in order to underpin the new text.”

      We appreciate the reviewer’s assessment that the revised text clarifies the complexity of the upstream circuitry beyond the retina, including inputs from the thalamus. As recommended, we have now included additional citations in the revised manuscript to support these points.

      Major Concern (Original Comment 5):

      “We do not feel that this important concern has been addressed. The stats are definitively negative. There is no statistical evidence from these data that multisensory integration is occurring in this assay. The anesthesia, paralysis, and low n may provide explanations for this negative result, but it is still a negative result (p=0.5269). To show two examples of multisensory integration for subthreshold stimuli fits the narrative, but this result is not supported. Examples where individual stimuli caused APs (and combined stimuli did not) also occurred, presumably, and at a rate that is statistically indistinguishable from the examples shown in Figure 5. As such, if results from this assay are going to be in the manuscript, acoustic-only and tectum-only examples should be shown as well, although they would not fit the narrative. To be meaningful, this experiment would have to show that multisensory integration is happening in this circuit. Frustrating though it must be, the experiment has given a negative result to that question.”

      We understand the reviewer’s concern regarding Figure 5C and the firing of action potentials (APs) in response to multisensory stimuli. We acknowledge that our assay is not suited to answer this question definitively and that our results do not provide statistical support for this hypothesis. In response, we have removed the examples previously shown in Figure 5C, along with the related description in the Results section (lines 420–426), to avoid implying unsupported integration in suprathreshold conditions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Point 1: While the manuscript is methodologically sound, the following aspects of image acquisition and data analysis need to be clarified to ensure replicability and reproducibility. The authors state that the sample is a "population-derived adult lifespan sample", the lack of demographic information makes it impossible to know if the sample is truly representative. Though this may seem inconsequential, education may impact both cognitive performance and functional activation patterns. Moreover, the authors do not report race/ethnicity in the manuscript. This information is essential to ensure representativeness in the sample. It is imperative that barriers to study participation within minoritized groups are addressed to ensure rigor and reproducibility of findings.

      First, the section Methods-Participants has been updated to refer readers to a prior article where the sample’s demographics are broken down into nine decile age groups (see Wu et al. 2023 Table 1), including information about their education levels. Secondly, we have updated the Data Availability section text to indicate that all Cam-CAN IDs are included in the available OSF datasets, allowing anyone to verify additional participant demographics described in the Cam-CAN protocol article (Shafto et al., 2014). Third, we have updated the Participants section text to refer to another prior study that reported on the representativeness of the Cam-CAN sample indicating that at least some elements of the sample have been independently deemed as representative (e.g., Sex).

      Page-24

      “A healthy population-derived adult lifespan human sample (N = 223; ages approximately uniformly distributed from 19 - 87 years; females = 112; 50.2%) was collected as part of the Cam-CAN study (Stage 3 cohort; Shafto et al., 2014). Participants were fluent English speakers in good physical and mental health, based on the Cam-CAN cohort’s exclusion criteria which includes poor mini mental state examination, ineligibility for MRI and medical, psychiatric, hearing or visual problems. Throughout analyses, age is defined at the Home Interview (Stage 1; Shafto et al., 2014). The study was approved by the Cambridgeshire 2 (now East of England–Cambridge Central) Research Ethics Committee and participants provided informed written consent. Further demographic information of the sample is reported in Wu et al. (2023) and is openly available (see section Data Availability) with a recent report indicating the representativeness of the sample across sexes (Green et al., 2018).”

      Page-30

      “Raw and minimally pre-processed MRI (i.e., from automatic analysis; Taylor et al., 2017) and behavioural data are available by submitting a data request to Cam-CAN (https://camcan-archive.mrc-cbu.cam.ac.uk/dataaccess/). The univariate and multivariate ROI data, and behavioural data, can be downloaded from the Open Science Framework, which includes Cam-CAN participant identifiers allowing the retrieval of any additional demographic data (https://osf.io/v7kmh), while the analysis code is available on GitHub.”

      Point 2: For the whole-brain analysis in which the ROIs were derived, the authors used a threshold-free cluster enhancement (TFCE; Smith & Nichols 2009). The methodological paper cited suggests that individuals' TFCE image should still be corrected for multiple comparisons using the following: "to correct for multiple comparisons, one [...] has to build up the null distribution (across permutations of the input data) of the maximum (across voxels) TFCE score, and then test the actual TFCE image against that. Once the 95th percentile in the null distribution is found then the TFCE image is simply thresholded at this level to give inference at the p < 0.05 (corrected) level." (Smith & Nichols, 2009). Although the authors mention that clusters were estimated using 2000 permutations, there is no mention of the TFCE image itself being thresholded. While this would impact the overall size of the ROIs used in the study, the remaining analyses are methodologically sound.

      We have updated the text to detail the t=1.97 (i.e., p = .05) threshold we applied before interpretation of the resultant TFCE images to the section: Experimental Design & Statistical Analysis. This threshold value can also be verified in the analytics code that is referenced on GitHub from the section Data Availability within the requisite toolbox functions: https://github.com/kamentsvetanov/CommonalityAnalysis/blob/main/code/ca_vba_tfce_threshold.m#L24 and https://github.com/kamentsvetanov/CommonalityAnalysis/blob/main/code/external/ca_matlab_tfce_transform.m

      Page-30

      “For whole-brain voxelwise analyses, clusters were estimated using threshold-free cluster enhancement (TFCE; Smith & Nichols 2009) with 2000 permutations and the resulting images were thresholded at a t-statistic of 1.97 before interpretation.”
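
      The max-statistic correction quoted from Smith & Nichols (2009) in Point 2 can be illustrated with a minimal sketch. This is not the authors' code (their analysis lives in the MATLAB toolbox linked above); the function names, the synthetic arrays and the `alpha` parameter are assumptions made for illustration, and the TFCE transform itself is taken as already computed.

```python
import numpy as np

def max_stat_threshold(null_maps, alpha=0.05):
    """Return the (1 - alpha) quantile of the per-permutation maximum score.

    null_maps: array of shape (n_permutations, n_voxels) holding scores
    (e.g. TFCE values) computed from permuted data. Hypothetical input.
    """
    max_per_perm = null_maps.max(axis=1)  # maximum across voxels
    return np.quantile(max_per_perm, 1.0 - alpha)

def threshold_map(observed, null_maps, alpha=0.05):
    """Boolean mask of voxels whose observed score exceeds the threshold."""
    thr = max_stat_threshold(null_maps, alpha)
    return observed > thr, thr

# Synthetic demonstration: 2000 permutations, 500 voxels of pure noise,
# with a strong effect planted in the first five voxels.
rng = np.random.default_rng(0)
null_maps = rng.normal(size=(2000, 500))
observed = rng.normal(size=500)
observed[:5] += 10.0
mask, thr = threshold_map(observed, null_maps)
```

      Taking the maximum across voxels in each permutation is what controls the family-wise error rate: a voxel survives only if its score beats what the strongest noise voxel would typically reach.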

      Point 3: The authors should consider moving the ROI section to results. The way the manuscript currently reads, the ROIs seem to be derived a priori as opposed to being derived from activation maps in the current study.

      After consideration of this point, we have decided to leave the methodological details regarding the definition of ROIs in the methods, to maintain the focus of the Results section. However, we have improved signposting in the results section to highlight that the ROIs were derived from the overlapped activation maps.

      Page-8

      “Crucially, two areas of the brain showed spatially-overlapping positive effects of age and performance, which is suggestive of an age-related compensatory response (Figure 2A yellow intersection). These were in bilateral cuneal cortex (Figure 2B magenta) and bilateral frontal cortex (Figure 2B brown), the latter incorporating parts of the middle frontal gyri and anterior cingulate. Therefore, based on traditional univariate analyses, these are two candidate regions for age-related functional compensation (Cabeza et al. 2013; 2018). Accordingly, we defined regions of interest within these two regions using the overlap activation maps (see section: ROIs) to be used for subsequent univariate and multivariate analysis.”

      Point 4: The manuscript can be strengthened by explaining why the authors chose a greedy search algorithm over a dynamic Bayesian model.

      The text is updated to refer to appropriateness of the computationally efficient greedy search implementation, due to the size of the fMRI cohort dataset.

      Page-28

      “The pattern weights specifying the mapping of data features to the target variable are optimized with a greedy search algorithm using a standard variational scheme (Friston et al., 2007) which was particularly appropriate given the large dataset.”

      Reviewer #2:

      Point 1: However, it might have been nice to see an analysis of a more crystallised intelligence task included too, as a contrast since this is an area that does not demonstrate such a decline (and perhaps continues to improve over aging).

      We (Samu et al., 2017) have previously investigated, but failed to find, univariate evidence for functional compensation in this cohort’s performance on a sentence comprehension task that is more closely aligned to a measure of crystallised intelligence. Based on the additional previous studies where we have applied these types of univariate and multivariate criteria of functional compensation (Morcom & Henson, 2018; Knights et al., 2021), we have consistently observed that the uni-/multivariate effects are in the same direction. Therefore, we would not initially expect a different conclusion here, where the univariate and multivariate effects suggest different outcomes. Notably, the univariate analysis approach in Samu et al. (2017) did differ from focusing on the age x behaviour interaction term here, so it could still be worth future investigation, but it does seem less likely that evidence of compensation would be observed than for fluid intelligence. However, as the Reviewer suggests, such a task may make another good contrast to show evidence against the existence of functional compensation (as in Morcom & Henson, 2018; Knights et al., 2021).

      Point 2: Figure 1B: Consider adding coefficients describing relationships to plots.

      Annotations of the coefficients have been added to Figure 1B.

      Point 3: Figure 2C. The scale of the axis for RSFA-scaled cuneal cortex ROI activations should be the same as the other 3 plots.

      Figure axes are updated such that ROIs are on matching scales, according to whether data were RSFA-scaled or not.

      Point 4: Figure 2C. Adding in the age ranges for each of the three groups following the tertile split may be informative to the reader.

      The age group tertile definition used for Figure 2C visualisations is now added to the Figure description.

      Page-10

      “Figure 2. Univariate analysis. (A) Whole-brain effects of age and performance. Age (green) and performance (red) positively predicted unique aspects of increased task activation, with their spatial overlap (yellow) being overlaid on a template MNI brain, using p < 0.05 TFCE. (B) Intersection ROIs. A bilateral cuneal (magenta) and frontal cortex (brown) ROI were defined from voxels that showed a positive and unique effect of both age and performance (yellow map in Figure 2A). (C) ROI Activation. Activation (raw = left; RSFA-scaled = right) is plotted against behavioural performance based on a tertile split between three age groups (19-44, 45-63 & 64-87 years).”
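
      A tertile split of the kind mentioned in the Figure 2C description could be computed as in the sketch below. The function name and the evenly spaced synthetic ages are assumptions; the group boundaries reported in the caption (19-44, 45-63 and 64-87 years) come from the real sample and are not reproduced by this toy input.

```python
import numpy as np

def tertile_groups(ages):
    """Assign each age to tertile 0 (youngest), 1 (middle) or 2 (oldest).

    Returns the group labels and the two cut points. Hypothetical helper.
    """
    ages = np.asarray(ages, dtype=float)
    cuts = np.quantile(ages, [1 / 3, 2 / 3])
    # right=True places an age equal to a cut point in the lower group
    groups = np.digitize(ages, cuts, right=True)
    return groups, cuts

# Synthetic ages: one participant per year from 19 to 87
groups, cuts = tertile_groups(np.arange(19, 88))
```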

      Reviewer #3:

      Point 1: [Public Review] 1) I don't quite follow the argumentation that compensatory recruitment would need to show via non-redundant information carried by any given non-MDN region (cf. p14). Wouldn't the fact that a non-MDN region carries task-related information be sufficient to infer that it is involved in the task and, if activated increasingly with increasing age, that its stronger recruitment reflects compensation, rather than inefficiency or dedifferentiation? Put differently, wouldn't "more of the same" in an additional region suffice to qualify as compensation, as compared to the "additional information in an additional region" requirement set by the authors? As a consequence, in my honest opinion, showing that decoding task difficulty from non-MDN ROIs works better with higher age would already count as evidence for compensation, rather than asking for age-related increases in decoding boosts obtained from adding such ROIs. It would be interesting to see whether the arguably redundant frontal ROI would satisfy this less demanding criterion. At any rate, it seems useful to show whether the difference in log evidence for the real vs. shuffled models is also related to age.

      We agree with the logic for conducting a weaker assessment of functional compensation whereby a brain region does not necessarily have to provide a unique contribution beyond that of the ordinarily activated task-relevant network. However, although non-unique recruitment is predicted by a compensation theory, it can also be explained by a nonspecific mechanism that recruits multiple regions in tandem. In contrast, unique additional recruitment is compatible with compensation but not with nonspecific recruitment. In this article, and those prior (Morcom & Henson, 2018; Knights et al. 2021), we have also deliberately avoided using the specific kind of analysis proposed (i.e., testing for an effect of age on differential log evidence) because these would involve applying statistical tests directly to the log evidence, a variable that is already a statistical test output.

      Nevertheless, temporarily putting these caveats aside, we did run the suggested test. Results from multiple regression showed that using log evidence from frontal cortex models still did not meet this less demanding criterion for functional compensation as there was an effect of age in the opposite direction to that expected by functional compensation: there was a significant negative effect of age (t(218) = -7.95, p < .001) indicating that as age increased, the difference in log evidence decreased. This effect is visualised below for transparency, but we preferred not to add this information to the article because we do not wish to encourage using this kind of analysis for the reason mentioned above. Thus, although our main multivariate test of interest is stringent, the additional step of mapping log evidence back to the boost-likelihood categories (e.g., boost vs. no difference to model performance) lends itself to the more appropriate logistic regression statistical approach.

      Author response image 1.

      Negative effect of age on MVB log evidence model outcomes for frontal cortex.

      A different approach that could be taken to assess a more lenient definition of functional compensation would be to analyse the effects of age on the spread of multivariate responses predicting task difficulty (i.e., standard deviation of fitted MVB voxel weights; also see Morcom & Henson, 2018; Knights et al., 2021) specifically from models that only include the candidate ‘compensation’ ROIs.

      Accordingly, these analyses and their discussion have been added to the article. To summarise, these analyses showed that (1) the frontal cortex still did not show evidence of functional compensation (i.e., a negative effect of age like in Morcom & Henson, 2018) and (2) no effect of age on the cuneal ROI, implying that the original model comparison approach (i.e., Figure 2C in the manuscript now) can provide more sensitivity for detecting evidence of functional compensation (perhaps because of the importance of including task-relevant network responses when building decoding models).

      Page-15

      “As a final analysis, we also tested a more lenient definition of functional compensation, whereby the multivariate contribution from the “compensation ROI” does not necessarily need to be above and beyond that of the task-relevant network (Morcom & Henson, 2018; Knights et al., 2021). To do this, we again assessed whether age was associated with an increase in the spread (standard deviation) of the weights over voxels, for smaller models containing only the cuneal or frontal ROI. This tested whether increased age led to more voxels carrying substantial information about task difficulty, a pattern predicted by functional compensation (but also consistent with non-specific additional recruitment). In this case, the results of this test did not support functional compensation, as there was no effect detected for the cuneal cortex and even a negative effect of age for the frontal cortex where the spread of the information across voxels was lower for older age (Figure 3C; Table 2).”

      Page-21

      “The age- and performance-related activation in our frontal region satisfied the traditional univariate criteria for functional compensation, but our multivariate (MVB) model comparison analysis showed that additional multivariate information beyond that in the MDN was absent in this region, which is inconsistent with the strongest definition of compensation. In fact, the results from the spread analysis showed that as age increased, this frontal area processed less, rather than more, multivariate information about the cognitive outcome (Figure 3C) as previously observed in two (memory) tasks for a comparable ROI within the same Cam-CAN cohort (Morcom & Henson, 2018).”

      Page-24

      “This said, univariate criteria for functional compensation will continue to play a role in hypothesis testing. For instance, the over-additive interaction observed in the cuneal cortex - where the increase in activity with better performance is more pronounced in older adults - offers stronger evidence of compensation compared to the simple additive effect of age and performance observed in the frontal cortex (Figure 2C). So far, the two studies that have combined these rigorous univariate, behavioral and multivariate approaches to assess functional compensation (i.e., Knights et al., 2021; the present study) have generally found converging evidence regardless of the method used. However, it is important to note that the MVB approach uniquely shifts the focus from individual differences to the specific task-related information that compensatory neural activations are assumed to carry and provides a specific test of region- (or network-) unique information. With further studies, it may also be that multivariate approaches prove more sensitive for detecting compensation effects than when using mean responses over voxels (e.g., Friston et al., 1995) particularly since over-additive effects are challenging to observe because compensatory effects are typically ‘partial’ and do not fully restore function (for review see Scheller et al., 2014; Morcom & Johnson, 2015). Within the multivariate analysis options themselves, it is also interesting to highlight that the stringent MVB boost likelihood analysis could detect functional compensation unlike the more lenient analysis focusing on the spread of MVB voxel weights. This suggests the importance of including task-relevant network responses when building decoding models to assess compensation.”

      Page-32

      “Alongside the MVB boost analysis, we also included an additional measure using the spread (standard deviation) of voxel classification weights (Morcom & Henson, 2018). This measure indexes the absolute amplitude of voxel contributions to the task, reflecting the degree to which multiple voxels carry substantial task-related information. When related to age this can serve as a multivariate index of information distribution, unlike univariate analyses. However, it is worth highlighting that even if an ROI shows an effect of age on this spread measure, such an effect could instead be explained by a non-specific mechanism that represents the same information in tandem across multiple regions (rather than reflecting compensation) as seen previously (Knights et al., 2021; also see Morcom & Johnson, 2015). Thus, it is the MVB boost analysis that is the most compelling assessment of functional compensation because it can directly detect novel information representation.”

      Point 2: [Public Review] 2) Relatedly, does the observed boost in decoding by adding the cuneal ROI (in older adults) really reflect "additional, non-redundant" information carried by this ROI? Or could it be that this boost is just a statistical phenomenon that is obtained because the cuneus just happens to show a more clear-cut, less noisy difference in hard vs. easy task activation patterns than does the MDN (which itself may suffer from increased neural inefficiency in older age), and thus the cuneus improves decoding performance without containing additional (novel) pieces of information (but just more reliable ones)? If so, the compensation account could still be maintained by reference to the less demanding rationale for what constitutes compensation laid out above.

      We agree that this is a possibility and have added this as an additional explanation to the Discussion. We have also discussed why we think it is a less likely possibility, but do concede that it cannot be ruled out currently.

      Page-20

      “Another possibility is that the age-related increases in fMRI activations (for hard versus easy) in one or both of our ROIs do not reflect greater fMRI signal for hard problems in older than younger people, but rather lower fMRI signal for easy problems in the older. Without a third baseline condition, we cannot distinguish these two possibilities in our data. However, a reduced “baseline” level of fMRI signal (e.g., for easy problems) in older people is consistent with other studies showing an age-related decline in baseline perfusion levels, coupled with preserved capacity of cerebrovascular reactivity to meet metabolic demands of neuronal activity at higher cognitive load  (Calautti et al., 2001; Jennings et al., 2005). Though age-related decline in baseline perfusion occurs in the cuneal cortex (Tsvetanov et al., 2021), the brain regions showing modulation of behaviourally-relevant Cattell fMRI activity by perfusion levels did not include the cuneal cortex (Wu et al., 2023). This suggests that the compensatory effects in the cuneus are unlikely to be explained by age-related hypo-perfusion, consistent with the minimal effect here of adjusting for RSFA (Figure 2C).

      One final possibility is whether the observed boost in decoding from adding the cuneal ROI simply reflects less noisy task-related information (i.e., a better signal-to-noise ratio (SNR)) than the MDN and, consequently, the boosted decoding is the result of more resilient patterns of information (rather than the representation of additional information) based on a steeper age-related decline of SNR in the MDN. Overall then, as none of the explanations above agree with all aspects of the results, to functionally explain the role of the cuneal cortex in this task would require further investigation.”

      Point 3: [Public Review] 3) On page 21, the authors state that "...traditional univariate criteria alone are not sufficient for identifying functional compensation." To me, this conclusion is quite bold as I'd think that this depends on the univariate criterion used. For instance, it could be argued that compensation should be more clearly indicated by an over-additive interaction as observed for the relationship of cuneal activity with age and performance (i.e., the activity increase with better performance becomes stronger with age), rather than by an additive effect of age and performance as observed for the prefrontal ROI (see Fig. 2C). In any case, I'd appreciate it if the authors discussed this issue and the relationship between univariate and multivariate results in more detail (e.g. how differences in sensitivity between the two approaches may have contributed), in particular since the sophisticated multivariate approach used here is not widely established in the field yet.

      We have now considered this point further in a section of the Discussion (which is merged with points 1 & 2 above) about the relevance and distinction of univariate / multivariate criteria for functional compensation. As described in text below, whilst we agree that univariate / behavioural approaches have a role in testing functional compensation, we still view the MVB boost analysis to be a particularly compelling approach for assessing this theory.

      Page-22

      “This said, univariate criteria for functional compensation will continue to play a role in hypothesis testing. For instance, the over-additive interaction observed in the cuneal cortex - where the increase in activity with better performance is more pronounced in older adults - offers evidence of compensation compared to the simple additive effect of age and performance observed in the frontal cortex (Figure 2C). However, the conclusions that can be drawn from age-related differences in cross-sectional associations of brain and behaviour are limited, mainly because individual performance differences are largely lifespan-stable (see Lindenberger et al., 2011; Morcom & Johnson, 2015). So far, the two studies that have combined these univariate-behavioral and multivariate approaches to assess functional compensation (i.e., Knights et al., 2021; the present study) have generally found converging evidence regardless of the method used. However, it is important to note that the MVB approach uniquely shifts the focus from individual differences to the specific task-related information that compensatory neural activations are assumed to carry. With further studies, it may also be that multivariate approaches prove more sensitive for detecting compensation effects than when using mean responses over voxels (e.g., Friston et al., 1995) particularly since over-additive effects are challenging to observe because compensatory effects are typically ‘partial’ and do not fully restore function. Within the multivariate analysis options themselves, it is also interesting to highlight that the stringent MVB boost likelihood analysis could detect functional compensation unlike the more lenient analysis focusing on the spread of MVB voxel weights. This suggests the importance of including task-relevant network responses when building decoding models to assess compensation.”

      Point 4: [Public Review] 4) As to the exclusion of poorly performing participants (see p24): If only based on the absolute number of errors, wouldn't you miss those who worked (overly) slowly but made few errors (possibly because of adjusting their speed-accuracy tradeoff)? Wouldn't it be reasonable to define a criterion based on the same performance measure (correct - incorrect) as used in the main behavioural analyses?

      This is a good point, though if we were to exclude participants using a chance level exclusion rate based on the formulae used for measuring behavioural performance, this removes identical subjects to those originally excluded. Based on this, the text has been updated to reflect this more parsimonious approach for defining exclusion criteria.

      Page-25

      “In a block design, participants completed eight 30-second blocks which contained a series of puzzles from one of two difficulty levels (i.e., four hard and four easy blocks completed in an alternating block order; Figure 1A). The fixed block time allowed participants to attempt as many trials as possible. Therefore, to balance speed and accuracy, behavioural performance was measured by subtracting the number of incorrect from correct trials and averaging over the hard and easy blocks independently (i.e., ((hard correct - hard incorrect) + (easy correct - easy incorrect))/2; Samu et al., 2017). For assessing reliability and validity, behavioural performance (total number of puzzles correct) was also collected from the same participants during a full version of the Cattell task (Scale 2 Form A) administered outside the scanner at Stage 2 of the Cam-CAN study (Shafto et al., 2014). Both the in- and out-of-scanner measures were z-scored. We excluded participants (N = 28; 17 females) who performed at chance level ((correct + incorrect) / incorrect < 0.5) on the fMRI task, leading to the same subset as reported in Samu et al. (2017).”
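
      The performance measure quoted above can be sketched in a few lines. This is a minimal illustration only, not the authors' code: the function names and the example trial counts are hypothetical, and the use of the population standard deviation for z-scoring is an assumption, since the text does not say which variant was used.

```python
def block_score(correct, incorrect):
    """Speed-accuracy balanced score for one difficulty level: correct minus incorrect."""
    return correct - incorrect


def cattell_performance(hard_correct, hard_incorrect, easy_correct, easy_incorrect):
    """Average the hard and easy scores, following the formula quoted in the text:
    ((hard correct - hard incorrect) + (easy correct - easy incorrect)) / 2."""
    hard = block_score(hard_correct, hard_incorrect)
    easy = block_score(easy_correct, easy_incorrect)
    return (hard + easy) / 2


def zscore(values):
    """Standardise scores to mean 0 and unit (population) standard deviation.
    Assumption: the source only says the measures were z-scored."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


# Hypothetical participant: 10 correct / 4 incorrect on hard, 20 correct / 2 incorrect on easy.
score = cattell_performance(10, 4, 20, 2)
```

      Because incorrect trials are subtracted rather than ignored, a participant who answers quickly but carelessly does not score higher than one who answers fewer puzzles accurately.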

      Point 5: [Public Review] 5) Did the authors consider testing for negative relationships between performance and brain activity, given that there is some literature arguing that neural efficiency (i.e. less activation) is the hallmark of high intelligence (i.e. high performance levels in the Cattell task)? If that were true, at least for some regions, the set of ROIs putatively carrying task-related information could be expanded beyond that examined here. If no such regions were found, it would provide some evidence bearing on the neural efficiency hypothesis.

      No, we did not test for negative relationships between performance and brain activity in this study. However, in Wu et al. (2023) we did specifically test for this and neither of the relevant results reported in section 3.3.1 (i.e., unique relationship between activity and performance) nor section 3.3.2 (i.e., age-related relationship between activity and performance) showed the queried direction of effects. Note that the negative effect in section 3.3.2 (Age U Performance) is a more unique suppression effect representing a positive relationship between performance and activity where this becomes stronger as age is added to the model.

      Point 6: [Recommendations for the authors] 1) Page 26: It is not quite clear how the authors made sure their age and performance covariates functioned as independent regressors in the univariate group-level GLM, given the correlation between age and performance (i.e. shared variance).

      We included age and performance as covariates (of the age x performance effect of interest) by simply including these as independent regressors in the group-level GLM design matrix in addition to the interaction term (i.e., activity ~ age*performance + covariates equivalent to activity ~ age:performance + age + performance + covariates; Wilkinson & Roger 1973 notation), allowing us to examine the unique variance explained by each predictor (Table 1 and Table 2) and to control for their shared variance.

      We should note that while the GLM approach we used accounts for unique and shared effects, it does not explicitly report shared effects in its standard output. To directly examine shared variance, one would need to employ commonality analysis. For reference, results from a commonality analysis on this task have been previously reported in Wu et al. (2023).

      Prompted by this point, we have made some further minor improvements to help ensure our methodological steps are reproducible, as highlighted below.

      Page-30

      “Continuous age and behavioural performance variables were standardised and treated as linear predictors in multiple regression throughout the behavioural (Figure 1B), wholebrain voxelwise (Figure 1C/2A), univariate (Table 1; Figure 1B/2B) and MVB (Table 2; Figure 3) analyses. Throughout, sex was included as a covariate. The models, including interaction terms, can be described, according to Wilkinson & Roger’s (1973) notation, as activity ~ age * performance + covariates (which is equivalent to activity ~ age:performance + age + performance + covariates), allowing us to examine the unique variance explained by each predictor (Table 1) and to control for their shared variance. For whole-brain voxelwise analyses, clusters were estimated using threshold-free cluster enhancement (TFCE; Smith & Nichols 2009) with 2000 permutations and the resulting images were thresholded at a t-statistic of 1.97 before interpretation. Bonferroni correction was applied to a standard alpha = 0.05 based on the two ROIs (cuneal and frontal) that were examined. For Bayes Factors, interpretation criteria norms were drawn from Jarosz & Wiley (2014).”
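
      To make the model specification concrete, the sketch below expands activity ~ age * performance + covariates into explicit design-matrix columns (intercept, age, performance, age:performance, sex) and computes the Bonferroni-adjusted alpha for the two ROIs. It is an illustrative assumption about how such a design could be built, not the authors' pipeline: the function names, the 0/1 coding of sex, and the example values are hypothetical.

```python
from statistics import mean, pstdev


def standardise(values):
    """Z-score a list of values (population standard deviation; an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def design_rows(age, performance, sex):
    """Build one design-matrix row per participant for
    activity ~ age * performance + sex, which expands to
    intercept + age + performance + age:performance + sex."""
    age_z = standardise(age)
    perf_z = standardise(performance)
    return [
        [1.0, a, p, a * p, float(s)]  # the product column is the age:performance term
        for a, p, s in zip(age_z, perf_z, sex)
    ]


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical data for three participants (sex coded 0/1 for illustration).
rows = design_rows(age=[20, 50, 80], performance=[1, 2, 3], sex=[0, 1, 0])
alpha_per_roi = bonferroni_alpha(0.05, 2)  # two ROIs: cuneal and frontal
```

      Entering the main effects alongside the product term is what lets each coefficient be read as the contribution of that predictor over and above the others.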

      Point 7: [Recommendations for the authors] 2) Figure 3: I suggest changing the subheading in panel B to "Joint vs. MDN-only Model," in line with the wording in the main text.

      The subheading of Figure 3B is updated as suggested to `Joint vs. MDN-only Model`.

      Point 8: [Recommendations for the authors] 3) In Figures 1C and 2A, MNI z coordinates should be added to the section views. The appreciation of Figure 2B could be enhanced by adding some rendering with a saggital (medial and/or lateral) view.

      The slice mosaics in Figure 1C and 2A are now updated with each slice’s MNI Z coordinates and mentioned in the figure descriptions.

      Point 9: [Recommendations for the authors] 4) Page 7 (l. 135): What exactly is meant by "lateral occipital temporal cortex"?

      The text is updated to specify the anatomical landmarks that were used for guidance when referring to activation within the lateral occipital temporal cortex, based on ROI criteria definitions used in Knights, Mansfield et al. (2021):

      Page-7 Line-135:

      “Additional activation was observed bilaterally in the inferior/ventral and lateral occipital temporal cortex (i.e., a cluster around the lateral occipital sulcus that extended anteriorly beyond the anterior occipital sulcus), likely due to the visual nature of the task.”

      Point 10: [Recommendations for the authors] 5) On p18ff. (ll. 259-318) the authors discuss in quite some detail how the age-related decoding boost seen with the cuneus ROI can be functionally explained, but it seems like none of the explanations agrees with all aspects of the results. While this is not a major problem for the paper, it may be advisable if this part of the discussion ends with a clearer statement that this issue is not fully solved yet and provides material for future research.

      A more direct sentence has been added to make it clear that future investigation will be needed to explain the role of the cuneal cortex here.

      Page-20 Line-322:

      “Another possibility is that the age-related increases in fMRI activations (for hard versus easy) in one or both of our ROIs do not reflect greater fMRI signal for hard problems in older than younger people, but rather lower fMRI signal for easy problems in the older. Without a third baseline condition, we cannot distinguish these two possibilities in our data. However, a reduced “baseline” level of fMRI signal (e.g., for easy problems) in older people is consistent with other studies showing an age-related decline in baseline perfusion levels, coupled with preserved capacity of cerebrovascular reactivity to meet metabolic demands of neuronal activity at higher cognitive load (Calautti et al., 2001; Jennings et al., 2005). Though age-related decline in baseline perfusion occurs in the cuneal cortex (Tsvetanov et al., 2021), the brain regions showing modulation of behaviourally-relevant Cattell fMRI activity by perfusion levels did not include the cuneal cortex (Wu et al., 2023). This suggests that the compensatory effects in the cuneus are unlikely to be explained by age-related hypo-perfusion, consistent with the minimal effect here of adjusting for RSFA (Figure 2C). Overall then, as none of the explanations above agree with all aspects of the results, to functionally explain the role of the cuneal cortex in this task will require further investigation.”

      Point 11: [Recommendations for the authors] 6) The threshold choice for Bayesian log evidence (> 3) should be motivated in some more detail, rather than just pointing to a book reference, as there is no established convention in the field, the choice may depend on the type of data and/or analysis, and a sizeable part of the readership may not be deeply familiar with the particular Bayesian approach used here.

      Text is updated to further clarify our motivation for using the criterion of a difference in log evidence > 3:

      Page-29

      “The outcome measure was the log evidence for each model (Morcom & Henson, 2018; Knights et al., 2021). To test whether activity from an ROI is compensatory, we used an ordinal boost measure (Morcom & Henson, 2018; Knights et al., 2021) to assess the contribution of that ROI for the decoding of task-relevant information (Figure 3B). Specifically, Bayesian model comparison assessed whether a model that contains activity patterns from a compensatory ROI and the MDN (i.e., a joint model) boosted the prediction of task-relevant information relative to a model containing the MDN only. The compensatory hypothesis predicts that the likelihood of a boost to model decoding will increase with older age. The dependent measure, for each participant, was a categorical recoding of the relative model evidence to indicate the outcome of the model comparison. The three possible outcomes were: a boost to model evidence for the joint vs. MDN-only model (difference in log evidence > 3), ambiguous evidence for the two models (difference in log evidence between -3 and 3), or a reduction in evidence for the joint vs. MDN-only model (difference in log evidence < -3). These values were selected because a log difference of three corresponds to a Bayes Factor of approximately 20, which is generally considered strong evidence (Lee & Wagenmakers, 2014). Further, with uniform priors, this chosen criterion (difference in log evidence > 3) corresponds to a p-value of p<~.05 (since the natural logarithm of 20 is approximately three, as evidence for the alternative hypothesis).”
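
      The three-way recoding described above can be illustrated with a short sketch. It assumes that each participant has one log evidence value per model. The function names and example values are hypothetical, and placing a difference of exactly ±3 in the ambiguous category is an assumption based on the strict inequalities in the quoted text.

```python
import math

LOG_EVIDENCE_THRESHOLD = 3.0  # criterion stated in the text


def boost_outcome(log_ev_joint, log_ev_mdn_only, threshold=LOG_EVIDENCE_THRESHOLD):
    """Recode the joint vs. MDN-only model comparison into an ordinal outcome."""
    diff = log_ev_joint - log_ev_mdn_only
    if diff > threshold:
        return "boost"       # joint model clearly better
    if diff < -threshold:
        return "reduction"   # MDN-only model clearly better
    return "ambiguous"       # difference between -3 and 3 (inclusive by assumption)


def bayes_factor(log_evidence_difference):
    """Convert a difference in (natural) log evidence to a Bayes factor."""
    return math.exp(log_evidence_difference)


# Hypothetical participant: joint model log evidence -100, MDN-only -104.
outcome = boost_outcome(-100.0, -104.0)
bf_at_threshold = bayes_factor(LOG_EVIDENCE_THRESHOLD)  # exp(3), roughly 20
```

      Since exp(3) is roughly 20, the threshold on the log evidence difference and the "Bayes Factor of 20" wording describe the same criterion.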

      Point 12: [Recommendations for the authors] 7) Adding page numbers would be helpful.

      Page numbers have been added to the manuscript file – apologies for this oversight.

      References

      Green, E., Bennett, H., Brayne, C., & Matthews, F. E. (2018). Exploring patterns of response across the lifespan: The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study. BMC Public Health, 18, 1-7.

      Knights, E., Mansfield, C., Tonin, D., Saada, J., Smith, F. W., & Rossit, S. (2021). Hand-selective visual regions represent how to grasp 3D tools: brain decoding during real actions. Journal of Neuroscience, 41(24), 5263-5273.

      Samu, D., Campbell, K. L., Tsvetanov, K. A., Shafto, M. A., & Tyler, L. K. (2017). Preserved cognitive functions with age are determined by domain-dependent shifts in network responsivity. Nature Communications, 8(1), 14743.

      Shafto, M. A., Tyler, L. K., Dixon, M., Taylor, J. R., Rowe, J. B., Cusack, R., ... & Cam-CAN. (2014). The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study protocol: a cross-sectional, lifespan, multidisciplinary examination of healthy cognitive ageing. BMC Neurology, 14, 1-25.

      Wu, S., Tyler, L. K., Henson, R. N., Rowe, J. B., & Tsvetanov, K. A. (2023). Cerebral blood flow predicts multiple demand network activity and fluid intelligence across the adult lifespan. Neurobiology of Aging, 121, 1-14.

    2. eLife Assessment

      This study provides an important advancement of knowledge by showing neural functional compensation in the brains of healthy older adults completing a fluid-intelligence task. Validated whole-brain voxel-wise analyses and multivariate Bayesian approaches provide compelling evidence that supports the claims of the authors. The work delivers methods for quantifying reserve and compensation in future studies and will be of interest to researchers in the field of the neuroscience of healthy aging.

    3. Reviewer #1 (Public Review):

      This work addresses how to quantify functional compensation throughout the aging process and identifies brain regions that engage in compensatory mechanisms during the Cattell task, a measure of fluid cognition. The authors find that regions of the frontal cortex and cuneus showed unique effects of both age and performance. Interestingly, these two regions demonstrated differential activation patterns taking into account both age and performance. Specifically, the researchers found that the relationship between performance and activation in the cuneal ROI was strongest in older adults; this relationship was not found in younger adults. These findings suggest that, specifically within the cuneus, greater activation is needed by older adults to maintain performance, suggestive of functional compensation.

      The conclusions derived from the study are well supported by the data. The authors validated the use of the in-scanner Cattell task by demonstrating high reliability in the same sample with the standard out-of-scanner version. Some strengths of the study include the large sample size and wide age range of participants. The authors use a stringent Bayes factor of 20 to assess the strength of evidence. The authors used a whole-brain approach to define regions of interest (ROIs) based on activation patterns that were jointly related to age and performance. Overall, the methods are technically sound and support the authors' conclusions.

      Comment from Reviewing Editor: In the revised manuscript, the authors have addressed the weaknesses previously identified by reviewer 1.

    4. Reviewer #2 (Public Review):

      This work by Knights et al. makes use of the Cam-CAN dataset to investigate functional compensation during a fluid processing task in older adults, in a fairly large sample of approximately 200 healthy adults ranging in age from 19 to 87. Using univariate methods, the authors identify two brain regions in which activity increases as a function of both age and performance and conduct further investigations to assess whether the activity of these regions provides information regarding task difficulty. The authors conclude that the cuneal cortex - a region of the brain previously implicated in visual attention - shows evidence of compensation in older adults.

      The conclusions of the paper are well supported by the data, and the authors use appropriate statistical analyses. The use of multivariate methods over the last 20 years has demonstrated many effects that would have been missed using more traditional univariate analysis techniques. The data set is also of an appropriate size, and as the authors note, fluid processing is an extremely important domain in the field of cognition in aging, due to its steep decline over aging.

      Comment from Reviewing Editor: It would have been nice to see an analysis of a more crystallised intelligence task included too, as a contrast since this is an area that does not demonstrate such a decline (and perhaps continues to improve over aging). This comment does not take away the important contributions of the manuscript.

    5. Reviewer #3 (Public Review):

      This neuroimaging study investigated how brain activity related to visual pattern-based reasoning changes over the adult lifespan, addressing the topic of functional compensation in older age. To this end, the authors employed a version of the Cattell task, which probes visual pattern recognition for identifying commonalities and differences within sets of abstract objects in order to infer the odd object among a given set. Using a state-of-the-art univariate analysis approach on fMRI data from a large lifespan sample, the authors identified brain regions in which the activation contrast between hard and easy Cattell task conditions was modulated by both age and performance. Regions identified comprised prefrontal areas and bilateral cuneus. Applying a multivariate decoding approach to activity in these regions, the authors went on to show that only in older adults, the cuneus, but not the prefrontal regions, carried information about the task condition (hard vs. easy) beyond that already provided by activity patterns of voxels that showed a univariate main effect of task difficulty. This was taken as compelling evidence for task-specific compensatory activity in the cuneus in advanced age.

      The study is well-motivated and well-written. The authors used appropriate, rigorous methods that allowed them to control for a range of possible confounds or alternative explanations. Laudable aspects include the large sample with a wide and even age distribution, the validation of the in-scanner task performance against previous results obtained with a more standard version outside the scanner, and the control for vascular age-related differences in hemodynamic activity via a BOLD signal amplitude measure obtained from a separate resting-state fMRI scan. Overall, the conclusions are well-supported by the data.

      Comment from Reviewing Editor: The revised manuscript has addressed the points raised during the review of the original submission.

    1. eLife Assessment

      This manuscript reports important findings that the methyltransferase METTL3 is involved in the repair of abasic sites and uracil in DNA, mediating resistance to floxuridine-driven cytotoxicity. Convincing evidence shows the involvement of m6A in DNA based on single cell imaging and mass spec data. The authors present evidence that the m6A signal does not result from bacterial contamination or RNA, but the text does not make this overly clear.

    2. Reviewer #1 (Public review):

      Summary:

      The authors sought to identify unknown factors involved in the repair of uracil in DNA through a CRISPR knockout screen.

      Strengths:

      The screen identified both known and unknown proteins involved in DNA repair resulting from uracil or modified uracil base incorporation into DNA. The conclusion is that the protein activity of METTL3, which converts A nucleotides to 6mA nucleotides, plays a role in the DNA damage/repair response. The importance of METTL3 in DNA repair, and its colocalization with a known DNA repair enzyme, UNG2, is well characterized.

      Weaknesses:

      This reviewer identified no major weaknesses in this study. The manuscript could be improved by tightening the text throughout, and more accurate and consistent word choice around the origin of U and 6mA in DNA. The dUTP nucleotide is misincorporated into DNA, and 6mA is formed by methylation of the A base present in DNA. Using words like 6mA "deposition in DNA" seems to imply it results from incorporation of a methylated dATP nucleotide during DNA synthesis.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors performed a CRISPR knockout screen in the presence of floxuridine, a chemotherapeutic agent that incorporates uracil and fluoro-uracil into DNA, and identified unexpected factors, such as the RNA m6A methyltransferase METTL3, as required to overcome floxuridine-driven cytotoxicity in mammalian cells. Interestingly, the observed N6-methyladenosine was embedded in DNA, which has been reported as DNA 6mA in mammalian genomes and is currently confirmed with mass spectrometry in this model. Therefore, this work consolidated the functional role of mammalian genomic DNA 6mA, and supported with solid evidence to uncover the METTL3-6mA-UNG2 axis in response to DNA base damage.

      Strengths:

      In this work, the authors took an unbiased, genome-wide CRISPR approach to identify novel factors involved in uracil repair with potential clinical interest.

      The authors designed elegant experiments to confirm the METTL3 works through genomic DNA, adding the methylation into DNA (6mA) but not the RNA (m6A), in this base damage repair context. The authors employ different enzymes, such as RNase A, RNase H, DNase, and liquid chromatography coupled to tandem mass spectrometry to validate that METTL3 deposits 6mA in DNA in response to agents that increase genomic uracil.

      They also have the Mettl3-KO and the METTL3 inhibition results to support their conclusion.

      Weaknesses:

      Although this study demonstrates that METTL3-dependent 6mA deposition in DNA is functionally relevant to DNA damage repair in mammalian cells, there are still several concerns and issues that need to be improved to strengthen this research.

      First, in the whole paper, the authors never claim or mention the mammalian cell lines contamination testing result, which is the fundamental assay that has to be done for the mammalian cell lines DNA 6mA study.

      Second, in the whole work, the authors have not supplied any genomic sequencing data to support their conclusions. Although the sequencing of DNA 6mA in mammalian models is challenging, recent breakthroughs in sequencing techniques, such as DR-Seq or NT/NAME-seq, have lowered the bar and improved a lot in the 6mA sequencing assay. Therefore, the authors should consider employing the sequencing methods to further confirm the functional role of 6mA in base repair.

      Third, the authors used the METTL3 inhibitor and Mettl3-KO to validate the METTL3-6mA-UNG2 functional roles. However, the catalytic mutant and rescue of Mettl3 may be the further experiments to confirm the conclusion.

    4. Reviewer #3 (Public review):

      Summary:

      The authors are showing evidence that they claim establishes the controversial epigenetic mark, DNA 6mA, as promoting genome stability.

      Strengths:

      The identification of a poorly understood protein, METTL3, and its subsequent characterization in DDR is of high quality and interesting.

      Weaknesses:

      (1) The very presence of 6mA (DNA) in mammalian DNA is still highly controversial and numerous studies have been conclusively shown to have reported the presence of 6mA due to technical artifacts and bacterial contamination. Thus, to my knowledge there is no clear evidence for 6mA as an epigenetic mark in mammals, and consequently, no evidence of writers and readers of 6mA. None of this is mentioned in the introduction. Much of the introduction can be reduced, but a paragraph clearly stating the controversy and lack of evidence for 6mA in mammals needs to be added, otherwise, the reader is given an entirely distorted view of the field.

      These concerns must also be clearly stated in the limitations section and even in the results section, which fails to nuance the authors' findings.

      (2) What is the motivation for using HT-29 cells? Moreover, the materials and methods do not state how the authors controlled for bacterial contamination, which has been the most common cause of erroneous 6mA signals to date. Did the authors routinely check for mycoplasma?

      (3) The single cell imaging of 6mA in various cells is nice. The results are confirmed by mass spec as an orthogonal approach. Another orthogonal and quantitative approach to assessing 6mA levels would be PacBio. Similarly, it is unclear why the authors have not performed dot-blots of 6mA for genomic DNA from the given cell lines.

      (4) The results of Figure 3 need further investigation and validation. If the results are correct the authors are suggesting that the majority of 6mA in their cell lines is present in the DNA, and not the RNA, which is completely contrary to every other study of 6mA in mammalian cells that I am aware of. This could suggest that the antibody is not, in fact, binding to 6mA, but to unmodified adenine, which would explain why the signal disappears after DNase treatment. Indeed, binding to unmethylated DNA is a commonly known problem with most 6mA antibodies and is well described elsewhere.

      (5) Given the lack of orthogonal validation of the observed DNA 6mA and the lack of evidence supporting the presence of 6mA in mammalian DNA and consequently any functional role for 6mA in mammalian biology, the manuscript's conclusions need to be toned down significantly, and the inherent difficulty in assessing 6mA accurately in mammals acknowledged throughout.

    5. Author response:

      eLife Assessment

      This manuscript reports important findings that the methyltransferase METTL3 is involved in the repair of abasic sites and uracil in DNA, mediating resistance to floxuridine-driven cytotoxicity. The presented evidence for the involvement of m6A in DNA is incomplete and requires further validation with orthogonal approaches to conclusively show the presence of 6mA in the DNA and exclude that the source is RNA or bacterial contamination.

      We thank the editors for recognizing the importance of our work and the relevance of METTL3 in DNA repair. However, we wholly disagree with the second sentence in the eLife assessment, and we want to clarify why our evidence for the involvement of 6mA in DNA is complete.  

      The identification of 6mA in DNA, upon DNA damage, is based first on immunofluorescence observations using an anti-m6A antibody. In this setting, removal of RNA with RNase treatment fails to reduce the 6mA signal, excluding the possibility that the source of signal is RNA. In contrast, removal of DNA with DNase treatment removes all 6mA signal, strongly suggesting that the species carrying the N6-methyladenosine modification is DNA (Figure 3D, E). Importantly, in Figure 3F, we provide orthogonal, quantitative mass spectrometry data that independently confirm this finding. Mass spectrometry-liquid chromatography of DNA analytes conclusively shows the presence of 6mA in DNA upon treatment with DNA damaging agents and excludes that the source is RNA, based on exact mass. Reviewer #2 recognized the strengths of this approach to generate solid evidence for 6mA in DNA.

      Cells only show the 6mA signal when treated with DNA damaging agents, and the 6mA is absent from untreated cells (Figure 3D, E, F). This provides strong evidence that the 6mA signal is not a result of bacterial contamination in our cell lines. Moreover, our cell lines are routinely tested for mycoplasma contamination. It is possible that stock solutions of DNA damaging agents are contaminated, but this would need to be true for all individual drugs and stocks tested. The finding that the 6mA signal is not significantly different from that of untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3G, H) provides strong evidence against bacterial contamination in our stocks.

      In summary, we provide conclusive evidence, based on orthogonal methods, that the METTL3-dependent N6-methyladenosine modification is deposited in DNA, not RNA, in response to DNA damage. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors sought to identify unknown factors involved in the repair of uracil in DNA through a CRISPR knockout screen.

      Strengths: 

      The screen identified both known and unknown proteins involved in DNA repair resulting from uracil or modified uracil base incorporation into DNA. The conclusion is that the protein activity of METTL3, which converts A nucleotides to 6mA nucleotides, plays a role in the DNA damage/repair response. The importance of METTL3 in DNA repair, and its colocalization with a known DNA repair enzyme, UNG2, is well characterized.

      Weaknesses:

      This reviewer identified no major weaknesses in this study. The manuscript could be improved by tightening the text throughout, and more accurate and consistent word choice around the origin of U and 6mA in DNA. The dUTP nucleotide is misincorporated into DNA, and 6mA is formed by methylation of the A base present in DNA. Using words like 6mA "deposition in DNA" seems to imply it results from incorporation of a methylated dATP nucleotide during DNA synthesis.

      The increased presence of 6mA during DNA damage could result from methylation at the A base itself (within DNA) or from incorporation of pre-modified 6mA during DNA synthesis. Our data do not directly discriminate between these two mechanisms, and we will clarify this point in the discussion.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors performed a CRISPR knockout screen in the presence of floxuridine, a chemotherapeutic agent that incorporates uracil and fluoro-uracil into DNA, and identified unexpected factors, such as the RNA m6A methyltransferase METTL3, as required to overcome floxuridine-driven cytotoxicity in mammalian cells. Interestingly, the observed N6-methyladenosine was embedded in DNA, which has been reported as DNA 6mA in mammalian genomes and is currently confirmed with mass spectrometry in this model. Therefore, this work consolidated the functional role of mammalian genomic DNA 6mA, and supported with solid evidence to uncover the METTL3-6mA-UNG2 axis in response to DNA base damage.

      Strengths:

      In this work, the authors took an unbiased, genome-wide CRISPR approach to identify novel factors involved in uracil repair with potential clinical interest.

      The authors designed elegant experiments to confirm the METTL3 works through genomic DNA, adding the methylation into DNA (6mA) but not the RNA (m6A), in this base damage repair context. The authors employ different enzymes, such as RNase A, RNase H, DNase, and liquid chromatography coupled to tandem mass spectrometry to validate that METTL3 deposits 6mA in DNA in response to agents that increase genomic uracil.

      They also have the Mettl3-KO and the METTL3 inhibition results to support their conclusion.

      Weaknesses:

      Although this study demonstrates that METTL3-dependent 6mA deposition in DNA is functionally relevant to DNA damage repair in mammalian cells, there are still several concerns and issues that need to be improved to strengthen this research.

      First, in the whole paper, the authors never claim or mention the mammalian cell lines contamination testing result, which is the fundamental assay that has to be done for the mammalian cell lines DNA 6mA study.

      Our cell lines are routinely tested for bacterial contamination, specifically mycoplasma, and we plan to state this information in a revised version of the manuscript.

      Importantly, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on the presence of DNA damage and not caused by contamination in the cell lines (Figure 3D, E, F). While stock solutions of DNA damaging agents could be contaminated, this would need to be the case for all individual drugs and stocks tested that induce 6mA, which seems very unlikely. Finally, the finding that the 6mA signal is not significantly different from that of untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3G, H) provides strong evidence against bacterial contamination in our drug stocks.

      Second, in the whole work, the authors have not supplied any genomic sequencing data to support their conclusions. Although the sequencing of DNA 6mA in mammalian models is challenging, recent breakthroughs in sequencing techniques, such as DR-Seq or NT/NAME-seq, have lowered the bar and improved a lot in the 6mA sequencing assay. Therefore, the authors should consider employing the sequencing methods to further confirm the functional role of 6mA in base repair.

      While we agree that it could be important to understand the precise genomic location of 6mA in relation to DNA damage, this is outside the scope of the current study. Moreover, this exercise may prove unproductive. If 6mA is enriched in DNA at damage sites or as DNA is replicated, the genomic mapping of 6mA is likely to be stochastic. If stochastic, it would be impossible to obtain the read depth necessary to map 6mA accurately.

      Third, the authors used the METTL3 inhibitor and Mettl3-KO to validate the METTL3-6mA-UNG2 functional roles. However, the catalytic mutant and rescue of Mettl3 may be the further experiments to confirm the conclusion. 

      We believe this to be an excellent suggestion from Reviewer #2 but we are unable to perform the proposed experiment at this time. We encourage future studies to explore the rescue experiment.

      Reviewer #3 (Public review):

      Summary:

      The authors are showing evidence that they claim establishes the controversial epigenetic mark, DNA 6mA, as promoting genome stability.

      Strengths:

      The identification of a poorly understood protein, METTL3, and its subsequent characterization in DDR is of high quality and interesting.

      Weaknesses:

      (1) The very presence of 6mA (DNA) in mammalian DNA is still highly controversial and numerous studies have been conclusively shown to have reported the presence of 6mA due to technical artifacts and bacterial contamination. Thus, to my knowledge there is no clear evidence for 6mA as an epigenetic mark in mammals, and consequently, no evidence of writers and readers of 6mA. None of this is mentioned in the introduction. Much of the introduction can be reduced, but a paragraph clearly stating the controversy and lack of evidence for 6mA in mammals needs to be added, otherwise, the reader is given an entirely distorted view of the field.

      These concerns must also be clearly stated in the limitations section and even in the results section, which fails to nuance the authors' findings.

      We agree with the reviewer that the presence and potential function of 6mA in mammalian DNA have been debated. Importantly, the debate regarding the presence and quantity of 6mA in DNA has previously been restricted to undamaged, baseline conditions. In complete agreement with this notion, we do not detect appreciable levels of 6mA in untreated cells. We will revise the introduction to introduce the debate about 6mA in DNA. We, however, want to highlight that our study provides, for the first time, convincing evidence (based on orthogonal methods) that 6mA is present in DNA in response to a stimulus, DNA damage.

      (2) What is the motivation for using HT-29 cells? Moreover, the materials and methods do not state how the authors controlled for bacterial contamination, which has been the most common cause of erroneous 6mA signals to date. Did the authors routinely check for mycoplasma?

      HT-29 is a cell line of colorectal origin and chemotherapeutic agents that introduce uracil and uracil derivatives in DNA, as those used in this study, are relevant for the treatment of colorectal cancer. As indicated above, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on DNA damage and not caused by a potential bacterial contamination (Figure 3D, E, F). Additionally, our cell lines are routinely tested for bacterial contamination, specifically mycoplasma.

      (3) The single-cell imaging of 6mA in various cells is nice but must be confirmed by orthogonal approaches. PacBio would provide an alternative and quantitative approach to assessing 6mA levels. Similarly, it is unclear why the authors have not performed dot-blots of 6mA for genomic DNA from the given cell lines.

      We are confused by this point since an orthogonal approach to detect 6mA, mass spectrometry-liquid chromatography, was employed. This method does not use an antibody and confirms the increase of 6mA in DNA when cells were treated with DNA damaging agents. This data is presented in Figure 3F.

      It is sensible to hypothesize that the localization of 6mA is consistent with DNA replication (like uracil deposition). In this event, the genomic mapping of 6mA is likely to be stochastic. This would make quantification with PacBio sequencing difficult because it would be very challenging to achieve the appropriate read depth to call a modified base.

      Dot blots rely on an antibody and thus are not truly orthogonal to our immunofluorescence-based measurements. We preferred the mass spectrometry-liquid chromatography approach we took as a true orthogonal approach.

      (4) The results of Figure 3 need further investigation and validation. If the results are correct the authors are suggesting that the majority of 6mA in their cell lines is present in the DNA, and not the RNA, which is completely contrary to every other study of 6mA in mammalian cells that I am aware of. This could suggest that the antibody is not, in fact, binding to 6mA, but to unmodified adenine, which would explain why the signal disappears after DNase treatment. Indeed, binding to unmethylated DNA is a commonly known problem with most 6mA antibodies and is well described elsewhere.

      Based on this and the following comment, we are convinced that Reviewer #3 has overlooked two critical elements of our study:

      First, the immunofluorescence work presented in Figure 3, showing 6mA signal in response to DNA damage, uses cells that were pre-extracted to remove excess cytoplasmic RNA. This method is often used in immunofluorescence experiments of this kind. The pre-extraction method removes most of the cytoplasmic content, and the majority of the cytoplasmic m6A RNA signal. Supplementary Figure 3D shows cells that have not been pre-extracted prior to staining. These images show the cytoplasmic m6A signal is abundant if we do not perform the pre-extraction step.

      If the antibody used to label 6mA significantly reacted with unmodified adenine, we would expect a large signal in untreated or untreated and denatured conditions. In contrast, an increase in 6mA is not observed in either case.

      Second, the orthogonal approach we employed, mass spectrometry coupled with liquid chromatography, measures 6mA DNA analytes specifically by exact mass. This approach does not depend on an antibody and yields results consistent with those from the immunofluorescence experiments.

      (5) Given the lack of orthogonal validation of the observed DNA 6mA and the lack of evidence supporting the presence of 6mA in mammalian DNA and consequently any functional role for 6mA in mammalian biology, the manuscript's conclusions need to be toned down significantly, and the inherent difficulty in assessing 6mA accurately in mammals acknowledged throughout.

      As discussed in response to prior comments, Figure 3 does provide two independent and orthogonal methods that demonstrate 6mA presence in DNA specifically, and not RNA, in response to DNA damage. Complementary and orthogonal datasets are presented using either immunofluorescence microscopy or mass spectrometry-liquid chromatography of extracted DNA. The latter method does not rely on an antibody and can discriminate 6mA DNA versus RNA based on exact mass. We will revise the text to clarify that Figure 3F is a completely orthogonal approach.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors describe the construction of an extremely large-scale anatomical model of juvenile rat somatosensory cortex (excluding the barrel region), which extends earlier iterations of these models by expanding across multiple interconnected cortical areas. The models are constructed in such a way as to maintain biological detail from a granular scale - for example, individual cell morphologies are maintained, and synaptic connectivity is founded on anatomical contacts. The authors use this model to investigate a variety of properties, from cell-type specific targeting (where the model results are compared to findings from recent large-scale electron microscopy studies) to network metrics. The model is also intended to serve as a platform and resource for the community by being a foundation for simulations of neuronal circuit activity and for additional anatomical studies that rely on the detailed knowledge of cellular identity and connectivity.

      Strengths:

      As the authors point out, the combination of scale and granularity of their model is what makes this study valuable and unique. The comparisons with recent electron microscopy findings are some of the most compelling results presented in the study, showing that certain connectivity patterns can arise directly from the anatomical configuration, while other discrepancies highlight where more selective targeting rules (perhaps based on molecular cues) are likely employed. They also describe intriguing effects of cortical thickness and curvature on circuit connectivity and characterize the magnitude of those effects on different cortical layers.

      The detailed construction of the model is drawn on a wide range of data sources (cellular and synaptic density measures, neuronal morphologies, cellular composition measures, brain geometry, etc.) that are integrated together; other data sources are used for comparison and validation. This consolidation and comparison also represent a valuable contribution to the overall understanding of the modeled system.

      We thank the reviewer for the kind comments.

      Weaknesses:

      The scale of the model, which is a primary strength, can also carry some drawbacks. In order to integrate all the diverse data sources together, many specific decisions must be made about, for example, translating findings from different species or regions to the modeled system, or deciding which aspects of the system can be assumed to be the same and which should vary. All these decisions will have effects on the predicted results from the model, which could limit the types of conclusions that can be made (both by the authors and by others in the community who may wish to use the model for their own work).

      We agree that this is a downside of the principle of biophysically detailed modeling that is best addressed by continuous refinement in collaboration with the community. We would like to once again invite any interested party to participate in this process.

      As an example, while it is interesting that broad brain geometry has effects on network structure (Figure 7), it is not clear how those effects are actually manifested. I am not sure if some of the effects could be due to the way the model is constructed - perhaps there may be limited sets of morphologies that fit into columns of particular thicknesses, and those morphologies may have certain idiosyncrasies that could produce different statistics of connectivities where they are heavily used. That may be true to biology, but it may also be somewhat artifactual if, for example, the only neurons in the library that fit into that particular part of the cortex differ from the typical neurons that are actually found in that region (but may not have been part of the morphological sampling).

      We agree that the limited pool of morphological reconstructions can lead to artifactual results in the way the reviewer pointed out. To investigate that hypothesis, we added a supplementary figure (S14) where we characterize (1) to what degree the morphological composition of a columnar subvolume reflects the overall composition of the model, and (2) the level of morphological diversity in each columnar subvolume. We discuss the results at the end of section 2.6. Briefly, while we cannot fully rule out the possibility of an artificial result, we found a high and virtually uniform level of morphological diversity in all columns and layers. This makes it unlikely that individual idiosyncratic morphologies strongly affect the local connectivity. However, we acknowledge that the minimum level of morphological diversity required is unknown. We believe that at this stage all we can do is characterize this and leave final interpretation to the reader.

      I also wonder how much the assumption that the layers have the same relative thicknesses everywhere in the cortex affects these findings, since layer thicknesses do in fact vary across the cortex.

      We agree that layer thickness variation would affect circuit properties. Variability of layer thickness can be split into two components: variability stemming from differences in total thickness, which our model covers, and variability of relative, i.e., normalized, layer thickness, which we miss. In this region of cortex, though, data on the relative thickness of cortical layers is sparse. The Waxholm Atlas does not distinguish somatosensory cortical layers in its labels (Kleven et al., 2023). Yusufoğulları (2015) compares layer thicknesses of rat hindlimb and barrel field regions. After normalization against total thickness, the relative difference increased towards the superficial layers from 0 in L6 to 33% in L1. Normalized thicknesses within developed rat barrel cortex, based on layer boundaries reported in Narayanan et al. (2017), vary by 2% to 5% over approximately 2 mm. One major effect of such variability would be to scale the number of neurons in a given layer locally by the corresponding factors. For comparison, the resulting variability in neuron counts due to differences in conicality (Fig. 7D1) was around ±25%. A further effect of variable relative layer thickness would be its impact on the selection of suitable morphologies to be placed in the volume.

      In summary, adjustment of layer thickness is a refinement which should be done in future versions of the model, once more data is available. The discussion section has been updated to acknowledge this limitation. However, as outlined at the beginning of this point-by-point reply, we will not conduct such updates to the model in the context of this manuscript, as it describes the version of the model used for a number of follow-up studies.

      In addition, the complexity of the model means that some complicated analyses and decisions are only presented in this manuscript with perhaps a single panel and not much textual explanation. I find, for example, that the panels of Figure S2 seem to abstract or simplify many details to the point where I am not clear about what they are actually illustrating - how does Figure S2D represent the results of "the process illustrated in B"? Why are there abrupt changes in connectivity at region borders (shown as discontinuous colors), when dendrites and axons span those borders and so would imply interconnectivity across the borders? What do the histograms in E1 and E2 portray, and how are they related to each other?

      We apologize for the confusion. We have updated the figure caption of Figure S2 to better explain its contents.

      Overall, the model presented in this study represents an enormous amount of work and stands as a unique resource for the community, but also is made somewhat unwieldy for the community to employ due to the weight of its manifold specific construction decisions, size, and complexity.

      Reviewer #2 (Public Review):

      Summary:

      The authors build a colossal anatomical model of juvenile rat non-barrel primary somatosensory cortex, including inputs from the thalamus. This enhances past models by incorporating information on the shape of the cortex and estimated densities of various types of excitatory and inhibitory neurons across layers. This is intended to enable an analysis of the micro- and mesoscopic organisation of cortical connectivity and to be a base anatomical model for large-scale simulations of physiology.

      Strengths:

      • The authors incorporate many diverse data sources on morphology and connectivity.

      • This paper takes on the challenging task of linking micro- and mesoscale connectivity.

      • By building in the shape of the cortex, the authors were able to link cortical geometry to connectivity. In particular, they make an unexpected prediction that cortical conicality affects the modularity of local connectivity, which should be testable.

      • The authors' analysis of the model led to the interesting prediction that layer 5 neurons connect local modules, which may be testable in the future, and provide a basis to link from detailed anatomy to functional computations.

      • The visualisation of the anatomy in various forms is excellent.

      • A subnetwork of the model is openly shared (but see question below).

      We thank the reviewer for their kind comments.

      Weaknesses:

      • Why was non-barrel S1 of the juvenile rat cortex selected as the target for this huge modelling effort? This is not explained.

      We have added an explanation of this decision to the third paragraph of the introduction.

      • There is no effort to determine how specific or generalisable the findings here are to other parts of the cortex. Although there is a link to physiological modelling in another paper, there is no clear pathway to go from this type of model to understand how the specific function of the modelled areas may emerge here (and not in other cortical areas).

      With respect to general versus specific findings, our philosophy is as follows: Although most of our source data comes from juvenile rat somatosensory cortex, we also had to generalize many data sources across organisms, ages or regions. Hence, in this iteration we focused on investigating the general features of the (multi-region) mammalian cortex, e.g., high-order motifs connected by L5 neurons across subregions, or the effect of curvature on the connectivity. In the future, more specific data sources can be used to build diverging versions of the model, e.g., one for adult vs. juvenile rat. They can then be used to contrast the ages and focus on more specific findings. We already defined a number of structural metrics that can be used to contrast more specific versions of the model quantitatively.

      We now clarify this pathway to understanding more specific function in the last paragraph of the discussion.

      • In a few places the manuscript could be improved by being more specific in the language, for example:

      - "our anatomy-based approach has been shown to be powerful", I would prefer instead to read about specific contributions of past papers to the field, and how this builds on them.

      - similarly: "ensuring that the total number of synapses in a region-to-region pathway matches biology." Biology here is a loose term and implies too much confidence in the matching to some ground truth. Please instead describe the source of the data, including the type of experiment.

      We have removed or rewritten the mentioned parts. We now clarify that we work based on biological estimates from experiments and cite the experiment sources. We also provide brief descriptions of the types of data and how they were derived.

      • Some of the decisions seem a little ad-hoc, and the means to assess those decisions are not always available to the reader e.g.

      - pg. 10. "Based on these results, we decided that the local connectome sufficed to model connectivity within a region.". What is the basis for this decision? Can it be formalised?

      - "In the remaining layers the results of the objective classification were used to validate the class assignments of individual pyramidal cells. We found the objective classification to match the expert classification closely (i.e., for 80-90% of the morphologies). Consequently, we considered the expert classification to be sufficiently accurate to build the model." The description of the validation is a little informal. How many experts were there? What are their initials? Was inter-rater or intra-rater reliability assessed? What are these numbers? The match with Kanari's classification accuracy should be reported exactly. There are clearly experts among the author list, but we are all fallible without good controls in place, and they should be more explicit about those controls here, in my opinion.

      - "Morphology selection was then performed as previously (Markram et al., 2015), that is, a morphology was selected randomly from the top 10% scorers for a given position." A lot of the decisions seem a little ad-hoc, without justification other than this group had previously done the same thing. For example, why 10% here? Shouldn't this be based on selecting from all of the reasonable morphologies?

      We have clarified that the density of local connectivity is verified against the validation datasets by comparing the diagonals in Figure 4B, in addition to the quantification of Figure 4C.

      For the classification, we have now published a detailed preprint describing the objective confirmation of expert classification by a variety of methods (see Kanari et al. 2024 https://www.biorxiv.org/content/10.1101/2024.09.13.612635v1). We cannot include the full methodology in the current paper, due to its large extent. For the benefit of the reader, we have included the appropriate citation and extended the short description of the methodology. As described in this paper, the classification accuracy varies per layer, cell type, etc. We now describe these results in more detail; the full results can be accessed in our preprint.

      • I would like to know if one of the key results relating to modularity and cortical geometry can be further explored. In particular, there seem to be sharp changes in the data at the end of the modelled cortical regions, which need to be explored or explained further.

      We now explore these results further in supplementary figure S15, which we discuss in the results Section 2.6.

      • The shape of the juvenile cortex - a key novelty of this work - was based on merely a scalar reduction of the adult cortex. This is very surprising, and surely an oversimplification. Huge efforts have gone into modelling the complex nonlinear development of the cortex, by teams including the developing Human Connectome Project. For such a fundamental aspect of this work, why isn't it possible to reconstruct the shape of this relatively small part of the juvenile rat cortex?

      We agree that a more complex approach should be used in the future. However, as outlined at the beginning of this point-by-point reply, we will not conduct such updates to the model in the context of this manuscript, as it describes the version of the model used for a number of follow-up studies.

      • The same relative laminar depths are used for all subregions. This will have a large impact on the model. However, relative laminar depths can change drastically across the cortex (see e.g. many papers by Palomero-Gallagher, Zilles, and colleagues). The authors should incorporate the real laminar depths, or, failing that, show evidence to show that the laminar depth differences across the subregions included in the model are negligible.

      This point has also been raised by reviewer #1 above. For convenience, we repeat our reply below.

      We agree that layer thickness variation would affect circuit properties. Variability of layer thickness can be split into two components: variability stemming from differences in total thickness, which our model covers, and variability of relative, i.e., normalized, layer thickness, which we miss. In this region of cortex, though, data on the relative thickness of cortical layers is sparse. The Waxholm Atlas does not distinguish somatosensory cortical layers in its labels (Kleven et al., 2023). Yusufoğulları (2015) compares layer thicknesses of rat hindlimb and barrel field regions. After normalization against total thickness, the relative difference increased towards the superficial layers from 0 in L6 to 33% in L1. Normalized thicknesses within developed rat barrel cortex, based on layer boundaries reported in Narayanan et al. (2017), vary by 2% to 5% over approximately 2 mm. One major effect of such variability would be to scale the number of neurons in a given layer locally by the corresponding factors. For comparison, the resulting variability in neuron counts due to differences in conicality (Fig. 7D1) was around ±25%. A further effect of variable relative layer thickness would be its impact on the selection of suitable morphologies to be placed in the volume.

      In summary, adjustment of layer thickness is a refinement which should be done in future versions of the model, once more data is available. The discussion section has been updated to acknowledge this limitation. However, as outlined at the beginning of this point-by-point reply, we will not conduct such updates to the model in the context of this manuscript, as it describes the version of the model used for a number of follow-up studies.
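
      The scaling argument in this reply can be made concrete with a small worked example. This is only a sketch under stated assumptions: neuron density within the layer and total cortical thickness are held constant, the function name and the baseline count are hypothetical, and the percentages are the ones quoted in the reply.

```python
def scaled_layer_count(baseline_count, relative_thickness_change):
    """Neuron count in a layer after its normalized thickness changes.

    Assumes constant neuron density in the layer and unchanged total
    cortical thickness, so the count scales linearly with layer thickness.
    """
    return baseline_count * (1.0 + relative_thickness_change)


baseline = 10_000  # hypothetical number of neurons in one layer of one column

# Relative thickness changes quoted in the reply: 2% to 5% within barrel
# cortex, and up to 33% in L1 between hindlimb and barrel field regions.
for change in (0.02, 0.05, 0.33):
    count = scaled_layer_count(baseline, change)
    print(f"{change:.0%} thicker layer -> about {count:.0f} neurons")
```

      Under these assumptions the within-region variability of 2% to 5% is small next to the roughly 25% variability attributed to conicality, while the between-region difference reported for L1 is of comparable size.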

      • The authors perform an affine mapping between mouse and rat cortex. This is again surprising. In human imaging, affine mappings are insufficient to map between two individual brains of the same species and nonlinear transformations are instead used. That an affine transformation should be considered sufficient to map between two different species is then very surprising. For some models, this may be fine, but there is a supposed emphasis here on biological precision in terms of anatomical location.

      We agree that this is a weakness that we will address in future revisions of the model.
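
      To illustrate why the reviewer regards an affine mapping as restrictive, the sketch below applies one to three points. The matrix and offset are hypothetical values chosen for illustration and are unrelated to the mapping used for the model. Because the same linear map and shift are applied everywhere, midpoints (and hence straight lines and parallelism) are preserved, so region-specific local deformations cannot be expressed.

```python
import math


def apply_affine(point, matrix, offset):
    """Map a 3D point x to A @ x + b using plain Python sequences."""
    return tuple(
        sum(m * x for m, x in zip(row, point)) + o
        for row, o in zip(matrix, offset)
    )


# Hypothetical affine parameters (scaling, shear, and translation).
A = [[1.2, 0.1, 0.0],
     [0.0, 0.9, 0.2],
     [0.0, 0.0, 1.1]]
b = [0.5, -1.0, 2.0]

p = (0.0, 0.0, 0.0)
q = (2.0, 4.0, 6.0)
mid = tuple((u + v) / 2 for u, v in zip(p, q))

mapped_p = apply_affine(p, A, b)
mapped_q = apply_affine(q, A, b)
mapped_mid = apply_affine(mid, A, b)

# The image of the midpoint is the midpoint of the images.
midpoint_preserved = all(
    math.isclose(m, (u + v) / 2)
    for m, u, v in zip(mapped_mid, mapped_p, mapped_q)
)
```

      A nonlinear registration would generally not satisfy this property, which is what allows it to capture local shape differences between brains.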

      • One of the most interesting conclusions, that the connectivity pattern observed is in part due to cooperative synapse formation, is based on analyses that are unfortunately not shown.

      We originally decided not to show this part as we underestimated the interest in this particular result. We have now included the result in supplementary figure S10 and discuss the figure in the results.

      • Open code:

      - Why is only a subvolume available to the community?

      We have now made the entire model available under doi.org/10.7910/DVN/HISHXN. The Data and Code availability section has been updated to clarify this.

      - Live nature of the model. This is such a colossal model, and effort, that I worry that it may be quite difficult to update in light of new data. For example, how much person and computer time would it take to update the model to account for different layer sizes across subregions? Or to more precisely account for the shape of the juvenile rat cortex?

      To provide more information to people interested in participating in model refinements, we have added a new Figure 9. We discuss potential opportunities for refinement at the end of the discussion section.

      Reviewer #3 (Public Review):

      This manuscript reports a detailed model of the rat non-barrel somatosensory cortex, consisting of 4.2 million morphologically and biophysically detailed neuron models, arranged in space and connected according to highly sophisticated rules informed by diverse experimental data. Due to its breadth and sophistication, the model will undoubtedly be of interest to the community, and the reporting of anatomical details of modeling in this paper is important for understanding all the assumptions and procedures involved in constructing the model. While a useful contribution to this field, the model and the manuscript could be improved by employing data more directly and comparing simple features of the model's connectivity - in particular, connection probabilities - with relevant experimental data.

      The manuscript is well-written overall but contains a substantial number of confusing or unclear statements, and some important information is not provided.

      Below, major concerns are listed, followed by more specific but still important issues.

      Major issues

      (1) Cortical connectivity.

      Section 2.3, "Local, mid-range and extrinsic connectivity modeled separately", and Figure 4: I am confused about what is done here and why. The authors have target data for connectivity (Figure 4B1). But then they use an apposition-based algorithm that results in connectivity that is quite different from the data (Figure 4B2, C). They then use a correction based on the data (Figure 4E) to arrive at a more realistic connectivity. Why not set the connectivity based on the data right away then? That would seem like a more straightforward approach.

      We have completely re-written our description and discussion of connectivity in the model. We now more explicitly motivate our connectivity modeling choices in the first paragraph of section 2.3 of the results and in the second paragraph of the discussion.

      The same comment applies to Section 2.4., "Specificity of axonal targeting": the distributions of synapses on different types of target cell compartments were not well captured by the original model based on axon-dendrite overlap and pruning, so the authors introduced further pruning to match data specificity. While details of this process and what worked and what didn't may be interesting to some, overall it is not surprising, as it has been well known that cell types exhibit connectivity that is much more specific than "Peters rule" or its simple variations. The question is, since one has the data, why not use the data in the first place to set up the connectivity, instead of using the convoluted process of employing axon-dendrite overlap followed by multiple corrections?

      We would like to point out that we are not employing “Peters rule”; the revision now makes this explicit in the first paragraph of section 2.3 of the results. Furthermore, we would argue that the match to the Motta et al. data indicates that our approach is more than just a “simple variation”. Finally, we believe that there is important insight in: 1. The specific ways in which the algorithm had to be changed to match the Schneider-Mizell data, e.g. that the connectivity of SST positive neurons did not have to be adapted at all. 2. That the specificity of the other two types could still be matched by a selection of a subset of axonal appositions (i.e., of potential synapses).

      Most importantly, what is missing from the whole paper is the characterization of connection probabilities, at least for the local circuit within one area. Such connection probabilities can be obtained from the data that the authors already use here, such as the MICRONS dataset. Another good source of such data is Campagnola et al., Science, 2022. Both datasets are for mouse V1, but they provide a comprehensive characterization across all cortical layers, thus offering a good benchmark for comparison of the model with the data. It would be important for the authors to show how connection probabilities realized in their model for different cell types compared to these data.

      We now report connection probabilities in the reworked figure 4 and compare them to reported connection probabilities from many different sources and labs in supplementary figure S8. We prefer a comparison to a wide range of sources to relying on a single report.
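
      For readers unfamiliar with the quantity being requested, a minimal sketch of a pathway connection probability follows. It assumes a connectome given as a list of directed (pre, post) neuron index pairs plus one cell-type label per neuron; this layout, the function name, and the toy data are hypothetical and do not reflect the model's actual file format or the authors' analysis code.

```python
from collections import defaultdict


def connection_probabilities(edges, labels):
    """Fraction of ordered neuron pairs that are connected, per type pair.

    edges:  iterable of (pre_index, post_index) directed connections.
    labels: list with one cell-type label per neuron.
    Self-connections are ignored and duplicate edges are counted once.
    """
    type_counts = defaultdict(int)
    for label in labels:
        type_counts[label] += 1

    connected = defaultdict(int)
    for pre, post in set(edges):
        if pre != post:
            connected[(labels[pre], labels[post])] += 1

    probs = {}
    for pre_type, n_pre in type_counts.items():
        for post_type, n_post in type_counts.items():
            # Within a type, a neuron cannot be paired with itself.
            if pre_type == post_type:
                n_pairs = n_pre * (n_pre - 1)
            else:
                n_pairs = n_pre * n_post
            probs[(pre_type, post_type)] = (
                connected[(pre_type, post_type)] / n_pairs
                if n_pairs else float("nan")
            )
    return probs


# Toy example: two excitatory (E) and two inhibitory (I) neurons.
toy_labels = ["E", "E", "I", "I"]
toy_edges = [(0, 1), (0, 2), (1, 3), (2, 0), (3, 2)]
toy_probs = connection_probabilities(toy_edges, toy_labels)
```

      This sketch ignores intersomatic distance. Experimental estimates are usually obtained from nearby neuron pairs, so a comparison with data would likely need to restrict the pairs to a matching distance range.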

      (2) Section 2.5, "Structure of thalamic inputs" and Figure 6.

      The text in section 2.5 should provide more details on what was done - namely, that the thalamic axons were generated based on the axon density profiles and then synapses were established based on their overlap with cortical dendrites. Figure S10 where the target axon densities from data and the model axon densities are compared is not even mentioned here. Now, Figure S10 only shows that the axon densities were generated in a way that matches the data reasonably well. However, how can we know that it results in connectivity that agrees with data? Are there data sources that can be used for that purpose? For example, the authors show that in their model "the peaks of the mean number of thalamic inputs per neuron occur at lower depths than the peaks of the synaptic density". Is this prediction of the model consistent with any available data?

      Most importantly, the authors should show how the different cell types in their model are targeted by the thalamic inputs in each layer. Experimental studies have been done suggesting specificity in targeting of interneuron types by thalamic axons, such as PV cells being targeted strongly whereas SST and VIP cells being targeted less.

      We have updated the Results section to provide context for the thalamic axon placement, and referred the reader to the methods for more detail. A reference to Figure S10 has now been added to this section as well.

      As for validations of the structure of the thalamo-cortical inputs: We found that the existing literature on the topic, such as Cruikshank et al., 2007, 2010 and more recently Sermet et al., 2019, is predominately on the physiological strengths of the pathways. We acknowledge that the authors provide compelling arguments that their findings are likely partially due to differences in the anatomical innervation strengths. On the other hand, Sporns, 2013 cautioned against mixing up structural and functional connectivity. Overall, we believe that it is simply cleaner to perform this validation in the accompanying manuscript (“Part II: Physiology and Experimentation”), using the full physiological model. Note that we have actually performed that validation in the manuscript (see preprint under the following doi: 10.1101/2023.05.17.541168, Figure 3H1).

      Note that a higher physiological strength onto PV+ neurons is observed.

      (3) "We have therefore made not only the model but also most of our tool chain openly available to the public (Figure 1; step 7)."

      In fact it is not the whole model that is made publicly available, but only about 5% of it (211,000 out of 4,200,000 neurons). Also, why is "most" of the tool chain made openly available, and not the whole tool chain?

      We have now made the entire model available under doi.org/10.7910/DVN/HISHXN. This has also been added to the Key resource table.

      With regard to the tool chain, everything is on our public github (https://github.com/BlueBrain/) except for the algorithm for detecting axonal appositions. For that tool there are currently unresolved potential copyright issues with former collaboration partners. We are working to resolve them.

      Other issues

      "At each soma location, a reconstruction of the corresponding m-type was chosen based on the size and shape of its dendritic and axonal trees (Figure S6). Additionally, it was rotated according to the orientation towards the cortical surface at that point."

      After this procedure, were cells additionally rotated around the white matter-pia axis? If yes, then how much and randomly or not? If not, then why not? Such rotations would seem important because otherwise additional order potentially not present in the real cortex is introduced in the model affecting connectivity and possibly also in vivo physiology (such as the dynamics of the extracellular electric field).

      They are indeed additionally randomly rotated. We have clarified this in the revision.

      The term "new in vivo reconstructions" for the 58 neurons used in this paper in addition to "in vitro reconstructions" is a misnomer. It is not straightforward to see where the procedure is described, but then one finds that the part of Methods that describes experimental manipulations is mostly about that (so, a clearer pointer to that part of Methods could be useful). However, the description in Methods makes it clear that it is only labeling that is done in vivo; the microscopy and reconstruction are done subsequently in vitro. I would recommend changing the terminology here, as it is confusing. Also, can the authors show reconstructions of these neurons in the supplementary figures? Is the reconstruction shown in Figure 4A representative?

      The term is used because the staining is done in vivo. To the best of our knowledge, the reconstruction process cannot be performed in vivo. However, to avoid any confusion, we have modified the text to describe these neurons as “in vivo stained”.

      With respect to the reconstruction in Figure 4: The intent of the panel is to demonstrate the concept of targeted long-range axons that our morphologies are missing, necessitating the use of a second algorithm for longer-range connectivity. As such, it is not one of the reconstructions we used, but one of Janelia MouseLight. While we mentioned MouseLight in the figure caption, we formulated it in a way that could be misunderstood to mean that we merely used the MouseLight browser to render one of our morphologies. We apologize for the confusion, and we have fixed the figure caption.

      In this revision we have added exemplars of representative morphology reconstructions (in slice stained and in vivo stained) in a new supplementary figure, as requested (Figure S5). It is referenced in the last paragraph of section 2.1.

      In the Discussion, "This was taken into account during the modeling of the anatomical composition, e.g. by using three-dimensional, layer-specific neuron density profiles that match biological measurements, and by ensuring the biologically correct orientation of model neurons with respect to the orientation towards the cortical surface. As local connectivity was derived from axo-dendritic appositions in the anatomical model, it was strongly affected by these aspects.

      However, this approach alone was insufficient at the large spatial scale of the model, as it was limited to connections at distances below 1000μm."

      As mentioned above, it is not clear that this approach was sufficient for local connectivity either. It would be great if the authors showed a systematic comparison of local connection probabilities between different cell types in their model with experimental data and commented here in the Discussion about how well the model agrees with the data.

      As mentioned in the reply to a previous comment, we now report connection probabilities.

      In the Discussion: "The combined connectome therefore captures important correlations at that level, such as slender-tufted layer 5 PCs sending strong non-local cortico-cortical connections, but thick-tufted layer 5 PCs not." (Also the corresponding findings in Results.)

      If I understand this statement correctly, it may not agree with biological data. See analysis from MICRONS dataset in Bodor et al., https://www.biorxiv.org/content/10.1101/2023.10.18.562531v1.

      Our statement was indeed misleading and formulated too strongly. While thick-tufted pyramidal cells do form long-range intra-cortical connections, the structural strength of these pathways is weaker than for slender-tufted PCs, which are associated with the IT (intra-telencephalic) projection type. We have made this clear in the revision.

      Table 2 is confusing. What do pluses and minuses mean? What does it mean that some entries have two pluses? This table is not mentioned anywhere else in the text. If pluses mean some meaningful predictions of the model, then their distribution in the table seems quite liberal and arbitrary. It is not clear to me that the model makes that many predictions, especially for type-specificity and plasticity. Also, why is the hippocampus mentioned in this table? I don't see anything about the hippocampus anywhere else in the paper.

      We have clarified the description of the table in its caption and removed references to hippocampus, which were left from an earlier draft of the paper.

      In the Discussion, "Thus, we made the tools to improve our model also openly available (see Data and Code availability section)."

      As mentioned before, the authors themselves write that they made "most of our tool chain openly available to the public", but not all of it.

      With regard to the tool chain, everything is on our public github (https://github.com/BlueBrain/) except for the algorithm for detecting axonal appositions. For that tool there are currently unresolved potential copyright issues with former collaboration partners. We are working to resolve them.

      Table S2 has multiple question marks. It is not clear whether the "predictions" listed in that table are truly well-thought-out and/or whether experimental confirmations are real.

      Some of the citations in that table were broken due to technical difficulties with the citation manager used. We apologize and have fixed this in the revision.

      Introduction: It would be quite appropriate to cite here Einevoll et al., Neuron, 2019 ("The Scientific Case for Brain Simulations").

      We now reference this important work.

      Recommendations for the authors:

      Reviewing Editor's note:

      Consultation with the reviewers highlighted three main issues: the integration of connection probability profiles, non-uniform cortical thickness, and the overall organization of the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Apart from the points discussed in the public review, my main concern is that the manuscript itself is not as tightly constructed as it should be, to the detriment of the reader's ability to understand the model itself and the conclusions from the presented analyses.

      There are places where the text references seemingly incorrect figure panels or refers to panels that don't exist:

      - Section 2.2, first paragraph - refers to Figure 2D, E but those panels do not exist in Figure 2.

      - Section 2.2, second paragraph - refers to Figure 3D3 - perhaps it should be 3B3?

      - Section 2.8, first paragraph - has no figure references but seems like it should be referring to parts of Figure 8 (perhaps Figure 8B1 specifically?)

      - Is the reference to Figure S11A on page 16 supposed to be to S12A?

      In other places, figure labels and descriptions are not clear, and terminology is not always well-defined or explained.

      - Figure 8 and the associated section 2.8 are very difficult to draw conclusions from as presented - several of the terms used are opaque and not clearly defined in the text or legends. I could not easily infer how the normalization works for the "normalized node participation per layer", or what "position in simplex" means for "unique neurons in core", and what their "relative counts" are relative to.

      - Are "targets" in Figure S12A the same as "sinks"? If so, it would be better to use a single term consistently throughout.

      - Figure S12 - figures in part B do not have enough labels to interpret - what is the y-axis of the "rich-club analysis" graph? Also, the figures in part B bottom are labeled "long-range" rather than "mid-range" connections.

      In general, I found the use of both letters and numbers for figure panels (e.g. Figure 7E1) more confusing than helpful - it didn't seem like panels with the same letter were visually grouped consistently, and it sometimes made it more difficult to follow the flow of a figure. I would recommend using only letters in nearly every case here.

      We thank the reviewer for directing our attention to these issues. We have fixed them in the revision. However, we have decided to keep our original panel numbering scheme. Panels with the same letter are meant to be conceptually grouped as they address related or similar measures.

      Other minor points:

      - Section 2.4 - paragraph 2 - sentence 5 "inhbititory" -> "inhibitory".

      - Figure 5B figure legend - references Schneider-Mizell et al. 2023 but probably should be Motta et al. 2019?

      - Figure 5C - figure key "expcected" -> "expected".

      - The lower part of Figure 7C looks like it belongs to panel D2 instead of panel C due to relative spacing.

      We once again thank the reviewer, and we have fixed the listed issues.

      Reviewer #2 (Recommendations For The Authors):

      (1) Abstract:

      - Is it really 'integrating whole brain-scale data'? This seems a bit misleading.

      - "We delineated the limits of determining connectivity from anatomy" - here I think you mean determining connectivity from morphology, or dendrite/axon appositions. Electron microscopy is still anatomy and presumably would be much closer to function.

      We originally used the term “anatomy” as connectivity depends on the correct placement of neurons in addition to their morphology. However, as the reviewer points out, this term is misleading as it would encompass electron microscopy, which can go beyond what we do with the model. We have updated the text to read “morphology and placement”.

      (2) Introduction:

      "Investigating the multi-scale interactions that shape perception requires a model of multiple cortical subregions with inter-region connectivity, but it also requires the subcellular resolution provided by a morphologically detailed model." - This statement, as written, is not true in my opinion. You can argue for the value of morphologically-detailed neuron models to the study of perception, but they are not required for the investigation of perception.

      We have updated the text to be clearer: subcellular resolution is only required for certain aspects that are related to perception.

      (3) Results:

      - Pg. 9/10. There are three sentences in a row that are of the style: "ensuring that the total number of synapses in a region-to-region pathway matches biology." Biology here is a loose term and implies too much confidence in the matching to some ground truth. Please instead describe the source of the data, including the type of experiment here already.

      - Pg. 10. On the first read, I found it quite hard to follow what exactly was done in Figure 4. What are the target values adapted from Reimann et al., 2019, for example?

      - Pg. 10. "Based on these results, we decided that the local connectome sufficed to model connectivity within a region.". What is the basis for this decision? Can it be formalised?

      - Pg. 16, Figure 7 B-C. The apparent effect of geometry on modularity is potentially very interesting. However, are the sharp drop-offs in values for modularity (but also conicality and height) true, or are some artefacts due to columns at the edges of the sampled area?

      We have discussed these points above in the general comments and strengths and weaknesses.

      - Pg. 18. Simplicial cores define central subnetworks, tied together by mid-range connections. This work, in particular leading to the conclusion of the layer 5 highway hubs, stands out as being a successful attempt to simplify the highly detailed model to a degree that it generates useable new understanding.

      We thank the reviewer for the kind comment.

      (4) Figures:

      Figure 2: The caption doesn't seem to match the Figure (e.g. there are no brain regions depicted in A).

      Figure 4f. This is a key panel, but is squished into a small corner of Figure 4, and therefore hard-to-read.

      We have fixed this in the revision.

      Reviewer #3 (Recommendations For The Authors):

      In Major comments, point (1) discusses the issue of connectivity known from data. For all the aspects of connectivity mentioned there, I would recommend the authors re-build their model using the connectivity data directly. It would be interesting to test whether a model constructed in such a way would have any difference in simulated neural activity relative to the model they have constructed.

      This is indeed a very interesting avenue of research. However, we believe that it is best conducted in separate manuscripts. First, in Pokorny et al., 2024 (https://doi.org/10.1101/2024.05.24.593860) we conduct this investigation, comparing the emerging activity in the model to the one for simpler connectivity models. Additionally, in Egas-Santander et al., 2024 (https://www.biorxiv.org/content/10.1101/2024.03.15.585196v3) we found that simpler connectomes lead to less reliable spiking activity globally. Finally, in the accompanying manuscript (https://www.biorxiv.org/content/10.1101/2023.05.17.541168v5) we compare activity with and without the targeting specificity of Schneider-Mizell et al.

      In Major comments, point (2) discusses thalamic inputs. I would recommend the authors to address the issues mentioned there.

      We have replied to those comments above.

      In addition, panels F and G of Figure 6 are mentioned in the caption but are not shown in the figure. In panel B, the choice of visualization is strange. It would make sense to show box plots for all the data instead of bars for mean values and points for randomly selected 50 cells. Panels E1 and E2 lack units.

      We have removed mentions of panels F and G and changed the style of plot. Units for E1 and E2 are now explained in the figure caption.

      In Major comments, point (3) touches upon model and tool sharing. I would recommend making such statements more accurate and reflecting what exactly is provided to the community since not everything is shared.

      We have now made the entire model available under doi.org/10.7910/DVN/HISHXN.

      With regard to the tool chain, everything is on our public github (https://github.com/BlueBrain/) except for the algorithm for detecting axonal appositions. For that tool there are currently unresolved potential copyright issues with former collaboration partners. We are working to resolve them.

      I would recommend the authors address all the other points mentioned in the public review as well. In addition, below are some smaller issues that should be fixed.

      Figure 2: the caption appears to be partially wrong and partially misassigned to the figure panels.

      We fixed the issue.

      Also, note that in L6 the types L6_TPC:A and L6_TPC:C are listed in the figure, but L6_TPC:B is not mentioned.

      There is indeed no TPC:B type in layer 6. The distinction between TPC:A and TPC:B is based on early or late bifurcations of the apical dendrite and is only observed in layer 5.

      Figure 3, panel B2: the caption refers to colors in panel (C), but the authors probably meant to refer to panel (A).

      We fixed the issue.

      "The placement of morphological reconstructions matched expectation, showing an appropriately layered structure with only small parts of neurites leaving the modeled volume (Figure 2D, E)."

      Figure 2 does not have panels D and E.

      "The volume was clearly dominated by dendrites, filling between 23% and 47% of the space, compared to 2% to 11% for axons (Figure 3D3)." There is no panel D or D3 in Figure 3.

      "Recently, the MICrONS dataset (MICrONS-Consortium et al., 2021) has been analyzed with respect to the axonal targeting of inhibitory subtypes in a 100 x 100 μm subvolume spanning all layers (Schneider-Mizell et al., 2023)."

      100 x 100 μm is an area (and should be 100 x 100 μm^2), not a volume.

      Figure S11B requires a legend for the color map.

      We fixed the issues.

      Table S1: What is the difference between L6_BP and L6_BPC? They both are referred to as L6 bipolar cells.

      We have changed the description of L6_BPC to “Layer 6 bitufted pyramidal cell”.

    2. eLife Assessment

      This manuscript reports a detailed model of juvenile rat somatosensory cortex, consisting of 4.2 million morphologically and biophysically detailed neuron models, arranged in space and connected according to diverse experimental data - a valuable tool for the field. The construction of the model is based on a methodology with solid supporting evidence. It should be noted that, by necessity, such a large-scale model development involves many assumptions, interpolations, and decisions that could have compounding downstream effects on further analyses that may be difficult to disambiguate.

    3. Reviewer #1 (Public review):

      Summary:

      In this study, the authors describe the construction of an extremely large-scale anatomical model of juvenile rat somatosensory cortex (excluding the barrel region), which extends earlier iterations of these models by expanding across multiple interconnected cortical areas. The models are constructed in a way to maintain biological detail from a granular scale - for example, individual cell morphologies are maintained, and synaptic connectivity is founded on anatomical contacts. The authors use this model to investigate a variety of properties, from cell-type specific targeting (where the model results are compared to findings from recent large-scale electron microscopy studies) to network metrics. The model is also intended to serve as a platform and resource for the community by being a foundation for simulations of neuronal circuit activity and for additional anatomical studies that rely on the detailed knowledge of cellular identity and connectivity.

      Strengths:

      As the authors point out, the combination of scale and granularity of their model are what make this study valuable and unique. The comparisons with recent electron microscopy findings are some of the most compelling results presented in the study, showing that certain connectivity patterns can arise directly from the anatomical configuration, while other discrepancies highlight where more selective targeting rules (perhaps based on molecular cues) are likely employed. They also describe intriguing effects of cortical thickness and curvature on circuit connectivity and characterize the magnitude of those effects on different cortical layers.

      The detailed construction of the model draws on a wide range of data sources (cellular and synaptic density measures, neuronal morphologies, cellular composition measures, brain geometry, etc.) that are integrated together; other data sources are used for comparison and validation. This consolidation and comparison also represent a valuable contribution to the overall understanding of the modeled system.

      Weaknesses:

      The scale of the model, which is a primary strength, also can carry some drawbacks. In order to integrate all the diverse data sources together, many specific decisions must be made about, for example, translating findings from different species or regions to the modeled system, or deciding which aspects of the system can be assumed to be the same and which should vary. All these decisions will have effects on the predicted results from the model, which could limit the types of conclusions that can be made (both by the authors and by others in the community who may wish to use the model for their own work). However, the public release of the models and most of the associated tools does provide others a somewhat easier path to modify and evaluate this iteration of the model for their own studies.

      Overall, the model presented in this study represents an enormous amount of work and stands as the basis for other work by the same group as well as a unique resource for the community, even while acknowledging that it may be somewhat unwieldy for the community to employ due to the weight of its manifold specific construction decisions, size, and complexity.

    4. Reviewer #2 (Public review):

      Summary:

      The authors build a colossal anatomical model of juvenile rat non-barrel primary somatosensory cortex, including inputs from the thalamus. This enhances past models by incorporating information on the shape of the cortex and estimated densities of various types of excitatory and inhibitory neuron across layers. This is intended to enable analysis of the micro- and mesoscopic organisation of cortical connectivity and to be a base anatomical model for large-scale simulations of physiology.

      Strengths:

      • The authors incorporate many diverse data sources on morphology and connectivity.

      • This paper takes on the challenging task of linking micro- and meso-scale connectivity.

      • By building in the shape of the cortex, the authors were able to link cortical geometry to connectivity. In particular they make an unexpected prediction that cortical conicality affects the modularity of local connectivity, which should be testable.

      • The authors' analysis of the model led to the interesting prediction that layer 5 neurons connect local modules, which may be testable in the future, and provide a basis to link from detailed anatomy to functional computations.

      • The visualisation of the anatomy in various forms is excellent.

      • The model is openly shared.

      Weaknesses:

      • There is no effort to determine how specific or generalisable the findings here are to other parts of cortex.
      • Although there is a link to physiological modelling in another paper, there is no clear pathway to go from this type of model to understanding how the specific function of the modelled areas may emerge here (and not in other cortical areas).
      • Some of the decisions seem a little ad hoc, and the means to assess those decisions are not always easily available to the reader.
      • The shape of the juvenile cortex - a key novelty of this work - was based on merely a scalar reduction of the adult cortex. This is very surprising, and surely an oversimplification. Huge efforts have gone into modelling the complex nonlinear development of cortex, by teams including the developing Human Connectome Project. For such a fundamental aspect of this work, why isn't it possible to reconstruct the shape of this relatively small part of juvenile rat cortex?
      • The same relative laminar depths are used for all subregions. This will have a large impact on the model. However, relative laminar depths can change drastically across the cortex (see e.g. many papers by Palomero-Gallagher, Zilles and colleagues). The authors should incorporate the real laminar depths or, failing that, provide evidence that the laminar depth differences across the subregions included in the model are negligible.
      • The authors perform an affine mapping between mouse and rat cortex. This is again surprising. In human imaging, affine mappings are insufficient to map between two individual brains of the same species, and nonlinear transformations are used instead. That an affine transformation should be considered sufficient to map between two different species is then very surprising. For some models, this may be fine, but there is a supposed emphasis here on biological precision in terms of anatomical location.
      • Live nature of the model. This is such a colossal model, and effort, that I worry that it may be quite difficult to update in light of new data. For example, how much person and compute time would it take to update the model to account for different layer sizes across subregions? Or to more precisely account for the shape of juvenile rat cortex?

    5. Reviewer #3 (Public review):

      This manuscript reports a detailed model of the rat non-barrel somatosensory cortex, consisting of 4.2 million morphologically and biophysically detailed neuron models, arranged in space and connected according to highly sophisticated rules informed by diverse experimental data. Due to its breadth and sophistication the model will undoubtedly be of interest to the community, and the reporting of anatomical details of modeling in this paper is important for understanding all the assumptions and procedures involved in constructing the model. While a useful contribution to this field, the model and the manuscript could be improved by employing data more directly and comparing simple features of the model's connectivity - in particular, connection probabilities - with relevant experimental data.

      The manuscript is overall well-written, but contains a substantial number of confusing or unclear statements, and some important information is not provided.

      Comments on revisions:

      The authors mostly addressed all my points and improved the paper substantially. I do not have further extensive comments except one general point below.

      Regarding section 2.3 and metrics of connectivity like pairwise connection probabilities, it is great that the authors rewrote that section and added comparisons with experimental data in Figs. 4 and S9. Unfortunately, what one finds when direct comparisons are made is that the modeled pairwise connectivity is quite different from the data. Fig. S9 shows that the model's results do not agree with data in about half of the cases (purple and red arrows). Similarly large discrepancies can be seen for some other metrics, like in Fig. S10B and S10C1,C2. (And similar concerns apply to thalamocortical connections in section 2.5, where it looks like little to no data are available to verify the pairwise connectivity between the thalamic and cortical neurons via a direct comparison.)

      This is concerning since this model forms the basis for multiple other studies of cortical dynamics and function by the same group and potentially others in the community, with multiple papers relying on it, whereas basic properties of connectivity are apparently not captured well.

      On the other hand, this is also a "glass half full" situation, showing that the sophisticated algorithms for establishing connections, developed by the authors, are working well in at least half of the connection types explored. It is therefore imperative that the authors continue refining these algorithms to capture the remaining half in future iterations and producing improved models that the community can better rely on.

      Please also note that Fig. S11 does not have a caption.

    1. eLife Assessment

      This work is potentially important and largely convincing given the state-of-the-art approaches used to unravel the mechanism underlying the release of Claudins via Rho-mediated activation of Matriptase during tight junction formation. However, there are a few concerns. Addressing the following two major concerns, a) showing that Matriptase is indeed activated and b) showing that Matriptase inhibition does not interfere with keratinocyte specification, would significantly improve the strength of the evidence. In addition, including quantifications, adding missing methods, and improving the rigor of the analyses would be helpful.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript "Rho-ROCK liberates sequestered claudin for rapid de novo tight junction formation" by Cho and colleagues investigates de novo tight junction formation during the differentiation of immortalized human HaCaT keratinocytes to granular-like cells, as well as during epithelial remodeling that occurs upon the apoptosis of individual cells in confluent monolayers of the representative epithelial cell line EpH4. The authors demonstrate the involvement of Rho-ROCK with well-conducted experiments and convincing images. Moreover, they unravel the underlying molecular mechanism, with Rho-ROCK activity activating the transmembrane serine protease Matriptase, which in turn leads to the cleavage of EpCAM and TROP2, respectively, releasing Claudins from EpCAM/TROP2/Claudin complexes at the cell membrane to become available for polymerization and de novo tight junction formation. These functional studies in the two different cell culture systems are complemented by localization studies of the corresponding proteins in the stratified mouse epidermis in vivo.

      In total, these are new and very intriguing findings that add important new insights into the molecular mechanisms of tight junction formation, identifying Matriptase as the "missing link" in the cascade of formerly described regulators. The involvement of TROP2/EpCAM/Claudin has been reported recently (Szabo et al., Biol. Open 2022; Bugge lab), and Matriptase had previously been described to be required for tight junction formation as well, again by the Bugge lab. Yet, the functional correlation/epistasis between them, and their relation to Rho signaling, had not been known thus far.

      However, experiments addressing the role of Matriptase require a little more work.

      Strengths:

      Convincing functional studies in two different cell culture systems, complemented by supporting protein localization studies in vivo. The manuscript is clearly written and most data are convincingly demonstrated, with beautiful images and movies.

      Weaknesses:

      The central finding that Rho signaling leads to increased Matriptase activity needs to be more rigorously demonstrated (e.g. western blot specifically detecting the activated version or distinguishing between the full-length/inactive and processed/active version).

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate how epithelia maintain intercellular barrier function despite and during cellular rearrangements upon e.g. apoptotic extrusion in simple epithelia or regenerative turnover in stratified epithelia like the epidermis. This is a fundamental question in epithelial biology. Previous literature has shown that Rho-mediated local regulation of actomyosin is essential not only for cellular rearrangement itself but also for directly controlling tight junction barrier function. The molecular mechanism, however, remained unclear. Here the authors use extensive fluorescent imaging of fixed and live cells together with genetic and drug-mediated interference to show that Rho activation is required and sufficient to form de novo tight junctional strands at intercellular contacts in epidermal keratinocytes (HaCaT) and mammary epithelial cells. After having confirmed previous literature, they then show that Rho activation activates the transmembrane protease Matriptase, which cleaves EpCAM and TROP2, two claudin-binding transmembrane proteins, to release claudins and enable claudin strand formation and therefore tight junction barrier function.

      Strengths:

      The presented mechanism is shown to be relevant for epithelial barriers being conserved in simple and stratifying epithelial cells and mainly differs due to tissue-specific expression of EpCAM and TROP2. The authors present careful state-of-the-art imaging and logical experiments that convincingly support the statements and conclusion. The manuscript is well-written and easy to follow.

      Weaknesses:

      Whereas the in vitro evidence for the presented mechanism is strongly supported by the data, the in vivo confirmation is mostly based on the predicted distribution of TROP2. Whereas the causality of Rho-mediated Matriptase activation has been nicely demonstrated, it remains unclear how Rho activates Matriptase.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Rho-ROCK liberates sequestered claudin for rapid de novo tight junction formation" by Cho and colleagues investigates de novo tight junction formation during the differentiation of immortalized human HaCaT keratinocytes to granular-like cells, as well as during epithelial remodeling that occurs upon the apoptosis of individual cells in confluent monolayers of the representative epithelial cell line EpH4. The authors demonstrate the involvement of Rho-ROCK with well-conducted experiments and convincing images. Moreover, they unravel the underlying molecular mechanism, with Rho-ROCK activity activating the transmembrane serine protease Matriptase, which in turn leads to the cleavage of EpCAM and TROP2, respectively, releasing Claudins from EpCAM/TROP2/Claudin complexes at the cell membrane to become available for polymerization and de novo tight junction formation. These functional studies in the two different cell culture systems are complemented by localization studies of the corresponding proteins in the stratified mouse epidermis in vivo.

      In total, these are new and very intriguing findings that add important new insights into the molecular mechanisms of tight junction formation, identifying Matriptase as the "missing link" in the cascade of formerly described regulators. The involvement of TROP2/EpCAM/Claudin has been reported recently (Szabo et al., Biol. Open 2022; Bugge lab), and Matriptase had previously been described to be required for tight junction formation as well, again by the Bugge lab. Yet, the functional correlation/epistasis between them, and their relation to Rho signaling, had not been known thus far.

      However, experiments addressing the role of Matriptase require a little more work.

      Strengths:

      Convincing functional studies in two different cell culture systems, complemented by supporting protein localization studies in vivo. The manuscript is clearly written and most data are convincingly demonstrated, with beautiful images and movies.

      Weaknesses:

      The central finding that Rho signaling leads to increased Matriptase activity needs to be more rigorously demonstrated (e.g. western blot specifically detecting the activated version or distinguishing between the full-length/inactive and processed/active version).

      We plan to provide more direct evidence that matriptase activation is regulated by the Rho-ROCK pathway, utilizing antibodies that specifically recognize the activated form of matriptase.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate how epithelia maintain intercellular barrier function despite and during cellular rearrangements upon e.g. apoptotic extrusion in simple epithelia or regenerative turnover in stratified epithelia like the epidermis. This is a fundamental question in epithelial biology. Previous literature has shown that Rho-mediated local regulation of actomyosin is essential not only for cellular rearrangement itself but also for directly controlling tight junction barrier function. The molecular mechanism, however, remained unclear. Here the authors use extensive fluorescent imaging of fixed and live cells together with genetic and drug-mediated interference to show that Rho activation is required and sufficient to form de novo tight junctional strands at intercellular contacts in epidermal keratinocytes (HaCaT) and mammary epithelial cells. After having confirmed previous literature, they then show that Rho activation activates the transmembrane protease Matriptase, which cleaves EpCAM and TROP2, two claudin-binding transmembrane proteins, to release claudins and enable claudin strand formation and therefore tight junction barrier function.

      Strengths:

      The presented mechanism is shown to be relevant for epithelial barriers being conserved in simple and stratifying epithelial cells and mainly differs due to tissue-specific expression of EpCAM and TROP2. The authors present careful state-of-the-art imaging and logical experiments that convincingly support the statements and conclusion. The manuscript is well-written and easy to follow.

      Weaknesses:

      Whereas the in vitro evidence for the presented mechanism is strongly supported by the data, the in vivo confirmation is mostly based on the predicted distribution of TROP2. Whereas the causality of Rho-mediated Matriptase activation has been nicely demonstrated, it remains unclear how Rho activates Matriptase.

      As noted, while we have demonstrated that Rho activation is both necessary and sufficient to induce matriptase activation, the precise mechanism by which Rho mediates this activation remains unclear. As discussed in the manuscript, several potential molecular mechanisms could underlie the contribution of Rho to matriptase activation. As part of our future work, we intend to systematically investigate each of these mechanisms.

    1. eLife Assessment

      The ingenious design in this study enables the observation of 3D cell spheroids from an additional lateral view, yielding more comprehensive information than traditional single-angle imaging. This extends the methods available to investigate cell behaviors in the growth or migration of tumor organoids in a time-lapse manner, and these extensions should be valuable to the field. The authors provide solid evidence that the methods work as described.

    2. Reviewer #1 (Public review):

      Summary:

      The ingenious design in this study enables the observation of 3D cell spheroids from an additional lateral view, yielding more comprehensive information than traditional single-angle imaging, and it substantially extends the methods available to investigate cell behaviors in the growth or migration of tumor organoids. I believe that this study opens an avenue and provides an opportunity to characterize spheroid formation dynamics from different angles, in particular a high-resolution side view, in future studies of other organoids.

    3. Reviewer #2 (Public review):

      Summary:

      The authors developed a new device to overcome current limitations in the imaging of 3D spheroidal structures. In particular, they created a system to follow tumour spheroid formation, fusion and cell migration in real time without disrupting spheroid integrity. The system has also been used to test the effects of a therapeutic agent (chemotherapy) and immune cells.

      Strengths:

      The system allows the in situ observation of the 3D structures along the 3 axes (x, y and z) without disrupting the integrity of the spheroids; in a time-lapse manner it is possible to follow the formation of the 3D structure and spheroid fusion from multiple angles, allowing a better understanding of cell aggregation/growth and the kinetics of the cells.

      Interestingly, the system allows the analysis of cell migration/escape from the 3D structure, analysing morphological changes not only in the periphery of the spheroids but also in the inner region, demonstrating that the proliferating cells in the periphery of the structure are more involved in the migration and dissemination process. The application of the system to the study of the effects of doxorubicin and NK cells would give new insights into the response of tumor 3D structures to killing agents.

    4. Author response:

      Reviewer #1:

      We sincerely thank you for your thoughtful review and constructive comments on our work and we appreciate your positive assessment of our study’s innovative design, which allows for improved observation of 3D cell spheroids from an additional lateral view. Your comments underscore the importance of our approach in advancing methods for investigating cell behaviors in tumor organoid studies.

      In response to your suggestions, we will first add a detailed image of the ‘First surface mirror’ in Fig. 1 to provide a reference for readers and other researchers, thereby facilitating broader use of this method in similar observations. Regarding the suitable sample sizes for this device, as the spheroid sizes are relatively small compared to the mirror and culture dish, we have been able to image samples up to 5 mm in height, which provides ample capacity for most spheroids under 1 mm. We will include additional experiments and explanations in the manuscript to clarify this further.

      Concerning the ring-shaped seeding pattern of spheroids, we have conducted extensive culture experiments to optimize this method. The agarose microwells-based method has proven to be highly tolerant of variations. Within these microwells, cells have a propensity to self-aggregate, leading to the formation of spheroid structures. We will add a discussion in the revised manuscript to address this issue.

      Lastly, this device can accommodate the fluorescence imaging of 3D spheroid samples. We will supplement the discussion with a schematic illustrating the principles of fluorescence imaging using this device, providing a foundation for future work in this area. We will also make language improvements to enhance the overall quality of the manuscript.

      Thank you once again for your valuable insights, which have greatly contributed to the strengthening of our manuscript.

      Reviewer #2:

      We sincerely thank you for your detailed and supportive review of our manuscript. Your recognition of our system’s capabilities for in situ observation of 3D structures along multiple axes, as well as its potential applications in studying therapeutic effects, is highly encouraging. Your comments on the advantages of this system for analyzing cell migration, morphological changes, and responses to therapeutic agents are especially appreciated.

      Thank you again for your thoughtful feedback and for highlighting the contributions of our work. Your insights have been invaluable in refining the focus and clarity of our study, and we hope that our revisions meet your expectations.

    1. eLife Assessment

      In this valuable study, the authors used an elegant genetic approach to delete EED at the post-neural crest induction stage. The use of single-cell RNA-seq analysis is well suited to determining changes in cell type-specific gene expression during development. Results backed by solid evidence demonstrate that Eed is required for craniofacial osteoblast differentiation and mesenchymal proliferation after the induction of the neural crest.

    2. Reviewer #1 (Public review):

      The epigenetic regulatory complex PRC2 is essential for neural crest specification, and its misregulation has been shown to cause severe craniofacial defects. This study shows that Eed, a core PRC2 component, is critical for craniofacial osteoblast differentiation and mesenchymal proliferation after neural crest induction. Using mouse genetics and single-cell RNA sequencing, the researchers found that conditional knockout of Eed leads to significant craniofacial hypoplasia, impaired osteogenesis, and reduced proliferation of mesenchymal cells in post-migratory neural crest populations.

      Overall, the study is superficial and descriptive. No in-depth mechanism was analyzed and the phenotype analysis is not comprehensive.